statistical tests and choice of options with xtabond2

Raphael Cardot-Martin

Join Date: Aug 2018
Posts: 5

07 Aug 2018, 03:55

Hi,

I am currently investigating the impact of macroprudential policies on credit growth, with Stata 13.0. The idea is to reproduce the model used by Cerutti et al. (2015), "The Use and Effectiveness of Macroprudential Policies: New Evidence", but with a different data set.

In their paper, they used the "xtdpd" command to obtain the Arellano-Bond GMM estimator. They used a dynamic panel model because past credit growth affects current credit growth.
So I tried this command, but because GMM is very new to me, I switched to the xtabond2 command by Roodman (2009), which I find less complex. In order to replicate the model used by Cerutti et al. (2015), I chose the difference GMM estimator and treated all variables as endogenous.

My understanding is that :

- The number of instruments has to be lower than the number of groups. I have N = 21 and T = 56 (14 years, 2000 to 2014, at quarterly frequency) and several missing values. I know that xtabond2 is not really recommended with a small N and a large T, but with the "collapse" option and only lag(2 3) for each instrument, I obtain 18 instruments, so maybe it should work. In comparison, Cerutti et al. (2015) have N = 119 and T = 13.

- Because all my variables are endogenous, I can't use the first lag of a variable as an instrument.

- The twostep method is preferred over the onestep method.
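
The instrument count in the first point can be checked by hand. This is a minimal sketch, assuming that each variable in a collapsed gmmstyle() list contributes one instrument per lag in the lag() range; that assumption matches the 18 and 16 instruments that xtabond2 reports in the outputs further down. The local macro names are illustrative only.

```stata
* Rough count of collapsed GMM-style instruments (illustrative locals, not xtabond2 output)
local nvars  = 9                          // variables listed in gmmstyle()
local first  = 2                          // lag(2 3): first lag used
local last   = 3                          // lag(2 3): last lag used
local ninst  = `nvars' * (`last' - `first' + 1)
local ngroup = 21                         // number of countries (N)
display "instruments = `ninst', groups = `ngroup'"
display "fewer instruments than groups? " cond(`ninst' < `ngroup', "yes", "no")
```

Setting nvars to 8 gives the count for the specification without banking_crisis.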

There are five macroprudential measures in the model, coded as follows: if a measure is tightened (loosened), the associated value is "1" ("-1"). If there is no change, the value is 0. If a measure is tightened (loosened) twice over the years, the value is "2" ("-2") after the second tightening (loosening). These are the variables with the "c_" prefix.
In addition to the macroprudential variables, the other independent variables are the lagged dependent variable, real GDP growth in the previous quarter, a dummy variable that captures the presence of a banking crisis during the previous quarter, and a variable that captures the impact of the interest rate in the previous quarter.
So all independent variables have a lag.
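
A minimal sketch of this coding on toy data, assuming the c_ variables are running sums of quarterly changes (+1, -1 or 0), which is how the description above reads. The names d_measure, c_measure and L_c_measure are hypothetical; num and q_date are the panel and time identifiers used in the estimation output.

```stata
* Toy data: d_measure is the quarterly change (+1 tightening, -1 loosening, 0 none)
clear
input num q_date d_measure
1 1  0
1 2  1
1 3  0
1 4  1
1 5 -1
2 1  0
2 2 -1
2 3  0
end
* Running sum within country gives the cumulative index
bysort num (q_date): generate c_measure = sum(d_measure)
* Declare the panel and lag the index by one period, as in the regressions
xtset num q_date
generate L_c_measure = L1.c_measure
list, sepby(num)
```

For country 1 the index reaches 2 after the second tightening and falls back to 1 after the loosening.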

This is the model: CreditGrowth_{i,t} = α CreditGrowth_{i,t-1} + β C_macroprudentialmeasures_{i,t-1} + γ GDPGrowth_{i,t-1} + δ BankingCrisis_{i,t-1} + θ InterestRate_{i,t-1} + CountryFixedEffect_i + e_{i,t}
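
Writing the model in first differences may help to show why only lags dated two periods back or earlier are used as instruments for endogenous regressors. In the sketch below, all right-hand-side variables other than lagged credit growth are stacked in a vector w_{i,t}, and μ_i is the country fixed effect; this compact notation is an assumption for illustration and is not taken from Cerutti et al. (2015).

```latex
\begin{align*}
y_{i,t} &= \alpha\, y_{i,t-1} + w_{i,t}'\beta + \mu_i + e_{i,t} \\
\Delta y_{i,t} &= \alpha\, \Delta y_{i,t-1} + \Delta w_{i,t}'\beta + \Delta e_{i,t},
\qquad \Delta e_{i,t} = e_{i,t} - e_{i,t-1}
\end{align*}
```

Differencing removes μ_i, but the differenced error contains e_{i,t-1}. If w_{i,t} is endogenous (correlated with e_{i,t}), then w_{i,t-1} is correlated with e_{i,t-1} and cannot serve as an instrument, whereas w_{i,t-2} and earlier lags can, provided e_{i,t} is not serially correlated. The AR(2) test in first differences speaks to that last condition.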

And this is an example of what I tried:

first attempt

Code:

xtabond2 credit_growth L1.credit_growth L1.c_sscb L1.c_cap_req L1.c_ltv_cap L1.c_rr L1.c_exposition
>  L1.interest_rate L1.GDP_growth L1.banking_crisis, gmmstyle(L1.credit_growth L1.c_sscb L1.c_cap_req
>  L1.c_ltv_cap L1.c_rr L1.c_exposition L1.interest_rate L1.GDP_growth L1.banking_crisis, lag(2 3)
> collapse) noleveleq twostep
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.

Dynamic panel-data estimation, two-step difference GMM
------------------------------------------------------------------------------
Group variable: num                             Number of obs      =      1007
Time variable : q_date                          Number of groups   =        21
Number of instruments = 18                      Obs per group: min =        37
Wald chi2(9)  =     74.81                                      avg =     47.95
Prob > chi2   =     0.000                                      max =        58
--------------------------------------------------------------------------------
 credit_growth |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
 credit_growth |
           L1. |  -.0390548   .1130852    -0.35   0.730    -.2606977    .1825881
               |
        c_sscb |
           L1. |   -4.62242   .9516904    -4.86   0.000    -6.487699   -2.757141
               |
     c_cap_req |
           L1. |  -1.407323    1.02413    -1.37   0.169     -3.41458    .5999351
               |
     c_ltv_cap |
           L1. |   4.428849   3.137135     1.41   0.158    -1.719824    10.57752
               |
          c_rr |
           L1. |  -1.116012   1.679616    -0.66   0.506    -4.407999    2.175975
               |
  c_exposition |
           L1. |   .0300822   1.116648     0.03   0.979    -2.158508    2.218673
               |
 interest_rate |
           L1. |   .0787794   .2102005     0.37   0.708     -.333206    .4907648
               |
    GDP_growth |
           L1. |  -.2289244   .2287425    -1.00   0.317    -.6772514    .2194026
               |
banking_crisis |
           L1. |   1.426424   1.563614     0.91   0.362    -1.638203    4.491051
--------------------------------------------------------------------------------
Warning: Uncorrected two-step standard errors are unreliable.

Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(2/3).(L.credit_growth L.interest_rate L.c_sscb L.c_cap_req
    L.c_ltv_cap L.c_rr L.c_exposition L.GDP_growth
    L.banking_crisis) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -2.34  Pr > z =  0.019
Arellano-Bond test for AR(2) in first differences: z =   0.29  Pr > z =  0.775
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(9)    =  26.54  Prob > chi2 =  0.002
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(9)    =  11.35  Prob > chi2 =  0.253
  (Robust, but weakened by many instruments.)


.
end of do-file

And these are the post-estimation results (Postestimation → Manage estimation results → Table of estimation results):

Code:

------------------------------
    Variable |    active
-------------+----------------
credit_gro~h |
         L1. | -.03905478
             |
      c_sscb |
         L1. | -4.6224202***
             |
   c_cap_req |
         L1. | -1.4073225
             |
   c_ltv_cap |
         L1. |   4.428849
             |
        c_rr |
         L1. | -1.1160123
             |
c_exposition |
         L1. |  .03008222
             |
interest_r~e |
         L1. |  .07877938
             |
  GDP_growth |
         L1. |  -.2289244
             |
banking_cr~s |
         L1. |  1.4264237
------------------------------
legend: * p<.1; ** p<.05; *** p<.01

If I'm not mistaken, the model passes both the AR test and the Hansen test (which is more appropriate than the Sargan test because of the "twostep" option), but I'm not sure about the Hansen test because of this note: "(Robust, but weakened by many instruments.)"
The problem is that some of these variables, such as banking_crisis, are supposed to have a large negative impact on credit growth, yet here they are not significant.
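
For reference, the p-values printed next to these tests can be reproduced from the reported statistics. This is a hedged sketch, assuming the Arellano-Bond p-values are two-sided normal tail probabilities and that the overidentification tests use degrees of freedom equal to the number of instruments minus the number of estimated coefficients (18 - 9 = 9 here), which agrees with the chi2(9) shown in the output.

```stata
* Reproduce the p-values of the first attempt from the reported test statistics
display "AR(1)  p = " %5.3f 2*normal(-abs(-2.34))
display "AR(2)  p = " %5.3f 2*normal(-abs(0.29))
* Overidentification tests: df = instruments - estimated coefficients
display "Sargan p = " %5.3f chi2tail(18 - 9, 26.54)
display "Hansen p = " %5.3f chi2tail(18 - 9, 11.35)
```

Small discrepancies in the third decimal are possible because the statistics in the log are rounded. Read this way, the Sargan test rejects the overidentifying restrictions at the 1% level while the Hansen test does not reject at conventional levels, so the two tests disagree in this specification.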

second attempt

With the addition of the "robust" option, no variables are significant.

Code:

------------------------------
    Variable |    active
-------------+----------------
credit_gro~h |
         L1. | -.03905478
             |
      c_sscb |
         L1. | -4.6224202
             |
   c_cap_req |
         L1. | -1.4073225
             |
   c_ltv_cap |
         L1. |   4.428849
             |
        c_rr |
         L1. | -1.1160123
             |
c_exposition |
         L1. |  .03008222
             |
interest_r~e |
         L1. |  .07877938
             |
  GDP_growth |
         L1. |  -.2289244
             |
banking_cr~s |
         L1. |  1.4264237
------------------------------
legend: * p<.1; ** p<.05; *** p<.01

Because these macroprudential measures are very new, I have a lot of "0" values in my data, and so Stata dropped many of the variables. I had to aggregate several variables in order to solve this issue, which is why I have only 5 macroprudential variables.
In addition, some coefficients have the wrong sign, and the coefficients are volatile: when I delete one variable that is not significant, some coefficients change completely, both in sign and in magnitude (see below).

third attempt:

Here is an attempt which is similar to the first attempt, except that I deleted one non-significant variable, banking_crisis:

Code:

xtabond2 credit_growth L1.credit_growth L1.c_sscb L1.c_cap_req L1.c_ltv_cap L1.c_rr L1.c_exposition
> L1.interest_rate L1.GDP_growth, gmmstyle(L1.credit_growth L1.c_sscb L1.c_cap_req L1.c_ltv_cap L1.c_rr
>  L1.c_exposition L1.interest_rate L1.GDP_growth, lag(2 3) collapse) noleveleq twostep
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.

Dynamic panel-data estimation, two-step difference GMM
------------------------------------------------------------------------------
Group variable: num                             Number of obs      =      1180
Time variable : q_date                          Number of groups   =        21
Number of instruments = 16                      Obs per group: min =        41
Wald chi2(8)  =      8.45                                      avg =     56.19
Prob > chi2   =     0.391                                      max =        58
-------------------------------------------------------------------------------
credit_growth |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
credit_growth |
          L1. |  -.2786686   .2411136    -1.16   0.248    -.7512425    .1939053
              |
       c_sscb |
          L1. |  -2.977535   3.035509    -0.98   0.327    -8.927022    2.971952
              |
    c_cap_req |
          L1. |  -.2339573   1.101806    -0.21   0.832    -2.393458    1.925544
              |
    c_ltv_cap |
          L1. |   5.151175   7.673965     0.67   0.502     -9.88952    20.19187
              |
         c_rr |
          L1. |  -1.594947   2.977368    -0.54   0.592    -7.430481    4.240588
              |
 c_exposition |
          L1. |   -.824738   3.071016    -0.27   0.788    -6.843818    5.194342
              |
interest_rate |
          L1. |   .1985294   .3771818     0.53   0.599    -.5407333    .9377921
              |
   GDP_growth |
          L1. |  -.0554081   .3740414    -0.15   0.882    -.7885158    .6776997
-------------------------------------------------------------------------------
Warning: Uncorrected two-step standard errors are unreliable.

Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(2/3).(L.credit_growth L.c_sscb L.c_cap_req L.c_ltv_cap L.c_rr
    L.c_exposition L.interest_rate L.GDP_growth) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -0.77  Pr > z =  0.440
Arellano-Bond test for AR(2) in first differences: z =  -1.13  Pr > z =  0.257
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(8)    =  18.29  Prob > chi2 =  0.019
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(8)    =  10.57  Prob > chi2 =  0.227
  (Robust, but weakened by many instruments.)


.
end of do-file

. estimates table, star(.1 .05 .01) style(oneline)

------------------------------
    Variable |    active      
-------------+----------------
credit_gro~h |
         L1. | -.27866862    
             |
      c_sscb |
         L1. |  -2.977535    
             |
   c_cap_req |
         L1. | -.23395735    
             |
   c_ltv_cap |
         L1. |   5.151175    
             |
        c_rr |
         L1. | -1.5949469    
             |
c_exposition |
         L1. | -.82473802    
             |
interest_r~e |
         L1. |  .19852941    
             |
  GDP_growth |
         L1. | -.05540809    
------------------------------
legend: * p<.1; ** p<.05; *** p<.01

I read that this issue may come from multicollinearity. So I did a multicollinearity diagnostic, but because the VIF is not available after xtabond2, I used ''regress'' instead of ''xtabond2'', and here are the results:

Code:

regress credit_growth L1.c_sscb L1.c_cap_req L1.c_ltv_cap L1.c_rr L1.c_exposition L1.interest_rate
>  L1.GDP_growth L1.banking_crisis
estat vif

       Variable |       VIF       1/VIF
----------------+----------------------
 banking_crisis |
            L1. |      1.25    0.803007
      c_cap_req |
            L1. |      1.24    0.807731
   c_exposition |
            L1. |      1.20    0.831029
         c_sscb |
            L1. |      1.18    0.849848
  interest_rate |
            L1. |      1.14    0.879999
           c_rr |
            L1. |      1.14    0.881049
     GDP_growth |
            L1. |      1.11    0.903970
      c_ltv_cap |
            L1. |      1.02    0.976797
----------------+----------------------
       Mean VIF |      1.16
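
As a reminder of what these numbers measure, the VIF of a regressor is 1/(1 - R²), where R² comes from regressing that regressor on the other regressors; the 1/VIF column is therefore 1 - R². The sketch below illustrates this on Stata's shipped auto data, not on the data of this thread. Because the VIF depends only on the regressors in levels, it does not by itself say anything about the differenced, instrumented model that xtabond2 estimates.

```stata
* Illustration on Stata's shipped auto data (not the data from this thread)
sysuse auto, clear
regress price mpg weight
estat vif
* The VIF of mpg equals 1/(1 - R2) from the auxiliary regression of mpg on weight
quietly regress mpg weight
display "VIF(mpg) = " %5.2f 1/(1 - e(r2))
```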

1) Is this a sound way to run diagnostic tests when using GMM? I have the same question about the heteroskedasticity test.

2) Apparently there is no multicollinearity (the VIF is very low for every variable). What am I supposed to do in order to obtain more stable coefficients?

3) If several specifications with different lags pass the Sargan/Hansen test and the AR tests, what criterion is used to choose the best one? The one with the highest number of instruments?

4) With fixed-effects models, some variables are highly significant, as they are when I use xtabond2 without the "robust" option. When I add this option, no variable is significant. My understanding is that the "robust" option makes the standard errors valid under heteroskedasticity. In this particular case, can I simply assume heteroskedasticity, or do I have to test for it? (As with my first question, I can't test for it after xtabond2.)

5) Finally, I usually go to Statistics → Postestimation → Manage estimation results → Table of estimation results in order to obtain the significance of the coefficients with stars. Is it possible to get this directly in the regression results?
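
On the last point, one way to avoid the menus is to put the estimates table line that appears in the logs of this thread directly in the do-file, after the estimation command. The sketch below shows the pattern with regress on Stata's shipped auto data so that it runs without extra packages; the stored names m1 and m2 are arbitrary, and the same estimates store and estimates table lines are assumed to work after xtabond2 as well, as the logs suggest.

```stata
* Pattern for a starred coefficient table from a do-file (auto data for illustration)
sysuse auto, clear
regress price mpg weight
estimates store m1
regress price mpg weight foreign
estimates store m2
* Side-by-side coefficients with significance stars at the 10%, 5% and 1% levels
estimates table m1 m2, star(.1 .05 .01)
```

Storing each specification also makes it easier to compare how coefficients move when a variable is dropped.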

I'm a little lost with the choice of options, especially the "orthogonal" option; I don't know whether I can use it.

Here is a sample of my data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
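* The name of the first (string) column was missing from the pasted header; quarter is a placeholder
input str6 quarter float(credit_growth c_sscb c_cap_req c_ltv_cap c_rr c_exposition interest_rate GDP_growth banking_crisis)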
"2000q1"          0 0 0 0 -1 1   3.5423    .93956 0
"2000q2"   1.322812 0 0 0 -1 1    4.263  1.013441 0
"2000q3"   .6646443 0 0 0 -1 1   4.7376  -.141768 0
"2000q4"  1.7269744 0 0 0 -1 1 5.024167   .076452 0
"2001q1"   .4631569 0 0 0 -1 1 4.745033  1.634062 0
"2001q2"    .224378 0 0 0 -1 1 4.590766   .085889 0
"2001q3"  1.0374751 0 0 0 -1 1 4.267833  -.289637 0
"2001q4"  1.0920715 0 0 0 -1 1   3.4435   .118342 0
"2002q1"  .24551354 0 0 0 -1 1 3.362233  -.308947 0
"2002q2"   .8305011 0 0 0 -1 1    3.446   .258706 0
"2002q3"  1.6563503 0 0 0 -1 1 3.357333   .473059 0
"2002q4"    .241526 0 0 0 -1 1   3.1088  -.214017 0
"2003q1"  1.4132304 0 0 0 -1 1   2.6831 -1.211775 0
"2003q2"  1.1803162 0 0 0 -1 1   2.3619   .021715 0
"2003q3" -.19664878 0 0 0 -1 1 2.139233   .499232 0
"2003q4"   .3575432 0 0 0 -1 1 2.149633   .367172 0
"2004q1"   .7381726 0 0 0 -1 1 2.062967  -.024217 0
"2004q2"  -.3454666 0 0 0 -1 1 2.082467   .344379 0
"2004q3"   .4714422 0 0 0 -1 1   2.1163  -.182331 0
"2004q4"  .35406515 0 0 0 -1 1   2.1636   .096709 0
end
label var GDP_growth "taux_de_croissance_PIB_reel"

So, it's a very long post with a lot of questions, and I think I forgot some, so the title is not very accurate. Sorry for that, but I did a lot of research, especially on this forum.
Thank you in advance for your answers.

References:

Cerutti, E., S. Claessens and L. Laeven. 2015. “The Use and Effectiveness of Macroprudential Policies: New Evidence.” IMF Working Paper WP/15/61.
Cerutti, E., R. Correa, E. Fiorentino, and E. Segalla. 2017. “Changes in Prudential Policy Instruments—A New Cross-Country Database.” International Journal of Central Banking 13 (S1).
Roodman, D. 2009. "How to do xtabond2: An Introduction to Difference and System GMM in Stata." Stata Journal 9 (1): 86-136.
Roodman, D. 2009. "A Note on the Theme of Too Many Instruments." Oxford Bulletin of Economics and Statistics 71: 135-158.
Mileva, E. 2007. "Using Arellano-Bond Dynamic Panel GMM Estimators in Stata: Tutorial with Examples Using Stata 9.0 (xtabond and xtabond2)." Economics Department, Fordham University, July 9, 2007.

Tags: dynamic panel data

Sebastian Kripfganz

Join Date: May 2014

Posts: 2581
#2

07 Aug 2018, 04:26

Your "small N, large T" setup is not just a problem when it comes to instrument proliferation (which you solve by using the collapse option). More importantly, the consistency of the estimator and the distributional properties of the test statistics are obtained under "large N, small T" asymptotics. In your case, you are running into several problems that cannot really be solved within the GMM context given the dimensions of your data set:
- The twostep estimator essentially clusters the moment conditions on the group level. With your relatively small number of clusters, the estimates of the optimal weighting matrix become very imprecise and essentially unreliable.

- The same argument applies to robust standard errors.

- The Hansen test relies on an optimal weighting matrix. Hence, the same concerns apply again.

- The Sargan test relies on the absence of serial correlation in the idiosyncratic error term and homoskedasticity. This is a strong assumption but the only assumption you can work with in the context of your data set. Also, it is not valid in the context of the system GMM estimator, in case you were planning to extend the analysis in that direction.

To sum up, you should not use the twostep and the robust options. You should use the collapse and the noleveleq options (as you have done).

https://www.kripfganz.de/stata/

Raphael Cardot-Martin

08 Aug 2018, 04:19

Thank you for your answer, Sebastian.

First, I do not plan to use the system GMM estimator.

Here are the new results:

Code:

 xtabond2 credit_growth L1.credit_growth L1.c_sscb L1.c_cap_req L1.c_ltv_cap L1.c_rr L1.c_exposition
> L1.interest_rate L1.GDP_growth L1.banking_crisis, gmmstyle(L1.credit_growth L1.c_sscb L1.c_cap_req
> L1.c_ltv_cap L1.c_rr L1.c_exposition L1.interest_rate L1.GDP_growth L1.banking_crisis,
> lag(2 3) collapse) noleveleq
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.

Dynamic panel-data estimation, one-step difference GMM
------------------------------------------------------------------------------
Group variable: num                             Number of obs      =      1007
Time variable : q_date                          Number of groups   =        21
Number of instruments = 18                      Obs per group: min =        37
Wald chi2(9)  =      9.56                                      avg =     47.95
Prob > chi2   =     0.387                                      max =        58
--------------------------------------------------------------------------------
 credit_growth |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
 credit_growth |
           L1. |   .0777834    .229739     0.34   0.735    -.3724967    .5280635
               |
        c_sscb |
           L1. |   -3.67845   1.668837    -2.20   0.028     -6.94931   -.4075894
               |
     c_cap_req |
           L1. |  -1.253732   1.231352    -1.02   0.309    -3.667138    1.159673
               |
     c_ltv_cap |
           L1. |   2.848392   2.720483     1.05   0.295    -2.483657    8.180442
               |
          c_rr |
           L1. |   .5770481   1.645909     0.35   0.726    -2.648874     3.80297
               |
  c_exposition |
           L1. |    .193002   1.398424     0.14   0.890    -2.547859    2.933863
               |
 interest_rate |
           L1. |  -.1184319   .2067701    -0.57   0.567    -.5236939    .2868301
               |
    GDP_growth |
           L1. |  -.1579743   .4710919    -0.34   0.737    -1.081297    .7653487
               |
banking_crisis |
           L1. |     1.5616   1.431815     1.09   0.275    -1.244706    4.367906
--------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(2/3).(L.credit_growth L.c_sscb L.c_cap_req L.c_ltv_cap L.c_rr
    L.c_exposition L.interest_rate L.GDP_growth L.banking_crisis) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -2.05  Pr > z =  0.040
Arellano-Bond test for AR(2) in first differences: z =   0.90  Pr > z =  0.371
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(9)    =  26.54  Prob > chi2 =  0.002
  (Not robust, but not weakened by many instruments.)


. 
end of do-file

. estimates table, star(.1 .05 .01) style(oneline)

------------------------------
    Variable |    active      
-------------+----------------
credit_gro~h |
         L1. |  .07778339     
             |
      c_sscb |
         L1. | -3.6784495**   
             |
   c_cap_req |
         L1. | -1.2537323     
             |
   c_ltv_cap |
         L1. |  2.8483922     
             |
        c_rr |
         L1. |   .5770481     
             |
c_exposition |
         L1. |  .19300195     
             |
interest_r~e |
         L1. | -.11843188     
             |
  GDP_growth |
         L1. | -.15797434     
             |
banking_cr~s |
         L1. |     1.5616     
------------------------------
legend: * p<.1; ** p<.05; *** p<.01

.
According to the Sargan test, the model is not valid. In addition, it is strange that only one variable has a significant effect on the dependent variable. As I said previously, L.credit_growth and L.banking_crisis are supposed to have a big influence. Maybe something is wrong with the data; I don't know what to do. Should I give up the GMM estimator? Should I reduce T by using an annual frequency instead?

If I understand correctly, I can't do any test. So, the way I obtain the VIF is not valid?
Finally, is it possible to have the stars (ie, the significance of coefficients) directly instead of being forced to use the postestimation button please?

Thanks,
Raphaël


Sebastian Kripfganz

#4

08 Aug 2018, 07:49

Reducing T certainly does not help. The small N is the limiting factor.

You might indeed have to rethink your whole model. Starting with the assumption that all variables are endogenous is quite challenging already. Is there any meaningful interpretation of the coefficients in such a model? I am not familiar with the particular applied literature, so cannot comment further in that regard.

I cannot say anything about the VIF, sorry.

Raphael Cardot-Martin

#5

09 Aug 2018, 11:57

Thank you again for your answer.

In fact, N refers to European countries, but because I have difficulty finding all the data (quarterly data are hard to find), I only have N = 21. At best, N = 27 (Cyprus excluded).
I guess it doesn't change much, but annual data are easier to find.

I started to think about which variables are endogenous and which are exogenous, and then I read this in Cerutti et al. (2015): "The estimates are determined using Arellano-Bond GMM treating the instrument and the control variables of credit growth, GDPgrowth, the crisis dummy, and the policy rate as endogeneous."
In addition, macroprudential measures are used to avoid excessive or insufficient credit growth.
When credit growth is excessive, these measures are tightened (+1 in the database), and they are loosened (-1 in the database) when credit growth is insufficient.
So we expect:
- a positive impact of L.credit_growth, because there is some persistence in credit developments;

- a negative impact of the macroprudential measures;

- a positive impact of L.GDP_growth, because there is more credit when economic conditions are good;

- a negative impact of L.banking_crisis, for the same reason in reverse;

- a negative impact of L.interest_rate, because the higher the interest rate, the more expensive credit becomes.
Raphael Cardot-Martin

#6

11 Aug 2018, 01:01

I'm still stuck, and I don't think a fixed-effects model would be a good alternative.
Raphael Cardot-Martin

#7

16 Aug 2018, 01:43

Hello everyone,

I know that bumping is unpopular here, but I really don't know how I can improve my questions. Sorry for that.

So I bump this thread one last time.