xtabond2 model specification

Hannah Wu

Join Date: Jul 2019
Posts: 12

17 Jul 2019, 12:43

The AR test statistics remained significant until I included lags of the dependent variable up to the 4th lag. So I ran the model below:

Code:

xtabond2 proactivity l(1/4)(proactivity) generalcrime feb mar apr may jun jul aug sep oct nov dec, gmm( proactivity generalcrime, lag(5 8)) iv( feb mar apr may jun jul aug sep oct nov dec, equation(level)) twostep robust artests(5)

I also tried l.proactivity, l(1/2)(proactivity), and l(1/3)(proactivity); the specification above is the only one with insignificant AR statistics beyond AR(1). The model output below looks okay, but it appears very sensitive to the model specification. If I add collapse, which I thought was not supposed to change the results, the results change (second output below). Any thoughts on why they change, or on other ways to specify the model?

Code:

Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan/Hansen statistics may be negative.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =     25104
Time variable : week                            Number of groups   =       523
Number of instruments = 470                     Obs per group: min =        48
Wald chi2(16) =  14145.86                                      avg =     48.00
Prob > chi2   =     0.000                                      max =        48
------------------------------------------------------------------------------
             |              Corrected
 proactivity |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 proactivity |
         L1. |   .4472726   .0460867     9.71   0.000     .3569444    .5376008
         L2. |   .2588113   .0712742     3.63   0.000     .1191166    .3985061
         L3. |   .1298111   .0540169     2.40   0.016     .0239399    .2356823
         L4. |   .1308565   .0247562     5.29   0.000     .0823352    .1793778
             |
generalcrime |   .0577832   .0373932     1.55   0.122    -.0155061    .1310726
         feb |  -.5542773   .1725027    -3.21   0.001    -.8923764   -.2161782
         mar |  -.7677247   .1638914    -4.68   0.000    -1.088946   -.4465035
         apr |  -.6148198   .1516915    -4.05   0.000    -.9121296     -.31751
         may |  -.6942948   .1564604    -4.44   0.000    -1.000952   -.3876381
         jun |  -.7967815    .156679    -5.09   0.000    -1.103867   -.4896964
         jul |   -.696932   .1484477    -4.69   0.000    -.9878841   -.4059799
         aug |  -.8511846   .1570522    -5.42   0.000    -1.159001    -.543368
         sep |  -.6681694   .1469222    -4.55   0.000    -.9561316   -.3802071
         oct |  -.6614347   .1601904    -4.13   0.000    -.9754021   -.3474672
         nov |  -.8454832   .1686919    -5.01   0.000    -1.176113   -.5148532
         dec |   -.536543   .1588977    -3.38   0.001    -.8479767   -.2251092
       _cons |   .6442327   .2119702     3.04   0.002     .2287787    1.059687
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(5/8).(proactivity generalcrime)
Instruments for levels equation
  Standard
    feb mar apr may jun jul aug sep oct nov dec
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL4.(proactivity generalcrime)
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -6.31  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =   0.71  Pr > z =  0.477
Arellano-Bond test for AR(3) in first differences: z =  -0.81  Pr > z =  0.416
Arellano-Bond test for AR(4) in first differences: z =   0.96  Pr > z =  0.335
Arellano-Bond test for AR(5) in first differences: z =  -0.97  Pr > z =  0.334
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(453)  =1776.52  Prob > chi2 =  0.000
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(453)  = 473.71  Prob > chi2 =  0.242
  (Robust, but weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(359)  = 398.55  Prob > chi2 =  0.074
    Difference (null H = exogenous): chi2(94)   =  75.16  Prob > chi2 =  0.923
  iv(feb mar apr may jun jul aug sep oct nov dec, eq(level))
    Hansen test excluding group:     chi2(442)  = 463.10  Prob > chi2 =  0.235
    Difference (null H = exogenous): chi2(11)   =  10.62  Prob > chi2 =  0.476

Code:

. xtabond2 proactivity l(1/4)(proactivity) generalcrime feb mar apr may jun jul aug sep oct nov dec, gmm( proactivity generalcrime, lag(5 8) collapse) iv( feb mar apr may jun jul aug sep oct nov dec, equation(level)) twostep robust artests(5)
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =     25104
Time variable : week                            Number of groups   =       523
Number of instruments = 22                      Obs per group: min =        48
Wald chi2(16) =    133.97                                      avg =     48.00
Prob > chi2   =     0.000                                      max =        48
------------------------------------------------------------------------------
             |              Corrected
 proactivity |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 proactivity |
         L1. |   .0661376   .4625995     0.14   0.886    -.8405409     .972816
         L2. |   1.067306   .2863359     3.73   0.000     .5060984    1.628514
         L3. |   .3243474   .4095168     0.79   0.428    -.4782908    1.126986
         L4. |  -.0056103   .0693738    -0.08   0.936    -.1415804    .1303599
             |
generalcrime |  -.2164601   .2403091    -0.90   0.368    -.6874574    .2545372
         feb |  -.1236117   .9449195    -0.13   0.896     -1.97562    1.728396
         mar |  -.5567039   .8190546    -0.68   0.497    -2.162021    1.048613
         apr |  -.3190208   .8371222    -0.38   0.703     -1.95975    1.321709
         may |  -.2762736   .7620428    -0.36   0.717     -1.76985    1.217303
         jun |    -.41769   .7011003    -0.60   0.551    -1.791821    .9564412
         jul |   -.212813   .7252157    -0.29   0.769     -1.63421    1.208584
         aug |   -.437525   .7145739    -0.61   0.540    -1.838064    .9630141
         sep |   -.309892   .8271875    -0.37   0.708     -1.93115    1.311366
         oct |  -.3654707   .8021538    -0.46   0.649    -1.937663    1.206722
         nov |  -.7637196   .7350004    -1.04   0.299    -2.204294    .6768547
         dec |  -.2051388   .8173252    -0.25   0.802    -1.807067    1.396789
       _cons |  -.8427515   1.483277    -0.57   0.570     -3.74992    2.064417
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(5/8).(proactivity generalcrime) collapsed
Instruments for levels equation
  Standard
    feb mar apr may jun jul aug sep oct nov dec
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL4.(proactivity generalcrime) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -0.25  Pr > z =  0.800
Arellano-Bond test for AR(2) in first differences: z =  -2.32  Pr > z =  0.020
Arellano-Bond test for AR(3) in first differences: z =   0.22  Pr > z =  0.823
Arellano-Bond test for AR(4) in first differences: z =   1.03  Pr > z =  0.305
Arellano-Bond test for AR(5) in first differences: z =   1.22  Pr > z =  0.224
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(5)    =  29.50  Prob > chi2 =  0.000
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(5)    =  10.78  Prob > chi2 =  0.056
  (Robust, but weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(3)    =   4.08  Prob > chi2 =  0.253
    Difference (null H = exogenous): chi2(2)    =   6.70  Prob > chi2 =  0.035

Hannah Wu

#2

17 Jul 2019, 12:44

Can you please help me? @Sebastian Kripfganz
Sebastian Kripfganz

Join Date: May 2014

Posts: 2604
#3

18 Jul 2019, 04:00

These GMM estimators for dynamic panel data models are designed for situations with a small time dimension. In your case, T=48 is already quite large. As a consequence, you are using a huge number of instruments (470) that can lead to a "too-many-instruments problem". Collapsing would be one way to address this problem. Starting only with the 5th lag as an instrument is problematic because these deep lags might be weak instruments. When there is no remaining serial error correlation, the second lag already qualifies as a valid instrument.

To deal with the autocorrelation, besides adding lags of the dependent variable you can also add lags of the independent variable(s). In your case, the model is probably also suffering from omitted variables.

In any case, given the large sample you have, I would recommend resorting to different estimation strategies. The dynamic panel data bias should be reasonably small given your time series dimension, so you could even use the classical fixed-effects estimator with xtreg, or xtivreg if you still want to instrument the independent variable by its own lags.
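
As a rough illustration of these fixed-effects alternatives, the sketch below reuses the variable names from the output in the first post (id, week, proactivity, generalcrime, and the month dummies). The use of lags 1 and 2 of generalcrime as instruments in xtivreg is a purely hypothetical choice for illustration and is not taken from this thread. Because of the local macro and the /// line continuations, it is meant to be run from a do-file.

```stata
* Sketch only: variable names are taken from the output in post #1.
xtset id week
local months feb mar apr may jun jul aug sep oct nov dec

* Classical fixed-effects (within) estimator with four lags of the
* dependent variable; standard errors clustered by panel unit.
xtreg proactivity L(1/4).proactivity generalcrime `months', fe vce(cluster id)

* Fixed-effects IV: instrument generalcrime by its own lags.
* ASSUMPTION: lags 1 and 2 as instruments are a hypothetical choice.
xtivreg proactivity L(1/4).proactivity `months' ///
    (generalcrime = L(1/2).generalcrime), fe
```

Whether four lags of the dependent variable are still needed under these estimators would have to be checked on the actual data.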

https://www.kripfganz.de/stata/
Hannah Wu

#4

30 Mar 2020, 12:17

Thanks @Sebastian Kripfganz!

I apologize for the late response as I am getting used to using the site.

One follow-up question, though: judging from the Hansen test results, it does not appear that "too many instruments" caused an issue here, since the p-values are above 0.1 and below 0.25? I tried adding "collapse", but the lagged terms of the DV became insignificant. You mentioned in other posts that this might indicate a collinearity issue among the lags. But I also couldn't remove them, since the AR tests then become significant. I know it doesn't show in the above results, but if I remove any lag of the DV from the model, the AR test becomes significant for at least one of the AR orders.

I also tried adding lags of the independent variable to remove the significant autocorrelation, but that does not get rid of it. To remove the autocorrelation, I had to include at least 4 lags of the DV and, consequently, use lags starting from the 5th as instruments. Is there a way to test whether the instruments are weak?

Again, thanks for your help!
Sebastian Kripfganz

#5

31 Mar 2020, 09:49

Originally posted by Hannah Wu

I know it doesn't show in the above results, but if I remove any lag of the DV from the model, the AR test becomes significant for at least one of the AR orders.

I also tried adding lags of the independent variable to remove the significant autocorrelation, but that does not get rid of it. To remove the autocorrelation, I had to include at least 4 lags of the DV and, consequently, use lags starting from the 5th as instruments. Is there a way to test whether the instruments are weak?

If adding lags of independent variables does not help, then you might just keep the extra lags of the dependent variable even if they are insignificant. This should not bite much given your large time series. If there is no serial correlation anymore, you can then still use instruments from the 2nd lag onwards.

A simple way of checking for weak instruments would be to look at unconditional correlations between the regressors and the corresponding instruments with the correlate command. The community-contributed ivreg2 command provides further weak-instrument statistics; see slides 39 to 42 of my 2019 London Stata conference presentation on how to use the xtdpdgmm command to replicate dynamic panel data GMM results with ivreg2.
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.

XTDPDGMM: new Stata command for GMM estimation of linear (dynamic) panel data models
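
A minimal sketch of the correlate check described above, assuming the variable names from the first post and instruments starting at the 2nd lag. The generated variable names (d_l1_pro, l1_pro, l2_pro, l3_pro) and the choice to look only at lags 2 and 3 are hypothetical; this is an informal diagnostic, not a formal weak-instrument test.

```stata
* Sketch only: variable names are taken from the output in post #1.
xtset id week

* First-differenced equation: the regressor D.L.proactivity is
* instrumented by lagged levels (lags 2 and 3 shown for illustration).
generate double d_l1_pro = D.L.proactivity
generate double l2_pro   = L2.proactivity
generate double l3_pro   = L3.proactivity
correlate d_l1_pro l2_pro l3_pro

* Level equation: the regressor L.proactivity is instrumented by the
* lagged first difference, which is the variable d_l1_pro created above.
generate double l1_pro = L.proactivity
correlate l1_pro d_l1_pro
```

Correlations close to zero between a regressor and its instruments would be a warning sign that those instruments carry little information.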

Hannah Wu

#6

31 Mar 2020, 11:17

Thank you so much Sebastian Kripfganz! This is so helpful.

Regarding the independent variable (IV): say I also added two lags of the IV. Can I still use the second lag onwards to instrument both the IV and the DV? So something like gmm(iv dv, lag(2 .))?

The other question concerns the interpretation of the lagged IV. My time unit here is the week. If IV(t-1) is significant, can I interpret it as the impact of last week's IV on the current DV? How does using the 2nd lag onwards as instruments affect the interpretation?

I really appreciate your help!

Hannah
Sebastian Kripfganz

#7

01 Apr 2020, 06:27

Originally posted by Hannah Wu

Regarding the independent variable (IV): say I also added two lags of the IV. Can I still use the second lag onwards to instrument both the IV and the DV? So something like gmm(iv dv, lag(2 .))?

Yes.
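
Applied to the model from the first post, such a specification might look like the sketch below. It assumes the same variable names as in post #1, and it adds collapse following the suggestion in post #3; both the lag structure and the collapse option are illustrative choices rather than a recommended final model. It is meant to be run from a do-file because of the local macro and the /// continuations.

```stata
* Sketch only: variable names are taken from the output in post #1.
xtset id week
local months feb mar apr may jun jul aug sep oct nov dec

* Four lags of the DV, plus the current value and two lags of the IV.
* Both variables are instrumented from their 2nd lag onwards.
* ASSUMPTION: collapse is added here following the advice in post #3.
xtabond2 proactivity L(1/4).proactivity L(0/2).generalcrime `months', ///
    gmm(proactivity generalcrime, lag(2 .) collapse) ///
    iv(`months', equation(level)) twostep robust artests(5)
```

Instruments from the 2nd lag onwards are only valid if the AR(2) and higher-order tests in the resulting output show no remaining serial correlation.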

Originally posted by Hannah Wu

The other question concerns the interpretation of the lagged IV. My time unit here is the week. If IV(t-1) is significant, can I interpret it as the impact of last week's IV on the current DV? How does using the 2nd lag onwards as instruments affect the interpretation?

All the effects are partial effects, i.e. holding everything else constant. The coefficient of IV(t-1) would be the effect of a one-unit change in the previous week's IV on the current DV, assuming that everything else in the model remains unchanged. In dynamic models, this interpretation may not be very meaningful because IV(t-1) also has an effect on DV(t-1), and DV(t-1) in turn has an effect on the current DV. What you may want to compute instead are so-called long-run effects (the sum of the coefficients of all current and lagged IV terms, divided by 1 minus the sum of the coefficients of all lags of the DV).
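
Written out for a model with p lags of the DV (y) and q lags of the IV (x), the long-run effect described in words above is the following. The symbols \alpha_k, \beta_j, p and q are notation introduced here for illustration and do not appear elsewhere in the thread.

```latex
y_{it} = \sum_{k=1}^{p} \alpha_k \, y_{i,t-k} + \sum_{j=0}^{q} \beta_j \, x_{i,t-j} + \dots + u_{it},
\qquad
\text{long-run effect} = \frac{\sum_{j=0}^{q} \beta_j}{1 - \sum_{k=1}^{p} \alpha_k}
```

The ratio is only meaningful when the sum of the \alpha_k is below 1, so that the effect of a change in x dies out rather than accumulating without bound.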

Hannah Wu

#8

06 Apr 2020, 17:51

I see. Thank you Sebastian Kripfganz!

One more question regarding the coefficient of IV(t). When none of the lagged IV terms is significant, does a significant IV(t) indicate an immediate yet short-term effect, or an immediate and permanent effect? I have confused myself. Based on the long-run effect calculation, IV(t) should indicate a permanent change (even though it is a contemporaneous term) if none of the lags is significant; and if the lags are significant, does that mean it takes longer to reach the long-run effect?
Sebastian Kripfganz

#9

07 Apr 2020, 07:02

The coefficient of IV(t) indicates an immediate short-term effect. The long-run effect in response to a permanent change of IV will be a function of this short-run effect (the accumulation of short-run effects over time).
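
As a sketch of how this long-run effect could be obtained in practice, the block below assumes it is run from a do-file directly after the first xtabond2 command in post #1 (four lags of the DV, no lags of the IV). The label lr_generalcrime is a hypothetical name. The second command simply repeats the ratio by hand using the point estimates reported in that output.

```stata
* Sketch only: run directly after the xtabond2 command from post #1.
* Long-run effect of generalcrime: its coefficient divided by
* 1 minus the sum of the coefficients of the four lags of the DV.
nlcom (lr_generalcrime: _b[generalcrime] / ///
    (1 - _b[L.proactivity] - _b[L2.proactivity] ///
       - _b[L3.proactivity] - _b[L4.proactivity]))

* The same ratio by hand from the point estimates reported in post #1.
display .0577832 / (1 - (.4472726 + .2588113 + .1298111 + .1308565))
```

With those point estimates the lag coefficients sum to about 0.967, so the ratio is roughly 1.74. Because the denominator is close to zero and the short-run coefficient is itself insignificant in that output, this number is very imprecise and should be read together with the standard error that nlcom reports.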
