xtabond2 - is it ok to have significant ar(2) and insignificant ar(3) if the gmmstyle lags are limited to third order (3 .)?

Gal Rakover

Join Date: Oct 2021
Posts: 18

20 Jul 2022, 03:20

Hello friends,

I have read Roodman (2009) and some posts here regarding the question, but I am not sure if I understood correctly.

If, when performing system GMM with the xtabond2 command, I restrict the gmmstyle lags to lags 3 to 6, is an insignificant Arellano-Bond autocorrelation test of order 3 sufficient? That is, is it okay if the AR(2) test rejects the null but the AR(3) test does not?

Thanks,
Gal

This is the xtabond2 command line

Code:

xtabond2 per_ma3_TFP $finC $nonfinC $convar yr* , gmm($finC $nonfinC per_ma3_gov_exp_GDP per_ma3_cpi_inflation per_logGDP_pc, coll  laglimits(3 6)) iv(yr* per_ma3_trade_GDP  per_ma3_yrs_sch per_ma3_IRCG  per_ma3_trade_GDP) twostep small r artests(3)

And this is the output:

Code:

.                 xtabond2 per_ma3_TFP $finC $nonfinC $convar yr* , gmm($finC $nonfinC per_ma3_gov_exp_GDP per_ma3_cpi_infl
> ation per_logGDP_pc, coll  laglimits(3 6)) iv(yr* per_ma3_trade_GDP  per_ma3_yrs_sch per_ma3_IRCG  per_ma3_trade_GDP) two
> step small r artests(3)
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
yr2013 dropped due to collinearity
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan/Hansen statistics may be negative.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: ID                              Number of obs      =       328
Time variable : year                            Number of groups   =        58
Number of instruments = 47                      Obs per group: min =         2
F(18, 57)     =   2153.89                                      avg =      5.66
Prob > F      =     0.000                                      max =         9
----------------------------------------------------------------------------------------
                       |              Corrected
           per_ma3_TFP |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------------+----------------------------------------------------------------
   per_c_ma3_fin_stock |  -.0000648   .0007612    -0.09   0.932    -.0015892    .0014595
    per_c_ma3_fin_flow |   .0003164   .0042521     0.07   0.941    -.0081983    .0088311
per_c_ma3_nonfin_stock |   .0009372   .0007619     1.23   0.224    -.0005885    .0024628
 per_c_ma3_nonfin_flow |   .0029903   .0023584     1.27   0.210    -.0017323    .0077129
       per_ma3_yrs_sch |  -.0090489   .0106553    -0.85   0.399    -.0303858     .012288
          per_ma3_IRCG |  -.0007388    .003128    -0.24   0.814    -.0070026     .005525
 per_ma3_cpi_inflation |  -.0008791     .00289    -0.30   0.762    -.0066662     .004908
   per_ma3_gov_exp_GDP |  -.0045448   .0064397    -0.71   0.483    -.0174401    .0083505
     per_ma3_trade_GDP |   -.000299   .0002061    -1.45   0.152    -.0007117    .0001136
         per_logGDP_pc |   .0408159   .0410099     1.00   0.324    -.0413049    .1229368
                yr1992 |  -.0753124   .0527374    -1.43   0.159    -.1809172    .0302924
                yr1995 |  -.0561759   .0322474    -1.74   0.087    -.1207503    .0083985
                yr1998 |   -.045674   .0403163    -1.13   0.262    -.1264061     .035058
                yr2001 |  -.0322637   .0388948    -0.83   0.410    -.1101493    .0456218
                yr2004 |   .0057429   .0476757     0.12   0.905    -.0897261    .1012119
                yr2007 |   .0118822   .0280837     0.42   0.674    -.0443545    .0681189
                yr2010 |  -.0082215   .0137538    -0.60   0.552     -.035763    .0193199
                yr2016 |   .0163729   .0068277     2.40   0.020     .0027005    .0300452
                 _cons |   .7881233   .1980452     3.98   0.000     .3915447    1.184702
----------------------------------------------------------------------------------------
Instruments for first differences equation
  Standard
    D.(yr1992 yr1995 yr1998 yr2001 yr2004 yr2007 yr2010 yr2013 yr2016
    per_ma3_trade_GDP per_ma3_yrs_sch per_ma3_IRCG per_ma3_trade_GDP)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(3/6).(per_c_ma3_fin_stock per_c_ma3_fin_flow per_c_ma3_nonfin_stock
    per_c_ma3_nonfin_flow per_ma3_gov_exp_GDP per_ma3_cpi_inflation
    per_logGDP_pc) collapsed
Instruments for levels equation
  Standard
    yr1992 yr1995 yr1998 yr2001 yr2004 yr2007 yr2010 yr2013 yr2016
    per_ma3_trade_GDP per_ma3_yrs_sch per_ma3_IRCG per_ma3_trade_GDP
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL2.(per_c_ma3_fin_stock per_c_ma3_fin_flow per_c_ma3_nonfin_stock
    per_c_ma3_nonfin_flow per_ma3_gov_exp_GDP per_ma3_cpi_inflation
    per_logGDP_pc) collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =   2.56  Pr > z =  0.010
Arellano-Bond test for AR(2) in first differences: z =   2.17  Pr > z =  0.030
Arellano-Bond test for AR(3) in first differences: z =   0.90  Pr > z =  0.368
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(28)   = 112.22  Prob > chi2 =  0.000
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(28)   =  27.90  Prob > chi2 =  0.470
  (Robust, but weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(21)   =  18.78  Prob > chi2 =  0.599
    Difference (null H = exogenous): chi2(7)    =   9.11  Prob > chi2 =  0.245


Sebastian Kripfganz

Join Date: May 2014

Posts: 2611
#2

20 Jul 2022, 07:00

Taken at face value, yes; if you treat the variables in your gmm() option as endogenous and start only with lag 3, then a significant AR(2) test would not pose a problem.
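
A minimal sketch of why this holds, under the illustrative assumption (not stated in the thread) that the idiosyncratic error follows an MA(1) process with serially uncorrelated shocks e_it of variance sigma_e^2 and parameter theta different from zero:

```latex
\begin{align*}
u_{it} &= e_{it} + \theta\, e_{i,t-1}, \\
\Delta u_{it} &= e_{it} + (\theta - 1)\, e_{i,t-1} - \theta\, e_{i,t-2}, \\
\operatorname{E}[\Delta u_{it}\,\Delta u_{i,t-2}] &= -\theta\,\sigma_e^2 \neq 0
  && \text{(the AR(2) test can reject)}, \\
\operatorname{E}[\Delta u_{it}\,\Delta u_{i,t-3}] &= 0
  && \text{(the AR(3) test should not reject)}.
\end{align*}
```

In this setting, an endogenous regressor dated t-2 can be correlated with e_i,t-2, which appears in the differenced error at time t, so lag 2 would not be a valid instrument. A regressor dated t-3 depends only on shocks up to t-3, none of which appear in the differenced error at time t, so instruments starting at lag 3 remain valid.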

As an aside, all the variables specified in your iv() option must be assumed to be uncorrelated with the unobserved group-specific effects. This is essentially a random-effects assumption, which often is hard to justify.
Furthermore, notice that iv(varlist) without the eq() suboption is not equivalent to the combination iv(varlist, eq(diff)) iv(varlist, eq(level)). If this surprises you, then you probably want the second specification and should explicitly specify the eq() suboptions.
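
The two specifications can be compared side by side. This is only a sketch with hypothetical variable names (y, x, w are placeholders, not the poster's variables), and the lag limits are arbitrary:

```stata
* Hypothetical variables: y (outcome), x (endogenous), w (assumed exogenous).
* Run from a do-file, since /// line continuation is not available interactively.

* (a) iv() without eq(): one combined set of instruments for the stacked system
xtabond2 y x w, gmm(x, laglimits(2 4) collapse) iv(w) ///
    twostep robust small

* (b) separate instruments for the differenced and the level equation
xtabond2 y x w, gmm(x, laglimits(2 4) collapse) ///
    iv(w, eq(diff)) iv(w, eq(level)) twostep robust small
```

Comparing the "Number of instruments" line and the "Instruments for ..." listing of the two runs is a quick way to see whether the specifications differ in your own data.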

The following presentation might be helpful as well:
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.

https://www.kripfganz.de/stata/
Gal Rakover

Join Date: Oct 2021

Posts: 18
#3

20 Jul 2022, 07:14

Thanks so much for the helpful comments.

I will apply the eq() suboption. Regarding the random-effects issue, I want to check that I understand correctly. If I cannot justify such an assumption (I will have to think about that), should I include these variables in gmm() as predetermined variables? That would also mean that, unless I restrict them to start at lag 3 as well, the AR(2) result matters. Is that correct? Or should I just drop them?
Sebastian Kripfganz

Join Date: May 2014

Posts: 2611
#4

20 Jul 2022, 07:28

I assume those variables vary over time. There are different ways to treat them without dropping them (which I would not recommend). If you continue to assume that they are strictly exogenous with regard to the idiosyncratic error component, you could use a gmm() option starting with lag 0 or you could use the iv() option with eq(diff) only. The AR() results are irrelevant for strictly exogenous variables.

If you instead want to treat them as predetermined, you would normally use the gmm() option starting with lag 1. If the AR(2) test is significant, but not the AR(3) test, you would need to start at lag 2.
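
The alternatives above could be written as follows. This is a sketch with hypothetical variable names (y, x, w) and arbitrary upper lag limits, not a recommendation for the poster's data:

```stata
* Hypothetical variables: y (outcome), x (endogenous), w (variable moved out of iv()).
* Locals must be defined and used in the same do-file run.
local endo   "gmm(x, laglimits(3 6) collapse)"
local common "twostep robust small artests(3)"

* w strictly exogenous: gmm() starting with lag 0 ...
xtabond2 y x w, `endo' gmm(w, laglimits(0 3) collapse) `common'
* ... or iv() for the differenced equation only
xtabond2 y x w, `endo' iv(w, eq(diff)) `common'

* w predetermined: gmm() normally starting with lag 1
xtabond2 y x w, `endo' gmm(w, laglimits(1 3) collapse) `common'
* w predetermined, AR(2) significant but AR(3) not: start at lag 2
xtabond2 y x w, `endo' gmm(w, laglimits(2 4) collapse) `common'
```

Note that a gmm() option without eq() also generates instruments for the level equation in system GMM, so the instrument listing in the output is worth checking after each run.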

Gal Rakover

Join Date: Oct 2021

Posts: 18
#5

20 Jul 2022, 08:34

And would using iv(varlist, eq(diff)) resemble a fixed-effects assumption?
Sebastian Kripfganz

Join Date: May 2014

Posts: 2611
#6

20 Jul 2022, 09:05

Kind of, yes, because the group-specific effects drop out in the first-differenced model.

If you want to mimic the actual fixed-effects estimator (as in xtreg, fe) for those variables by using instruments in deviations from their within-group means, you can use my xtdpdgmm command:

Code:

xtdpdgmm per_ma3_TFP $finC $nonfinC $convar, teffects gmm($finC $nonfinC per_ma3_gov_exp_GDP per_ma3_cpi_inflation per_logGDP_pc, coll lag(3 6) m(diff)) gmm($finC $nonfinC per_ma3_gov_exp_GDP per_ma3_cpi_inflation per_logGDP_pc, coll lag(2 2) diff m(level)) iv(per_ma3_trade_GDP per_ma3_yrs_sch per_ma3_IRCG per_ma3_trade_GDP, m(mdev)) twostep small vce(robust)

The two gmm() options of xtdpdgmm replicate the gmm() option of xtabond2.
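
Written as moment conditions, and using a simplified notation that is my own shorthand (a single regressor vector x_it, group effect alpha_i, idiosyncratic error u_it), the two gmm() options above correspond roughly to:

```latex
\begin{align*}
y_{it} &= x_{it}'\beta + \alpha_i + u_{it}, \\
\Delta y_{it} &= \Delta x_{it}'\beta + \Delta u_{it}
  && \text{(first differencing removes } \alpha_i\text{)}, \\
\operatorname{E}[x_{i,t-s}\,\Delta u_{it}] &= 0, \quad s = 3,\dots,6
  && \text{(\texttt{lag(3 6) m(diff)})}, \\
\operatorname{E}[\Delta x_{i,t-2}\,(\alpha_i + u_{it})] &= 0
  && \text{(\texttt{lag(2 2) diff m(level)})}.
\end{align*}
```

The last line matches the DL2.() entry listed under "Instruments for levels equation" in the xtabond2 output of the first post. With the collapse option, the conditions are summed over time periods rather than imposed separately for each period.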

Minhaj uddin

Join Date: Dec 2023

Posts: 45
#7

24 Apr 2024, 12:27

Hello Professor Sebastian, I have a similar issue in my GMM estimation: the AR(2) test is significant, whereas the AR(3) and higher-order tests are insignificant. If I understand your suggestion above correctly, this can be handled by starting the instruments for the lagged dependent variable and the endogenous variables at lag 3, and those for the predetermined variables at lag 2. Can we report all three AR tests (1, 2, and 3) in a research paper? I have also searched the literature for a paper that reports results this way, but I could not find one.

Thank you
Minhaj uddin

Join Date: Dec 2023

Posts: 45
#8

29 Apr 2024, 20:16

Hello @Sebastian Kripfganz, I have a similar issue in my GMM estimation: the AR(2) test is significant, whereas the AR(3) and higher-order tests are insignificant. If I understand your suggestion above correctly, this can be handled by instrumenting the lagged dependent variable and the endogenous variables from the 3rd lag onwards, and the predetermined variables from the 2nd lag onwards. In several places, you also suggest using a higher-order lag of the dependent variable as a regressor.

My questions are as follows:

1) How do you justify the use of higher-order lags in a research paper? I haven't come across any paper in a dynamic panel data (DPD) setting where a second or higher lag is used as a regressor. Can we say something like, "To address second- and higher-order serial correlation in the error term, the second lag of the dependent variable is introduced"?

2) In my case, including the second lag makes the AR(2) test insignificant, but the coefficient on the second lag is also insignificant. Is it still okay to use the second lag as a regressor?

3) Further, can we report all three AR tests (1, 2, and 3) in the research paper?

Thank you
Sebastian Kripfganz

Join Date: May 2014

Posts: 2611
#9

30 Apr 2024, 02:59

Serial correlation in the error term is often a sign of misspecified dynamics. Adding a second lag of the dependent variable or adding lags of the independent variables as regressors aims to obtain a dynamically complete model, where all the dynamic effects are captured by the right-hand side variables. (Note: This is different from using higher-order lags as instruments. If there is evidence of second-order serial correlation in the first-differenced errors but no higher-order serial correlation, then the third lag onwards of the dependent variable qualifies as a valid instrument. However, this does not address the potential misspecification of the model dynamics and could lead to weak-instruments problems if those higher-order lags are insufficiently correlated with the regressors.)
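
To make the distinction concrete, here is a sketch of the two routes with hypothetical variable names (y, x) and arbitrary lag limits; neither line is taken from the posters' models:

```stata
* Hypothetical variables: y (outcome), x (endogenous regressor).
* Run from a do-file, since /// line continuation is not available interactively.

* (a) Change the model: add the second lag of y as a regressor,
*     aiming for a dynamically complete specification
xtabond2 y L.y L2.y x, gmm(y, laglimits(2 4) collapse) ///
    gmm(x, laglimits(2 4) collapse) twostep robust small artests(3)

* (b) Keep the model, change only the instruments: start one lag deeper
*     because the AR(2) test was significant
xtabond2 y L.y x, gmm(y, laglimits(3 5) collapse) ///
    gmm(x, laglimits(3 5) collapse) twostep robust small artests(3)
```

After (a), the AR(2) test indicates whether the added lag has absorbed the serial correlation. After (b), the AR(3) test is the relevant one, and the deeper lags may turn out to be weak instruments.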

Even though the second lag is statistically insignificant, it could sometimes still help to account for unexplained serial correlation. Note: Statistical insignificance does not imply that the coefficient is equal to zero; there is just not enough statistical evidence to rule this out.

There is nothing wrong with showing the results from several serial correlation tests if it is helpful for your argumentation.

Minhaj uddin

Join Date: Dec 2023

Posts: 45
#10

30 Apr 2024, 10:02

Thank you so much, @Sebastian Kripfganz sir, for your invaluable help. I have a follow-up question. During my literature search, I encountered a paper where the author justified the inclusion of the second lag of the dependent variable as a regressor to address its higher persistence. Additionally, his results supported this with a statistically significant second lag coefficient.
In my research paper, what justification could I provide for incorporating a second lag? I noticed in your presentation at the London Stata Conference (slide 90) that you mentioned: “higher-order lags of the dependent variable, y_i,t-2, y_i,t-3, and the other regressors, x_i,t-1, x_i,t-2, might have predictive power and could help to prevent serial correlation of the error term u_it when included as regressors”. Would it be appropriate for me to cite this?
Sebastian Kripfganz

Join Date: May 2014

Posts: 2611
#11

30 Apr 2024, 12:47

Sure, you can use this argument to justify the inclusion of a second lag.

Minhaj uddin

Join Date: Dec 2023

Posts: 45
#12

30 Apr 2024, 19:57

Great, thanks a lot!
Minhaj uddin

Join Date: Dec 2023

Posts: 45
#13

01 May 2024, 02:51

Hello @Sebastian Kripfganz sir, based on your above suggestion I have used the following model to estimate system GMM with an orthogonal option.

1. Could you please help me decide whether it is appropriate?
xtabond2 INT L.INT L2.INT ROA LNTA LnTASq DivNIITI MQOETA HHITAFinal Inflation GDP GC CO PV PB c.GC#c.PB c.GC#c.PV c.GC#c.LNTA c.CO#c.PB c.CO#c.PV c.CO#c.LNTA, gmm(INT, lag(1 3) eq(level)) gmm(INT, lag(2 4) collapse eq(diff)) gmm(ROA DivNIITI MQOETA LNTA LnTASq, lag(2 4) collapse) iv(HHITAFinal Inflation GDP GC CO PV PB c.GC#c.PB c.GC#c.PV c.GC#c.LNTA c.CO#c.PB c.CO#c.PV c.CO#c.LNTA, eq(level)) twostep robust small orthogonal artests(3)

Where:

INT is the dependent variable (predetermined)
ROA LNTA LnTASq DivNIITI MQOETA are firm-specific variables considered endogenous.
HHITAFinal Inflation GDP are macroeconomic variables considered exogenous.
GC is a dummy variable for two years (8th and 9th year, where it takes the value 1; and 0 otherwise)
CO is a one-year dummy (21st year of data)
PV is a dummy for private-sector firms
PB is a dummy for public-sector firms

By the way, my data are an unbalanced panel covering 23 years.

2. Sir, what is the logic of starting one lag earlier in the level and FOD equations than in the difference equation for the lagged dependent variable, predetermined variables, and endogenous variables?

Let's say that the level form of the GMM equation is Y_it = Y_i,t-1 + X_it + U_it. What will lag 1 of Y look like as an instrument for the level equation? Will it be (Y_it - Y_i,t-1) or (Y_i,t-1 - Y_i,t-2)?

3. Is it because, in the level equation, the error term (U_it) is in levels while the first lag of Y_it enters in differences (Y_it - Y_i,t-1)? And since Y is a predetermined variable, will none of the terms in (Y_it - Y_i,t-1) be correlated with U_it?

4. Furthermore, is it okay to have the same lag length for the difference/FOD equation and the level equation? Or is there a criterion to decide this?

Last edited by Minhaj uddin; 01 May 2024, 02:57.
Minhaj uddin

Join Date: Dec 2023

Posts: 45
#14

01 May 2024, 04:00

Hello @Sebastian Kripfganz sir, based on your above suggestion, I have used the following model to estimate system GMM with an orthogonal option.

1. Could you please help me decide whether it is appropriate?

xtabond2 INT L.INT L2.INT ROA LNTA LnTASq DivNIITI MQOETA HHITAFinal Inflation GDP GC CO PV PB c.GC#c.PB c.GC#c.PV c.GC#c.LNTA c.CO#c.PB c.CO#c.PV c.CO#c.LNTA, gmm(INT, lag(1 3) eq(level)) gmm(INT, lag(2 4) collapse eq(diff)) gmm(ROA DivNIITI MQOETA LNTA LnTASq, lag(2 4) collapse) iv(HHITAFinal Inflation GDP GC CO PV PB c.GC#c.PB c.GC#c.PV c.GC#c.LNTA c.CO#c.PB c.CO#c.PV c.CO#c.LNTA, eq(level)) twostep robust small orthogonal artests(3)

Where:

INT is the dependent variable (predetermined)
ROA LNTA LnTASq DivNIITI MQOETA are firm-specific variables considered endogenous.
HHITAFinal Inflation GDP are macroeconomic variables considered exogenous.
GC is a dummy variable for two years (8th and 9th year, where it takes the value 1; and 0 otherwise)
CO is a one-year dummy (21st year of data)
PV is a dummy for private-sector firms
PB is a dummy for public-sector firms

By the way, my data are an unbalanced panel covering 23 years.

2. Sir, what is the logic of starting one lag earlier in the level and FOD equations than in the difference equation for the lagged dependent variable, predetermined variables, and endogenous variables?

Let's say that the level form of the GMM equation is Y_it = Y_i,t-1 + X_it + U_it. What will lag 1 of Y look like as an instrument for the level equation? Will it be (Y_it - Y_i,t-1) or (Y_i,t-1 - Y_i,t-2)?

3. Is it because, in the level equation, the error term (U_it) is in levels while the first lag of Y_it enters in differences (Y_it - Y_i,t-1)? And since Y is a predetermined variable, will none of the terms in (Y_it - Y_i,t-1) be correlated with U_it?

4. Furthermore, is it okay to have the same lag length for the difference/FOD equation and the level equation? Or is there a criterion to decide this?

5. Regarding the first point of post #9, you mentioned that:

"Serial correlation in the error term is often a sign of misspecified dynamics. Adding a second lag of a dependent variable or adding lags of the independent variables as regressors aims to obtain a dynamically complete model, where all the dynamic effects are captured by the right-hand side variables. (Note: This is different from using higher-order lags as instruments. If there is evidence of second-order serial correlation in the first-differenced errors but no higher-order serial correlation, then the third lag onwards of the dependent variable qualifies as a valid instrument. However, this does not address the potential misspecification of the model dynamics and could lead to weak-instruments problems if those higher-order lags are insufficiently correlated with the regressors.)"

5 A) Will adding a second lag of the dependent variable or lags of the independent variables take care of the misspecification in the model dynamics?

5 B) As you mention above, if there is second-order serial correlation in the first-differenced errors but no higher-order serial correlation, then the third lag onwards of the dependent variable qualifies as a valid instrument. Does this mean that, for my model, the instruments for the dependent variable should be gmm(INT, lag(3 5) eq(level)) and gmm(INT, lag(3 5) collapse eq(diff))?
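
For questions 2 and 3, the textbook system-GMM logic can be sketched as follows. The coefficients rho and beta and the group effect alpha_i are added here for illustration (the equation in the post omits them), and the second condition additionally assumes serially uncorrelated U_it and changes in Y that are uncorrelated with alpha_i. Which transformed lags xtabond2 actually builds for a given lag() range is best checked in the "Instruments for levels equation" part of its output.

```latex
\begin{align*}
Y_{it} &= \rho\, Y_{i,t-1} + \beta X_{it} + \alpha_i + U_{it}, \\
\Delta Y_{it} &= (\rho - 1)\, Y_{i,t-1} + \beta X_{it} + \alpha_i + U_{it}
  \;\Rightarrow\; \operatorname{E}[\Delta Y_{it}\, U_{it}] \neq 0, \\
\Delta Y_{i,t-1} &= Y_{i,t-1} - Y_{i,t-2}
  \;\Rightarrow\; \operatorname{E}[\Delta Y_{i,t-1}\,(\alpha_i + U_{it})] = 0 .
\end{align*}
```

In words: the contemporaneous difference (Y_it - Y_i,t-1) contains U_it and so cannot serve as an instrument in the level equation, whereas the lagged difference (Y_i,t-1 - Y_i,t-2) is dated before U_it and can, under the stated assumptions.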
Minhaj uddin

Join Date: Dec 2023

Posts: 45
#15

01 May 2024, 05:43

Please ignore the second-to-last post (#13), as I have added a few more questions in the last post (#14).

Sorry for adding more than one post here. I tried to edit and add this to the last post, but it didn't work.

1. My question is: if the serial correlation tests for higher orders are inconsistent and vary considerably, as in the following case, what could be the reason?

Arellano-Bond test for AR(1) in first differences: z = -6.42 Pr > z = 0.000
Arellano-Bond test for AR(2) in first differences: z = -0.78 Pr > z = 0.436
Arellano-Bond test for AR(3) in first differences: z = -0.60 Pr > z = 0.546
Arellano-Bond test for AR(4) in first differences: z = 1.68 Pr > z = 0.093
Arellano-Bond test for AR(5) in first differences: z = -2.65 Pr > z = 0.008
Arellano-Bond test for AR(6) in first differences: z = 1.61 Pr > z = 0.107
Arellano-Bond test for AR(7) in first differences: z = 1.83 Pr > z = 0.067
Arellano-Bond test for AR(8) in first differences: z = -2.89 Pr > z = 0.004

2. In the following tests, do both the "Hansen test excluding group" and the "Difference" test need to be insignificant?

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(5)    =  11.44  Prob > chi2 =  0.043
    Difference (null H = exogenous): chi2(67)   =  72.25  Prob > chi2 =  0.309
  gmm(INT, eq(level) lag(1 3))
    Hansen test excluding group:     chi2(9)    =  29.19  Prob > chi2 =  0.001
    Difference (null H = exogenous): chi2(63)   =  54.50  Prob > chi2 =  0.769
  gmm(INT, collapse eq(diff) lag(2 4))
    Hansen test excluding group:     chi2(69)   =  81.85  Prob > chi2 =  0.138
    Difference (null H = exogenous): chi2(3)    =   1.84  Prob > chi2 =  0.606
  gmm(ROA DivNIITI MQOETA LNTA, collapse lag(2 4))
    Hansen test excluding group:     chi2(56)   =  68.74  Prob > chi2 =  0.118
    Difference (null H = exogenous): chi2(16)   =  14.95  Prob > chi2 =  0.528
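
As a reading aid for this table: within each block, the "excluding group" statistic and the "Difference" statistic add up to the same total, 83.69 with 72 degrees of freedom, which would be the Hansen statistic for the full instrument set (not shown in this post). A small sketch of that arithmetic using the numbers above:

```stata
* Each pair should sum to the overall Hansen statistic (here 83.69).
display 11.44 + 72.25     // GMM instruments for levels
display 29.19 + 54.50     // gmm(INT, eq(level) lag(1 3))
display 81.85 + 1.84      // gmm(INT, collapse eq(diff) lag(2 4))
display 68.74 + 14.95     // gmm(ROA DivNIITI MQOETA LNTA, collapse lag(2 4))

* Degrees of freedom add up in the same way (here 72).
display 5 + 67

* p-value of a difference statistic from the chi-squared upper tail;
* this should be close to the reported 0.309, up to rounding.
display chi2tail(67, 72.25)
```

The "excluding group" line therefore tests the remaining instruments on their own, while the "Difference" line tests the additional restrictions contributed by the named subset.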

Thank you!