xtabond2 - is it ok to have significant ar(2) and insignificant ar(3) if the gmmstyle lags are limited to third order (3 .)?

Sebastian Kripfganz

Join Date: May 2014

Posts: 2596
#16

03 May 2024, 03:11

Originally posted by Minhaj uddin

Hello @Sebastian Kripfganz sir, based on your above suggestion, I have used the following model to estimate system GMM with an orthogonal option.

1.) Could you please help me decide if it is appropriate or not?

xtabond2 INT L.INT L2.INT ROA LNTA LnTASq DivNIITI MQOETA HHITAFinal Inflation GDP GC CO PV PB c.GC#c.PB c.GC#c.PV c.GC#c.LNTA c.CO#c.PB c.CO#c.PV c.CO#c.LNTA , gmm(INT, lag (1 3) eq (level)) gmm(INT, lag (2 4) collapse eq (diff)) gmm (ROA DivNIITI MQOETA LNTA LnTASq, lag(2 4) collapse ) iv(HHITAFinal Inflation GDP GC CO PV PB c.GC#c.PB c.GC#c.PV c.GC#c.LNTA c.CO#c.PB c.CO#c.PV c.CO#c.LNTA, eq(level)) twostep robust small orthogonal artests(3)

Where :

INT is a dependent variable (predetermined)
ROA LNTA LnTASq DivNIITI MQOETA are firm-specific variables considered endogenous.
HHITAFinal Inflation GDP are macroeconomic variables considered exogenous.
GC is a dummy variable for two years (8th and 9th year, where it takes the value 1; and 0 otherwise)
CO is a one-year dummy (21st year of data)
PV is a dummy for Private sector firm
PB is a dummy for the Public sector firm

By the way, my data set is an unbalanced panel covering 23 years.

2.) Sir, what is the logic of starting one lag earlier in the level and FOD equations than in the difference equation for the lagged dependent variable, predetermined variables, and endogenous variables?

Let's say that the level form of the GMM equation is Y_it = Y_i,t-1 + X_it + U_it. Then what will Lag(1) of Y look like as an instrument for the level equation? Will it be (Y_it - Y_i,t-1) or (Y_i,t-1 - Y_i,t-2)?

3. Is it because in the level equation the error term (U_it) is in level form, and the first lag of Y_it is in difference (Y_it - Y_i,t-1)? And since (Y) is a predetermined variable none of the expressions in (Y_it-Y_i,t-1) will correlate with U_it?

4. Furthermore, is it okay to have the same lag length for the difference/FOD equation and the level equation? Or is there a criterion to decide this?

5. Regarding 1st point of post#9 you mentioned that

"Serial correlation in the error term is often a sign of misspecified dynamics. Adding a second lag of a dependent variable or adding lags of the independent variables as regressors aims to obtain a dynamically complete model, where all the dynamic effects are captured by the right-hand side variables. (Note: This is different from using higher-order lags as instruments. If there is evidence of second-order serial correlation in the first-differenced errors but no higher-order serial correlation, then the third lag onwards of the dependent variable qualifies as a valid instrument. However, this does not address the potential misspecification of the model dynamics and could lead to weak-instruments problems if those higher-order lags are insufficiently correlated with the regressors.)"

5 B) "Will adding a second lag of a dependent variable or lags of the independent variables take care of misspecification in the model dynamics?"

5 A) As mentioned by you above, if there is second-order serial correlation in the first-differenced errors but no higher-order serial correlation, then the third lag onwards of the dependent variable qualifies as a valid instrument. Does this mean that for my model, the instruments for the dependent variable should be "gmm(INT, lag (3 5) eq (level))" and "gmm(INT, lag (3 5) collapse eq (diff))"?

1. With unbalanced panel data, orthogonal deviations are typically recommended compared to first differencing because the latter retains less information if there are gaps in the data. (There is not much of a concern if the data set is only unbalanced because of missing observations at the beginning or end of a time series.)

2. The first lag used as an instrument for the level model is Y(t-1) - Y(t-2).

3. Y(t) - Y(t-1) cannot be a valid instrument because it includes Y(t), which is on the left side of your equation and therefore a function of U(t).

4. There is no widely accepted criterion to determine the maximum lag order for the differenced model. For the level model, typically only the first lag is used without higher-order lags, although further lags are still valid. If you were to use all available lags in the differenced model without collapsing, then the additional lags in the level model would be redundant, which is usually the rationale for not using higher-order lags there.

5.B. It might; this is certainly the researcher's hope, but there is no guarantee that simply adding the second lag takes care of all omitted dynamics.

5.A. For the level model, you can start with lag 2. For the first-differenced model, you would indeed need to start with lag 3. This is because the first-differenced model has errors U(t) - U(t-1); the lagged error U(t-1) requires going one lag deeper with the instruments.
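The starting lags in 5.A can be traced through a short derivation. This is only a sketch under an assumption that is not stated in the thread: the level errors follow an MA(1) process, which is one simple way to obtain second-order but no third-order serial correlation in the first-differenced errors. The symbols e, theta, and sigma_e are introduced here for illustration, and the level-equation part additionally assumes that the differenced instruments are uncorrelated with the unit-specific effects.

```latex
% Assumed error process (illustration only): MA(1) level errors
u_{it} = e_{it} + \theta e_{i,t-1}, \qquad
e_{it} \ \text{serially uncorrelated with variance } \sigma_e^2

% First-differenced errors
\Delta u_{it} = u_{it} - u_{i,t-1}
             = e_{it} + (\theta - 1)\, e_{i,t-1} - \theta\, e_{i,t-2}

% Implied serial correlation pattern of the differenced errors
\operatorname{Cov}(\Delta u_{it}, \Delta u_{i,t-2}) = -\theta \sigma_e^2 \neq 0,
\qquad
\operatorname{Cov}(\Delta u_{it}, \Delta u_{i,t-3}) = 0

% Differenced model: y_{i,t-s} depends on e_{i,t-s}, e_{i,t-s-1}, \dots
% while \Delta u_{it} depends on e_{it}, e_{i,t-1}, e_{i,t-2}
E\!\left[ y_{i,t-s}\, \Delta u_{it} \right] = 0
\quad \text{requires} \quad t - s < t - 2 \iff s \geq 3

% Level model: \Delta y_{i,t-s} depends on e_{i,t-s}, e_{i,t-s-1}, \dots
% while u_{it} depends on e_{it}, e_{i,t-1}
E\!\left[ \Delta y_{i,t-s}\, u_{it} \right] = 0
\quad \text{requires} \quad t - s < t - 1 \iff s \geq 2
```

With serially uncorrelated level errors (theta = 0), the same argument gives the usual starting lags of 2 for the differenced model and 1 for the level model.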

Originally posted by Minhaj uddin

1. My question is: if the serial-correlation test results for higher orders are inconsistent and vary considerably, as in the following case, what could be the reason for that?

2. In the following test, do we need both the excluding group and the difference to be insignificant?

1. There could be different reasons for this. It could genuinely be the case that errors are only correlated over higher-order lags, although this becomes difficult to interpret. It could also be that the second-order test does not reject because of a lack of power or simply by random chance. (Remember that there are both type-1 and type-2 errors possible with statistical testing.) It could also be that the higher-order tests are less reliable due to a small sample size. Last but not least, model/estimator misspecification could also affect the reliability of tests.

2. The difference test is only meaningful if the excluding test does not reject. So, yes, you would like both tests to be statistically insignificant. Some of the excluding tests may not be very reliable if the excluded instruments - i.e., those to be tested - are necessary to achieve identification; in other words, if the remaining instruments are weak after excluding strong instruments, the test might have poor properties.
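The relation between the two tests can be made concrete with a little arithmetic. The sketch below assumes the usual construction in which the difference statistic is the overidentification statistic of the full model minus that of the model excluding the tested instruments, compared against a chi-squared distribution with the difference in degrees of freedom. The function names and the numbers in the example are hypothetical and are not taken from any output in this thread.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting values: df = 1 (odd case) or df = 2 (even case).
    if df % 2 == 1:
        a = 0.5
        q = math.erfc(math.sqrt(z))
    else:
        a = 1.0
        q = math.exp(-z)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q


def difference_test(j_full: float, df_full: int, j_excluding: float, df_excluding: int):
    """Return (statistic, df, p-value) of the difference test, plus the
    p-value of the excluding-group test that should be checked first."""
    diff_stat = j_full - j_excluding
    diff_df = df_full - df_excluding
    return diff_stat, diff_df, chi2_sf(diff_stat, diff_df), chi2_sf(j_excluding, df_excluding)


# Hypothetical statistics, for illustration only.
stat, df, p_difference, p_excluding = difference_test(30.0, 25, 22.0, 20)
print(f"excluding group: p = {p_excluding:.3f}")
print(f"difference:      chi2({df}) = {stat:.2f}, p = {p_difference:.3f}")
```

Reading the two p-values in this order mirrors the advice above: the difference test is only informative once the excluding-group test does not reject.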

https://www.kripfganz.de/stata/
Minhaj uddin

Join Date: Dec 2023

Posts: 45
#17

03 May 2024, 04:51

Thank you so much, sir, it has been incredibly helpful! I have a few more questions to ask, so please pardon the bother.

1. In my case, the dependent variable is continuous and bounded between 0 and 1. So, can I directly apply the GMM estimation, or do I need some sort of transformation (like taking logs)?
I am asking this because some studies have directly used OLS while others have applied Tobit or logit, but mainly in static models. Similarly, a few recent papers have directly applied GMM to such a dependent variable, while some have proposed a log transformation.

What is your take on that?

2. Related to your 2nd last answer.

I have around 100 groups and 1600 observations with a time period of 23 years.

Is it necessary to test for higher-order correlation at all? If so, in cases where some higher-order autoregressive terms are significant while others are not, what remedial measures could be taken to address this result?

3. Related to your last answer. What should be done if the excluding test rejects the null hypothesis, as in my case?

Thank you!

Last edited by Minhaj uddin; 03 May 2024, 05:50.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2596
#18

03 May 2024, 06:40

Regarding the log transformation, please see https://www.statalist.org/forums/for...nal-regression. For dynamic models or models with endogenous regressors, estimating a nonlinear model can be challenging. While there are shortcomings of a linear model, it is probably the best starting point.

Taken at face value, higher-order serial correlation still invalidates your instruments, and therefore would be a reason for concern. In practice, people rarely test for higher-order serial correlation. [This should not be seen as an endorsement from my side.] If you experience higher-order serial correlation, the only remedy I can see would be to add further lags or even other excluded variables to the model. Yet, there is no guarantee that this will eliminate the serial correlation. You might have to ask yourself if there is a theoretical reason for the serial correlation; then you could also search for external (rather than internal/lagged) instruments.

A rejection of the excluding test signals misspecification. This is an overidentification test for the model without the questionable instruments. Thus, some of the remaining instruments might be invalid; or the regression equation might be misspecified. (Residual serial correlation could be such a form of misspecification.)

Minhaj uddin

Join Date: Dec 2023

Posts: 45
#19

21 May 2024, 04:12

Thanks a lot, Prof. @Sebastian Kripfganz, for your previous response.

Sir, I need your further help with the following questions:

1. With the orthogonal option, the difference equation is replaced by the forward orthogonal deviation. In such a scenario, where I use the orthogonal option under the xtabond2 command, is it appropriate to use the eq (diff) option?

2. How does the orthogonal option affect the choice of lag for instruments? Like in the case of the difference equation, we start with lag 2 for the dependent variable and endogenous variable and lag 1 for the predetermined variable, so what exactly needs to be done when the orthogonal option is used?

Please help me with the above questions in the following command

xtabond2 INT L.INT ROA CAR LNTA LnTASq Div MQOETA HHITA Inflation GDP FC PVB PSB, gmm (INT, lag (1 3) collapse eq(level)) gmm ( INT, lag (2 4) collapse eq (diff)) gmm (ROA Div MQOETA CAR, lag(2 4) collapse) iv(HHITA LNTA LnTASq Inflation GDP FC, eq(both)) iv (PVB PSB, eq(level)) twostep robust small orthogonal

Where:
INT is the dependent variable
ROA CAR LNTA LnTASq DivNIITI MQOETA are endogenous variables
HHITA Inflation GDP are exogenous variables
FC is a dummy variable that takes the value 1 for 2008 & 2009 and 0 otherwise (time-variant)
PVB and PSB are dummy variables for the public and private sector (time-invariant)

3. In line with the above, is the following code involving interaction terms correct? I am especially unsure about the treatment of c.FC#c.PSB and c.FC#c.PVB (interactions between a time-variant and a time-invariant dummy, defined only for the level model) and c.FC#c.LNTA (interaction between an endogenous variable and a time-variant dummy).

xtabond2 INT L.INT ROA CAR LNTA LnTASq Div MQOETA HHITA Inflation GDP FC PVB PSB c.FC#c.PSB c.FC#c.PVB c.FC#c.LNTA, gmm (INT, lag (1 3) collapse eq (level)) gmm (INT, lag (2 4) collapse eq (diff)) gmm (ROA Div MQOETA CAR, lag (2 4) collapse) iv (HHITA LNTA LnTASq Inflation GDP FC, eq(both)) iv (PVB PSB c.FC#c.PSB c.FC#c.PVB, eq(level)) iv (c.FC#c.LNTA, eq(both)) twostep robust small orthogonal
Sebastian Kripfganz

Join Date: May 2014

Posts: 2596
#20

22 May 2024, 13:27

With xtabond2, eq(diff) refers to forward-orthogonal deviations when used with the option orthogonal. I am not a big fan of how this is implemented in xtabond2, especially regarding the specification of lags. It always confuses me. Before I give incorrect advice, I recommend searching previous forum posts on this topic. In any case, I also recommend double-checking your results with my xtdpdgmm command. If correctly specified, both commands should yield identical results. (Note, however, that the syntax and the lag specification differ between the two commands.)
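For readers unfamiliar with the transformation itself, here is a minimal sketch of forward-orthogonal deviations next to first differences for a single panel unit with gaps. It uses the textbook definition (each observation minus the mean of all available future observations, rescaled by sqrt(n / (n + 1)), where n is the number of those future observations). It does not reproduce how xtabond2 or xtdpdgmm store or date the transformed data, and the function names and numbers are made up for illustration.

```python
from __future__ import annotations

import math


def first_differences(y: dict[int, float]) -> dict[int, float]:
    """y maps period -> value; unobserved periods are simply absent.
    A difference exists only if the previous period is observed too."""
    return {t: y[t] - y[t - 1] for t in sorted(y) if (t - 1) in y}


def forward_orthogonal_deviations(y: dict[int, float]) -> dict[int, float]:
    """Subtract the mean of all available future observations and rescale."""
    periods = sorted(y)
    transformed = {}
    for index, t in enumerate(periods):
        future = [y[s] for s in periods[index + 1:]]
        if not future:
            continue  # the last observed period has no future observations
        n = len(future)
        transformed[t] = math.sqrt(n / (n + 1)) * (y[t] - sum(future) / n)
    return transformed


# Hypothetical unit observed in periods 1, 2, 4, 5, 7 (gaps at 3 and 6).
example = {1: 0.40, 2: 0.42, 4: 0.47, 5: 0.45, 7: 0.50}
print("first differences available for periods:", sorted(first_differences(example)))
print("forward deviations available for periods:", sorted(forward_orthogonal_deviations(example)))
```

In this example, first differencing keeps only the periods whose immediate predecessor is observed, while forward-orthogonal deviations keep every observed period except the last one. This is the sense in which first differencing retains less information when there are gaps, as noted earlier in the thread.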

Minhaj uddin

Join Date: Dec 2023

Posts: 45
#21

22 May 2024, 15:29

Sure, sir, I will do the cross-check with xtdpdgmm.

Thank you for the advice.
Aysun Atil

Join Date: Apr 2023

Posts: 6
#22

17 Jul 2024, 20:39

Hi everyone,

I have a question about appropriate instrument lags in system GMM - xtabond2.
I have included up to three lags of the dependent variable in the regression (y_t-1, y_t-2, y_t-3) because of serial correlation and specified the corresponding instruments as gmmstyle(y, lags (2 3) collapse).
The results are OK in every sense: the p-values of the Hansen and AR(2) tests are 0.35 and 0.07, respectively.
I wonder whether the instrument lags should be higher than the lags of the regressors if we add higher lags of the dependent variable as regressors. Or are the diagnostic tests enough to judge whether the model is appropriate?
Sebastian Kripfganz

Join Date: May 2014

Posts: 2596
#23

18 Jul 2024, 09:19

Strictly speaking, all you need is that the total number of instruments is at least as large as the total number of regressors. However, it would be advisable to indeed include at least three lags of the specific instruments for the lagged dependent variables if you have 3 lags of them as regressors. Otherwise, there might be a higher chance of running into problems of underidentification or weak instruments.
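The counting argument can be illustrated with a small sketch. It counts the instrument columns that lags of a single variable contribute to the transformed equation, under a simplified convention that is an assumption of this example: periods run from 1 to n_periods, the transformed equation exists from first_equation_period onwards, and a lag is available only if it points to period 1 or later. The numbers reported by xtabond2 or xtdpdgmm can differ, for instance because of additional level-equation instruments. All names and values below are hypothetical.

```python
def count_lag_instruments(n_periods: int, first_lag: int, last_lag: int,
                          collapse: bool, first_equation_period: int = 3) -> int:
    """Count instrument columns generated by lags first_lag..last_lag of one variable."""
    equation_periods = range(first_equation_period, n_periods + 1)
    lags = range(first_lag, last_lag + 1)
    if collapse:
        # Collapsed: one column per lag that is available in at least one period.
        return sum(1 for s in lags if any(t - s >= 1 for t in equation_periods))
    # Uncollapsed: a separate column for every (period, lag) combination.
    return sum(1 for t in equation_periods for s in lags if t - s >= 1)


def order_condition_holds(n_instruments: int, n_regressors: int) -> bool:
    """Necessary (not sufficient) condition for identification."""
    return n_instruments >= n_regressors


# Hypothetical panel with 23 periods and lags 2 to 4 as instruments.
print("uncollapsed:", count_lag_instruments(23, 2, 4, collapse=False))
print("collapsed:  ", count_lag_instruments(23, 2, 4, collapse=True))

# Lags 2 and 3, collapsed, against three lagged dependent variables as regressors
# (first_equation_period=5 because three lags plus differencing use up four periods).
own_lag_columns = count_lag_instruments(23, 2, 3, collapse=True, first_equation_period=5)
print("enough on their own?", order_condition_holds(own_lag_columns, 3))
```

In the last case, two collapsed lags fall short of three lagged dependent variables on their own, so identification would have to come from the remaining instruments. This is consistent with the advice to use at least three lags of the instruments.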

If you want to rely on diagnostic tests, make sure to also consider underidentification tests; see slides 43 and following of my 2019 London Stata Conference presentation:
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.

Aysun Atil

Join Date: Apr 2023

Posts: 6
#24

30 Jul 2024, 16:06

Thank you very much. I have read it before, but I will focus on that page now.