Dynamic panel GMM post-estimation tests

Tomas Heryan

Join Date: Jul 2022

Posts: 3
#1


13 Jul 2022, 03:24

Hello,

I have two questions related to the xtabond, xtdpdsys, or xtdpd post-estimation tests.

First, the Sargan test of overidentifying restrictions is not available for a two-step estimator with the Windmeijer bias-corrected robust VCE. Do I have to run the Sargan test with one-step estimation only, given that the two-step standard errors are biased, or is it not necessary to run the test at all when using vce(robust)? (Stata recommends robust standard errors because the uncorrected two-step standard errors are biased.)

Second, does the Arellano-Bond test (for zero autocorrelation in first-differenced errors) only have to show no significant serial correlation at order 2 and higher, or may it also be insignificant at order 1? In other words, does the first-order test always have to be significant? I appreciate your help, because this is not clear from the Stata manual or from its examples.

Thanks for All Your Help!
Tom

P.S. I have even studied the original papers by Arellano and Bond (1991), Blundell and Bond (1998), and Windmeijer (2005). However, I still do not have an answer to the questions above.

Last edited by Tomas Heryan; 13 Jul 2022, 03:28.

Tags: xtabond xtdpdsys xtdpd
Sebastian Kripfganz

Join Date: May 2014

Posts: 2624
#2

13 Jul 2022, 04:52

The one-step Sargan-Hansen test is generally asymptotically invalid. If you have many instruments, it may still sometimes have better finite-sample performance than the two-step test.

Ideally, you should indeed use the two-step test based on corrected standard errors. The two-step test without vce(robust) is still asymptotically valid. If your cross-sectional sample size is large, this should not be a concern.
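As a rough illustration of this trade-off with Stata's official commands, the sketch below fits the same two-step model twice: once with the conventional VCE, after which estat sargan is available, and once with vce(robust) for inference on the coefficients. The abdata example dataset and the model specification are assumptions chosen for illustration only; they are not the original poster's model.

```stata
* Illustrative only: abdata and this specification are assumptions.
webuse abdata, clear

* Two-step difference GMM with the conventional VCE;
* estat sargan is available after this fit.
xtabond n L(0/1).w L(0/2).(k ys) yr1980-yr1984, lags(2) twostep
estat sargan

* Same model with Windmeijer-corrected standard errors for inference;
* estat sargan is not available after vce(robust).
xtabond n L(0/1).w L(0/2).(k ys) yr1980-yr1984, lags(2) twostep vce(robust)
```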

You can obtain a two-step Sargan-Hansen test based on corrected standard errors by using my xtdpdgmm command instead:
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference

Many people just ignore the first-order Arellano-Bond test. However, ideally the first-order test should be statistically significant. Otherwise, this could be an indication of model misspecification.
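A minimal sketch of how both tests might be obtained with xtdpdgmm is given below. The abdata example dataset, the choice of regressors, and the lag ranges are assumptions for illustration; the appropriate instruments depend on how each regressor is classified in the model at hand.

```stata
* Illustrative only: dataset, regressors, and lag ranges are assumptions.
* xtdpdgmm is community-contributed; install it once from SSC.
ssc install xtdpdgmm
webuse abdata, clear

* Two-step difference GMM with collapsed instruments and
* Windmeijer-corrected standard errors
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) ///
    gmm(w k, lag(1 3)) noconstant twostep vce(robust)

* Sargan-Hansen test of the overidentifying restrictions
estat overid

* Arellano-Bond tests for serial correlation at orders 1 to 3
estat serial, ar(1/3)
```

In the estat serial output, one would typically look for a rejection at order 1 and no rejection at order 2 and higher.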

https://www.kripfganz.de/stata/
Tomas Heryan

Join Date: Jul 2022

Posts: 3
#3

13 Jul 2022, 05:54

Thanks for the recommendation! However, I cannot find this paper anywhere (Scopus, WoS, or even your ResearchGate profile). Could you send me this Stata conference paper by email so that I can cite it? Furthermore, as you have shown in Breitung, Kripfganz & Hayakawa (2021), “the BC estimator effectively removes the bias and yields estimates with the lowest RMSE compared to the GMM estimators. The only exception is the case with high persistence and T ≤ 10, where the BB-GMM estimator performs best.” I suppose T ≤ 6 in my case due to the two-step estimator (estimation period 2010-2017).

Tom
Sebastian Kripfganz

Join Date: May 2014

Posts: 2624
#4

13 Jul 2022, 06:01

There is no paper yet. If you follow the link in my previous post, you can download an extensive set of presentation slides. Alternatively, a recording of a similar presentation can be found on YouTube: https://www.youtube.com/watch?v=5EnPiBMUYE4

The bias-corrected estimator implemented in my xtdpdbc command might be a suitable alternative if all of your regressors (besides the lagged dependent variable) are strictly exogenous. It is often a good idea to run the same model with different estimators and to compare the results.
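One way such a comparison could be set up is sketched below with Stata's official xtabond and xtdpdsys commands and the estimates suite. The abdata example dataset, the specification, and the stored-result names diff_gmm and sys_gmm are assumptions for illustration; results from xtdpdgmm or xtdpdbc could be stored and tabulated in the same way.

```stata
* Illustrative only: dataset, specification, and stored names are assumptions.
webuse abdata, clear

* Difference GMM (Arellano-Bond)
xtabond n L(0/1).w k, lags(1) twostep vce(robust)
estimates store diff_gmm

* System GMM (Arellano-Bover/Blundell-Bond)
xtdpdsys n L(0/1).w k, lags(1) twostep vce(robust)
estimates store sys_gmm

* Coefficients and standard errors side by side
estimates table diff_gmm sys_gmm, b(%9.4f) se(%9.4f) stats(N)
```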

Adam Jacob

Join Date: Oct 2022

Posts: 4
#5

09 Oct 2022, 09:46

Hello Mr Kripfganz,

I am a beginning PhD student. My regression model is a dynamic panel data model, and my panel is unbalanced. I plan to apply GMM estimation (difference-GMM and system-GMM) using your command 'xtdpdgmm' (thanks for that). I am using Stata 14.2 for Windows. I hope it is OK that I posted my message in this thread; please let me know if it does not fit this topic and I should start a new thread instead. Please find my questions below.

1) Does the system-GMM estimation require stationarity, or does it require non-stationarity? And why is this condition/moment required for the system-GMM estimation?

2) If the stationarity should be satisfied to apply the system-GMM estimation, do all variables have to be stationary? Or does the dependent variable y only have to be stationary? Or do the dependent variable y and the main independent variable have to be stationary?

Also, should the stationarity be satisfied for a variable at level or at difference?

3) If the non-stationarity should be satisfied to apply the system-GMM estimation, do all variables have to be non-stationary? Or does the dependent variable y only have to be non-stationary? Or do the dependent variable y and the independent variable have to be non-stationary?

Also, should the non-stationarity be satisfied for a variable at level or at difference?

4) To apply the system-GMM estimation, can the popular stationarity tests (such as the Augmented Dickey-Fuller test) be used to check for stationarity or non-stationarity (unit root)? Can the popular stationarity tests (such as the Augmented Dickey-Fuller test) be used to check for the additional assumption/condition of Blundell and Bond?

5) Can the Difference-in-Hansen test be used to check for stationarity or non-stationarity? If so, how? What is the null hypothesis of the Difference-in-Hansen test in terms of stationarity or non-stationarity? For comparison, the null hypothesis of the popular unit-root tests is H₀: all panels contain unit roots.

6) The popular stationarity tests (such as the Augmented Dickey-Fuller test) check for stationarity/non-stationarity for each variable individually. Does the Difference-in-Hansen test check for stationarity/non-stationarity for each variable separately/individually? Or does the Difference-in-Hansen test check for stationarity/non-stationarity for all variables together?

7) Can the Difference-in-Hansen test be used to check for the classification of each variable included in the regression model whether the variable is endogenous, predetermined, or exogenous? If so, how? What is the null hypothesis of the Difference-in-Hansen test in terms of the classification of the variables?

8) Does the Difference-in-Hansen test check for the classification of each variable individually? If so, how? Or does the Difference-in-Hansen test check for the classification of all variables together? If so, how?

9) When I apply the difference-GMM estimation, the coefficient of the independent variable is significant, whereas it becomes insignificant when I apply the system-GMM estimation. Thus, I have the following questions:

9.1) Is there any justification for that? i.e., what is the reason behind having an insignificant coefficient of the independent variable when applying the system-GMM estimation, while that coefficient is significant when applying the difference-GMM estimation?

9.2) Is it sufficient to apply the difference-GMM estimation and rely on its findings, given that the corresponding tests of serial correlation and overidentification passed?

9.3) Is the system-GMM estimation superior to the difference-GMM estimation (i.e., does it outperform it) even when the tests of serial correlation and overidentification corresponding to the difference-GMM estimation passed?

10) When applying the system-GMM estimation, is there any need to apply the Difference-in-Hansen test after running the regression of the system-GMM estimation? If so, why? Also, how do we read the findings of the Difference-in-Hansen test corresponding to the system-GMM estimation (what is the interpretation of the outcomes of the Difference-in-Hansen test which is applied after running the system-GMM estimation regression)?

11) To apply the system-GMM estimation, do I first have to apply the difference-GMM estimation, or can I apply the system-GMM estimation directly? In other words, is there a required order of steps?

12) Does the classification of a variable whether it is exogenous, predetermined, or endogenous affect the significance of the variable’s coefficient? i.e., is there any relation between the significance of the variable’s coefficient and the classification of that variable whether it is exogenous, predetermined, or endogenous?

13) Does the number of instruments negatively affect the test results, e.g., those of the Hansen test? In other words, does increasing the number of instruments increase the probability that the test does not pass?

14) To decide how many lags of the dependent variable y I should include in the regression model as regressors, can I apply the Autoregressive Distributed Lag (ARDL) panel data model? If so, how? I read the ARDL help file, but could not understand whether I have to apply ARDL to the dependent variable y only, to both the dependent variable y and the independent variable, to each variable individually, or to all variables together in one go.

15) To apply GMM estimation (the difference-GMM estimation and the system-GMM estimation), the following questions arise:

15.1) Do we have to exclude all firms with fewer than 5 consecutive years of data? Or do we have to keep only the firms that have at least 3 consecutive time series observations during the research period?

15.2) Suppose that we have to keep only the firms which have at least 5 consecutive years of data, then, do we have to exclude those firms for each variable (i.e., for all variables) included in the regression model? Or do we have to exclude those firms for only the dependent variable y and the main independent variable?

15.3) Is there a function/command/expression in Stata to perform that exclusion of firms to apply GMM estimation?

16) Suppose that the time period of my research is 1985-2006, and my regression model includes the following variables:
y: the dependent variable
L.y: the first lag of the dependent variable y
L2.y: the second lag of the dependent variable y
k: the independent variable (k is endogenous)
L.k: the first lag of the independent variable k
w: a control variable (w is predetermined)
s: a control variable (s is exogenous)
inddummy: the industry dummies (9 industries)
coudummy: the country dummies
yeardummy: the year dummies
k*w: the interaction between the independent variable k and the control variable w
coudummy*k: the interaction between the independent variable k and the country dummies
k*yeardummy: the interaction between the independent variable k and the year dummies

Could you please provide the full code for the following estimations and commands:

16.1) the difference-GMM estimation using your command ‘xtdpdgmm’;

16.2) the FOD estimation using your command ‘xtdpdgmm’;

16.3) the difference-GMM estimation using ‘xtabond2’ command;

16.4) the system-GMM estimation using your command ‘xtdpdgmm’;

16.5) the system-GMM estimation using ‘xtabond2’ command.

17) Suppose my regression model also includes a dummy variable ‘dcb’, which equals 1 for the three years 1995, 1996, and 1997, and k*dcb, the interaction between the independent variable k and the dummy variable ‘dcb’. Could you please provide the full code for the following estimations and commands:

17.1) the difference-GMM estimation using your command ‘xtdpdgmm’;

17.2) the FOD estimation using your command ‘xtdpdgmm’;

17.3) the difference-GMM estimation using ‘xtabond2’ command;

17.4) the system-GMM estimation using your command ‘xtdpdgmm’;

17.5) the system-GMM estimation using ‘xtabond2’ command.

I read the help files, your slides (2019), Roodman (2009) and some posts here regarding the previous two sets of my questions, but I am not sure if I understood correctly, specifically regarding my regression model.
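For orientation, a generic sketch of the three xtdpdgmm variants (difference GMM, forward-orthogonal deviations, and system GMM) is shown below. It uses the abdata example dataset rather than the model described above, and it treats w and k as predetermined purely for illustration; the variable names, lag ranges, and classification are all assumptions that would need to be adapted.

```stata
* Illustrative only: dataset, lag ranges, and variable classification are assumptions.
webuse abdata, clear

* Difference GMM
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) ///
    gmm(w k, lag(1 3)) noconstant twostep vce(robust)

* Forward-orthogonal deviations (lag ranges shift by one period)
xtdpdgmm L(0/1).n w k, model(fodev) collapse gmm(n, lag(1 3)) ///
    gmm(w k, lag(0 2)) noconstant twostep vce(robust)

* System GMM: differenced instruments are added for the level model
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) ///
    gmm(w k, lag(1 3)) gmm(n, lag(1 1) diff model(level)) ///
    gmm(w k, lag(0 0) diff model(level)) twostep vce(robust)

* Overidentification test after the last fit
estat overid
```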

18) I do not know how to create dummy variables for the years, the industries, and the countries. Could you please guide me on how to generate the year dummies, the industry dummies, and the country dummies, as well as the dummy variable ‘dcb’, which equals 1 for the three years 1995, 1996, and 1997?
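One possible way to build these variables with built-in Stata commands is sketched below, together with a filter on consecutive years as raised in question 15.3. The variable names firm, year, industry, and country are hypothetical, and the sketch assumes one row per firm and year. Whether a minimum number of consecutive years is actually required is a separate question that the code does not settle.

```stata
* Hypothetical variable names: firm, year, industry, country, k.

* Dummy equal to 1 in 1995, 1996, and 1997, and 0 otherwise
generate byte dcb = inrange(year, 1995, 1997)

* Interaction of k with the dummy
generate double k_dcb = k * dcb

* One indicator variable per category
tabulate year, generate(yeardummy)
tabulate industry, generate(inddummy)
tabulate country, generate(coudummy)

* Length of the longest run of consecutive years per firm
bysort firm (year): generate byte newrun = (_n == 1) | (year != year[_n-1] + 1)
by firm: generate long run_id = sum(newrun)
bysort firm run_id: generate long run_len = _N
bysort firm: egen long max_run = max(run_len)

* Optional sample restriction; 5 is the threshold mentioned in question 15
keep if max_run >= 5
```

Commands that support factor-variable notation also accept terms such as i.year directly, which avoids generating the dummies by hand.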

My sincere apologies for the long message.

Kindly help me out!

Thanks and Regards.