A collection of bugs in xtabond2, xtabond, xtdpd, xtdpdsys, gmm

Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#1

10 Oct 2020, 14:49

In my posts here on Statalist, I frequently refer to some bugs in popular commands for the GMM estimation of linear dynamic panel data models. I decided to compile a list of them to avoid losing track and to have them all in a single place for reference. For brevity, I do not show output here, but all examples are replicable using publicly available data sets.

1. Forward-orthogonal deviations in xtabond2:
The following two specifications should yield identical results because collapsing of GMM-type instruments is equivalent to using standard instruments. However, the results differ. The estimates from the second specification are incorrect.

Code:

webuse abdata, clear
xtabond2 L(0/1).n w, orthogonal gmm(n, lag(2 4)) gmm(w, lag(1 3) collapse) nolevel robust nodiffsargan
xtabond2 L(0/1).n w, orthogonal gmm(n, lag(2 4)) iv(L(1/3).w, passthru mz) nolevel robust nodiffsargan

It is hard to come up with a complete list of circumstances under which xtabond2 with option orthogonal produces incorrect results. I recommend always double-checking the results, for example with my xtdpdgmm command:

Code:

xtdpdgmm L(0/1).n w, model(fodev) gmm(n, lag(1 3)) gmm(w, lag(0 2) collapse) nocons vce(robust)
xtdpdgmm L(0/1).n w, model(fodev) gmm(n, lag(1 3)) iv(w, lag(0 2)) nocons vce(robust)

Notice that the lags in the xtabond2 command lines are shifted by one time period compared to xtdpdgmm. This is a "feature" of xtabond2 that could easily lead to confusion: what xtabond2 calls the second lag of an instrument for the forward-orthogonally transformed model is actually the first lag of that variable. The same shift by one time period is also applied by the official xtdpd command with option fodeviation.
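For readers less familiar with the transformation itself, here is a minimal sketch of forward-orthogonal deviations computed by hand. It assumes a toy balanced panel without gaps or missing values, and all variable names (csum, n_fut, fut_sum, x_fod) are hypothetical. It only illustrates the transformation; it does not reproduce how any of the commands above store or date the transformed observations.

```stata
* Sketch only: forward-orthogonal deviations on a toy panel without gaps.
clear
input id t x
1 1 1
1 2 2
1 3 4
2 1 3
2 2 3
2 3 6
end
* running sum of x within each panel, ordered by time
bysort id (t): gen double csum = sum(x)
* number and sum of the observations after period t
by id: gen n_fut = _N - _n
by id: gen double fut_sum = csum[_N] - csum
* x minus the mean of its future values, rescaled; undefined in the last period
gen double x_fod = sqrt(n_fut/(n_fut + 1)) * (x - fut_sum/n_fut) if n_fut > 0
list id t x x_fod, sepby(id)
```

Because the transformed value at time t depends only on observations dated t and later, the dating convention for the instruments is a choice made by each command, which is where the shift described above comes from.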

2. Overidentification tests in xtabond2:
(a) When there are coefficients in the xtabond2 output displayed as "omitted" or "empty", the degrees of freedom of the Sargan/Hansen overidentification tests are incorrect. Consequently, the p-values are also incorrect (too small). The problem is that xtabond2 treats the omitted coefficients as if they were estimated and reduces the degrees of freedom accordingly. This happens frequently when time dummies (or other dummy variables) are specified with the factor variable notation. The following specifications yield identical coefficient estimates, but the degrees of freedom and p-values of the overidentification tests are incorrect for the second specification.

Code:

webuse abdata, clear
xtabond2 L(0/1).n yr1978-yr1984, iv(yr1978-yr1984, eq(level)) gmm(n, lag(2 4) eq(diff)) robust
xtabond2 L(0/1).n i.year, iv(i.year, eq(level)) gmm(n, lag(2 4) eq(diff)) robust
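To see why miscounted degrees of freedom matter, the following sketch evaluates the chi-squared tail probability for one test statistic under two different degrees of freedom. The numbers (a statistic of 20, and 10 versus 7 degrees of freedom) are purely hypothetical and not taken from any output of the commands above.

```stata
* Hypothetical numbers, for illustration only.
scalar J = 20          // assumed overidentification test statistic
scalar df_right = 10   // assumed correct degrees of freedom
scalar df_wrong = 7    // degrees of freedom after wrongly subtracting 3 omitted coefficients
display "p-value with df = " df_right ": " %6.4f chi2tail(df_right, J)
display "p-value with df = " df_wrong ": " %6.4f chi2tail(df_wrong, J)
```

For a given test statistic, fewer degrees of freedom give a smaller p-value and more degrees of freedom give a larger one.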

(b) In some situations with a non-default weighting matrix, i.e. h(1) or h(2), xtabond2 reports an instrument count that is too large because it does not detect the perfect multicollinearity among the instruments. The following two specifications yield identical estimates. In the second specification, the additional time dummies as instruments for the level model are redundant because they are perfectly multicollinear with the time dummy instruments for the first-differenced model. Yet, xtabond2 does not recognize this redundancy and reports 3 instruments too many in the second specification. As a consequence, the degrees of freedom of the overidentification tests are too large by this amount in the second specification. Hence, the p-values are also too large.

Code:

keep if year > 1977 & year < 1983
xtabond2 L(0/1).n yr1980-yr1982, h(1) iv(yr1980-yr1982, eq(diff)) gmm(n, lag(2 4) eq(diff)) robust
xtabond2 L(0/1).n yr1980-yr1982, h(1) iv(yr1980-yr1982, eq(diff)) iv(yr1980-yr1982, eq(level)) gmm(n, lag(2 4) eq(diff)) robust

(c) In the following situation, xtabond2 reports too many degrees of freedom for the Difference-in-Sargan/Hansen test of iv(fem blk), and therefore a p-value that is too large. The correct degrees of freedom are 1 instead of 2. The reason is that after removing the instruments iv(fem blk) from the model, the coefficient of ed is no longer identified.

Code:

webuse psidextract, clear
xtabond2 L(0/1).lwage ed, gmm(lwage, lag(2 4) eq(diff)) iv(fem blk, eq(level)) twostep robust

Compare with xtdpdgmm which reports the correct degrees of freedom and p-value.

Code:

xtdpdgmm L(0/1).lwage ed, gmm(lwage, lag(2 4) model(diff)) iv(fem blk, model(level)) twostep vce(robust) overid
estat overid, difference

3. Time dummies in xtabond, xtdpd, and xtdpdsys:
In the following specification, the official xtdpd command drops the time dummy for the year 1984 from the list of regressors due to alleged collinearity. However, there actually is no such collinearity problem. Because xtabond and xtdpdsys are just wrappers for xtdpd, the same problem is present in those two commands (not explicitly shown here).

Code:

webuse abdata, clear
xtdpd L(0/1).n yr1978-yr1984, dgmm(n, lag(2 4)) liv(yr1978-yr1984) vce(robust)

For the same model specification, xtabond2 and xtdpdgmm correctly do not drop that time dummy.

Code:

xtabond2 L(0/1).n yr1978-yr1984, h(2) gmm(n, lag(2 4) eq(diff)) iv(yr1978-yr1984, eq(level)) robust
xtdpdgmm L(0/1).n, teffects wmat(ind) gmm(n, lag(2 4) model(diff)) iv(yr1978-yr1984, model(level)) vce(robust)

4. Unbalanced panels in xtabond2, xtabond, xtdpd, xtdpdsys, and gmm:
The following example shows a problem that can happen in some cases of unbalanced panels, although I believe this is a rare phenomenon. The data set used here can be downloaded from the JAE Data Archive. The two specifications should be equivalent but the second xtabond2 results are incorrect.

Code:

use data_us.dta, clear
egen id = group(codeim ind)
xtset id year
xtabond2 L(0/1).lrfdi, gmm(lrfdi, lag(2 4) eq(diff)) nocons robust
xtabond2 L(0/1).lrfdi, gmm(L.lrfdi, lag(1 3) eq(diff)) nocons robust

A similar issue arises with xtdpd (and thus also with xtabond and xtdpdsys). Interestingly, the following two estimates not only differ from each other but also from the xtabond2 results.

Code:

xtdpd L(0/1).lrfdi, dgmm(lrfdi, lag(2 4)) nocons vce(robust)
xtdpd L(0/1).lrfdi, dgmm(L.lrfdi, lag(1 3)) nocons vce(robust)

To maximize confusion, the official gmm command once again yields results that differ from all those before.

Code:

gmm (D.lrfdi - {b} * LD.lrfdi), xtinst(lrfdi, lag(2/4)) inst(, nocons) winit(xt D) onestep vce(robust)
gmm (D.lrfdi - {b} * LD.lrfdi), xtinst(L.lrfdi, lag(1/3)) inst(, nocons) winit(xt D) onestep vce(robust)

Compare with the corresponding results from xtdpdgmm, which are identical in both specifications and also equal the first specification of xtabond2.

Code:

xtdpdgmm L(0/1).lrfdi, gmm(lrfdi, lag(2 4) model(diff)) vce(robust)
xtdpdgmm L(0/1).lrfdi, gmm(L.lrfdi, lag(1 3) model(diff)) vce(robust)

5. Collinearity among instruments in xtabond2 two-step estimation:
The following example is again a rare phenomenon, and I could not really replicate it with a simpler model. What happens here is that the two-step estimation results change when a redundant instrument (wks_ed4) is added to the second specification. Note that the total number of instruments reported by xtabond2 remains unchanged.

Code:

webuse psidextract, clear
forvalues i = 4/17 {
    gen wks_ed`i' = c.wks#`i'.ed
}
xtabond2 L(0/1).lwage wks union wks_ed5-wks_ed17, twostep iv(LD.(wks_ed5-wks_ed17), mz eq(level)) gmm(wks_ed4-wks_ed17, lag(2 3) collapse eq(diff)) gmm(L.lwage wks, lag(1 .) eq(diff)) iv(L.union, passthru eq(diff)) gmm(L.lwage wks, lag(0 0) eq(level)) iv(D.union, eq(level)) nodiffsargan
xtabond2 L(0/1).lwage wks union wks_ed5-wks_ed17, twostep iv(LD.(wks_ed4-wks_ed17), mz eq(level)) gmm(wks_ed4-wks_ed17, lag(2 3) collapse eq(diff)) gmm(L.lwage wks, lag(1 .) eq(diff)) iv(L.union, passthru eq(diff)) gmm(L.lwage wks, lag(0 0) eq(level)) iv(D.union, eq(level)) nodiffsargan

Note that this problem does not occur with the one-step estimator. The following two results are identical despite the added instrument.

Code:

xtabond2 L(0/1).lwage wks union wks_ed5-wks_ed17, iv(LD.(wks_ed5-wks_ed17), mz eq(level)) gmm(wks_ed4-wks_ed17, lag(2 3) collapse eq(diff)) gmm(L.lwage wks, lag(1 .) eq(diff)) iv(L.union, passthru eq(diff)) gmm(L.lwage wks, lag(0 0) eq(level)) iv(D.union, eq(level)) nodiffsargan
xtabond2 L(0/1).lwage wks union wks_ed5-wks_ed17, iv(LD.(wks_ed4-wks_ed17), mz eq(level)) gmm(wks_ed4-wks_ed17, lag(2 3) collapse eq(diff)) gmm(L.lwage wks, lag(1 .) eq(diff)) iv(L.union, passthru eq(diff)) gmm(L.lwage wks, lag(0 0) eq(level)) iv(D.union, eq(level)) nodiffsargan

6. Option diffvars() in xtabond:
Option diffvars() of the official xtabond command adds strictly exogenous regressors to the first-differenced model, together with the respective standard instruments. However, in the regression output those regressors appear as if they were added to the untransformed level model. In the second specification of the following example, the estimated coefficients are correct but predictions with the postestimation command predict would be incorrect.

Code:

webuse abdata, clear
xtabond n w, nocons vce(robust)
predict xb1
xtabond n, diffvars(D.w) nocons vce(robust)
predict xb2
summarize xb1 xb2

This problem has existed since Stata 10. Prior to that, xtabond reported the results for the first-differenced model, not the level model. The diffvars() option should have been removed with that change.

Version information:

Code:

. which xtabond2
c:\ado\plus\x\xtabond2.ado
*! xtabond2 3.6.3 30 September 2015
*! Copyright (C) 2015 David Roodman

. which xtabond
C:\Program Files\Stata16\ado\base\x\xtabond.ado
*! version 4.2.0 21jun2018

. which xtdpd
C:\Program Files\Stata16\ado\base\x\xtdpd.ado
*! version 1.6.0 21jun2018

. which gmm
C:\Program Files\Stata16\ado\base\g\gmm.ado
*! version 2.2.0 30nov2018

. which xtdpdgmm
c:\ado\plus\x\xtdpdgmm.ado
*! version 2.3.1 08oct2020
*! Sebastian Kripfganz, www.kripfganz.de

Stata version 16.1, update level 29 Sep 2020

Some of these bugs I have mentioned already in my 2019 London Stata Conference presentation:
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.

https://www.kripfganz.de/stata/
Tags: bugs, gmm, panel data, xtabond2, xtdpd

6 likes
David Roodman

Join Date: Jul 2014

Posts: 469
#2

23 Nov 2020, 11:37

I responded to this post at https://www.statalist.org/forums/for...to-do-xtdpdgmm
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#3

26 Nov 2020, 12:40

I can confirm that the specific bug mentioned in point 1 is now fixed in the latest xtabond2 version. Yet, I ran into some further discrepancies between xtabond2 and xtdpdgmm when forward-orthogonal deviations are used. These are discussed in the other thread linked by David in the previous post.

The bug in 2(a) is fixed as well, and xtabond2 now circumvents the issue in 2(c) by no longer reporting Difference-in-Hansen tests in situations where the model becomes underidentified after removing the scrutinized instruments.

Regarding point 3, it turns out that the cause of the problem is that xtdpd drops all regressors from the model that do not pass a collinearity check for either the level or the first-differenced model. As a consequence, any time-invariant regressor would also be dropped, even if its coefficient may be identified given the instruments for the level equation. This is exactly what happens when I try to estimate the model from 2(c) with xtdpd:

Code:

xtdpd L(0/1).lwage ed, dgmm(lwage, lag(2 4)) liv(fem blk) twostep vce(robust)

As David explains in the other thread, the observed behavior of xtabond2 under point 4 is a consequence of gaps in the data set. This should probably not be labelled a "bug" but users might want to be cautious when they use unbalanced panel data with gaps.
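A quick way to see whether a panel has such gaps is sketched below on a toy data set. The data and the variable names gap_before and has_gap are hypothetical; with real data, only the last three commands would be needed after xtset.

```stata
* Sketch only: flag panels whose time series contains a gap.
clear
input id year y
1 2000 1.0
1 2001 1.2
1 2003 1.5
2 2000 0.8
2 2001 0.9
2 2002 1.1
end
xtset id year
* 1 if more than one period passed since the previous observation of the same panel
bysort id (year): gen byte gap_before = (year - year[_n-1] > 1) if _n > 1
* 1 for every observation of a panel that has at least one gap
egen byte has_gap = max(gap_before), by(id)
tabulate id has_gap
```

In this toy example, panel 1 skips the year 2002 and is flagged, while panel 2 is not.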

Regarding point 5, David nicely explains that this is a consequence of the difficulties in the numerical calculation of the rank of the instrument matrix; so not a bug either.

https://www.kripfganz.de/stata/
1 like
Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015

Posts: 216
#4

30 Nov 2020, 07:54

Hello David and Sebastian,

I have been following the thread and have communicated with Sebastian about a couple of the issues he mentioned. I would like to add a couple of comments. One is that we are considering a set of new options to deal with collinearity for point (3). About point (6) Sebastian is correct to point out that when forming the linear prediction we are not using the variable in levels but the differenced variable. This needs to be clearly documented, and we will do this explicitly in a future update.

Finally, in the spirit of Thanksgiving, I would like to thank both of you for your contributions to the Stata community. Your commands, how they are documented, and your responses in this and other forums are something I am grateful for.
3 likes
Sebastian Kruk

Join Date: Jul 2017

Posts: 72
#5

28 Jun 2022, 15:53

Hi, was it fixed?
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#6

29 Jun 2022, 02:51

Originally posted by Sebastian Kruk

Hi, was it fixed?

It depends on which issue you are referring to. There has been no further change to the behavior of the commands aside from the fixes mentioned in my post #3 above.

https://www.kripfganz.de/stata/
Adam Jacob

Join Date: Oct 2022

Posts: 4
#7

22 Oct 2022, 17:24

Hello Mr. Kripfganz,

I am a beginner PhD student. My regression model is dynamic panel data. I have an unbalanced panel data model. My plan is to apply GMM estimation (difference-GMM estimation and system-GMM estimation) using your xtdpdgmm command (thanks for that). I am using Stata 14.2 for Windows. I hope it is OK that I posted my message in this thread, please let me know if my message does not fit into this topic/thread and if it is better that I start a new thread/topic. Please find below my questions.

1) To apply the system-GMM estimation, should the stationarity be satisfied? Or should the non-stationarity be satisfied to apply the system-GMM estimation?

And what is the reason behind this condition/moment that makes this condition/moment required to be satisfied to apply the system-GMM estimation?

2) If the stationarity should be satisfied to apply the system-GMM estimation, do all variables have to be stationary? Or does the dependent variable y only have to be stationary? Or do the dependent variable y and the main independent variable have to be stationary?

Also, should the stationarity be satisfied for a variable at level or at difference?

3) If the non-stationarity should be satisfied to apply the system-GMM estimation, do all variables have to be non-stationary? Or does the dependent variable y only have to be non-stationary? Or do the dependent variable y and the independent variable have to be non-stationary?

Also, should the non-stationarity be satisfied for a variable at level or at difference?

4) To apply the system-GMM estimation, can the popular stationarity tests (such as the Augmented Dickey-Fuller test) be used to check for stationarity or non-stationarity (unit root)? Can the popular stationarity tests (such as the Augmented Dickey-Fuller test) be used to check for the additional assumption/condition of Blundell and Bond?

5) Can the Difference-in-Hansen test be used to check for stationarity or non-stationarity? If so, how can the Difference-in-Hansen test check for stationarity or non-stationarity? What is the null hypothesis of the Difference-in-Hansen test in terms of stationarity or non-stationarity? e.g., the null hypothesis of the popular tests for stationarity or non-stationarity is H₀: All panels contain unit roots.

6) The popular stationarity tests (such as the Augmented Dickey-Fuller test) check for stationarity/non-stationarity for each variable individually. Does the Difference-in-Hansen test check for stationarity/non-stationarity for each variable separately/individually? If so, how? Or does the Difference-in-Hansen test check for stationarity/non-stationarity for all variables together? If so, how?

7) Can the Difference-in-Hansen test be used to check for the classification of each variable included in the regression model whether the variable is endogenous, predetermined, or exogenous? If so, how? What is the null hypothesis of the Difference-in-Hansen test in terms of the classification of the variables?

8) Does the Difference-in-Hansen test check for the classification of each variable individually? If so, how? Or does the Difference-in-Hansen test check for the classification of all variables together? If so, how?

9) When I apply the difference-GMM estimation, the coefficient of the independent variable is significant, whereas the coefficient of the independent variable became insignificant when I apply the system-GMM estimation. Therefore, I have the following questions:

9.1) Is there any justification for that? i.e., what is the reason behind having an insignificant coefficient of the independent variable when applying the system-GMM estimation, while that coefficient is significant when applying the difference-GMM estimation?

9.2) Is it sufficient to apply the difference-GMM estimation and rely on the difference-GMM findings? Given that the tests of serial correlation and overidentification, corresponding to the difference-GMM estimation, passed.

9.3) Is the system-GMM estimation superior to the difference-GMM estimation even if the tests of serial correlation and overidentification, corresponding to the difference-GMM estimation, passed? i.e., is applying the system-GMM estimation better than the difference-GMM estimation (does the system-GMM estimation outperform the difference-GMM estimation) even when the tests of serial correlation and overidentification, corresponding to the difference-GMM estimation, passed?

10) When applying the system-GMM estimation, is there any need to apply the Difference-in-Hansen test after running the regression of the system-GMM estimation? If so, why? Also, how do we read the findings of the Difference-in-Hansen test corresponding to the system-GMM estimation (what is the interpretation of the outcomes of the Difference-in-Hansen test which is applied after running the system-GMM estimation regression)?

11) To apply the system-GMM estimation, do I have first to apply the difference-GMM estimation i.e., do I have to apply the difference-GMM estimation before applying the system-GMM estimation? Or can I apply the system-GMM estimation directly without any need to apply the difference-GMM estimation? i.e., is there an order of steps to apply the system-GMM estimation?

12) Does the classification of a variable whether it is exogenous, predetermined, or endogenous affect the significance of the variable’s coefficient? i.e., is there any relation between the significance of the variable’s coefficient and the classification of that variable whether it is exogenous, predetermined, or endogenous?

13) Does the number of instruments affect negatively the findings of tests e.g., Hansen test findings? i.e., is there any relation between the findings of the Hansen test and increasing the number of instruments? i.e., when increasing the number of instruments, does that increase the probability of the test not passing?

14) To decide how many lags of the dependent variable y I should include in the regression model as regressors, can I apply the Autoregressive Distributed Lag (ARDL) panel data model? If so, how? I read the ARDL help file, but could not understand whether I have to apply ARDL for the dependent variable y only, or for only both the dependent variable y and the independent variable, or for each variable individually, or for all variables together in one go.

15) To apply GMM estimation (the difference-GMM estimation and the system-GMM estimation), the following questions arise:

15.1) Do we have to exclude all firms with less than 5 consecutive years of data? Or do we have to keep only the firms that have at least 3 continuous time series observations during the research time period?

15.2) Suppose that we have to keep only the firms that have at least 5 consecutive years of data, then, do we have to exclude those firms for each variable (i.e., for all variables) included in the regression model? Or do we have to exclude those firms for only the dependent variable y and the main independent variable?

15.3) Is there a function/command/expression in Stata to perform that exclusion of firms to apply GMM estimation?

16) Suppose that the time period of my research is 1985-2006, and my regression model includes the following variables:

y: is the dependent variable;

L.y: is the first lag of the dependent variable y (L.y is a regressor);

L2.y: is the second lag of the dependent variable y (L2.y is a regressor);

k: is the independent variable (k is endogenous);

L.k: is the first lag of the independent variable k;

w: is a control variable (w is predetermined);

s: is a control variable (s is exogenous);

dcb: is a dummy variable that equals 1 for the three years 1995, 1996, and 1997;

k*dcb: is the interaction between the independent variable k and the dummy variable ‘dcb’;

inddummy: is the industry dummies (9 industries);

regdummy: is the region dummies;

yeardummy: is the year dummies;

k*w: is the interaction between the independent variable k and the control variable w;

k*regdummy: is the interaction between the independent variable k and the region dummies.

Please, I ask you for the codes (in full) that I have to apply regarding the following estimations and commands:

16.1) the conventional difference-GMM estimation using your xtdpdgmm command;

16.2) the unconventional difference-GMM estimation using your xtdpdgmm command;

16.3) the conventional FOD estimation using your xtdpdgmm command;

16.4) the unconventional FOD estimation using your xtdpdgmm command;

16.5) the conventional difference-GMM estimation using ‘xtabond2’ command;

16.6) the unconventional difference-GMM estimation using ‘xtabond2’ command;

16.7) the conventional system-GMM estimation using your xtdpdgmm command;

16.8) the unconventional system-GMM estimation using your xtdpdgmm command;

16.9) the conventional system-GMM estimation using ‘xtabond2’ command;

16.10) the unconventional system-GMM estimation using ‘xtabond2’ command.

I read the help files, your slides (2019), Roodman (2009), and many posts on ‘XTDPDGMM: new Stata command for GMM estimation of linear (dynamic) panel data models’ regarding the previous set of my question, but I am not sure if I understood correctly, specifically regarding my regression model.

17) I do not know how to create dummy variables for the years/time, the industries, and the regions. Therefore, I ask you please to guide me on how to create/generate the year dummies, the industry dummies, and the region dummies and also for the dummy variable ‘dcb’ which equals 1 for the three years 1995, 1996, and 1997.

My sincere apologies for the long message.

Kindly help me out!

Thanks and Regards.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#8

26 Oct 2022, 01:31

Adam Jacob
I am afraid this is a bit too much all at once. Answering the multitude of those questions would be a significant consulting task, which exceeds the usual pro bono advice given by members of this forum. Please feel free to ask questions about a specific issue, and I or others will be happy to help if time permits. And yes, it is generally better to start a new topic unless you are specifically following up on the issues of an existing topic. Thank you for your understanding.

https://www.kripfganz.de/stata/
1 like
Adam Jacob

Join Date: Oct 2022

Posts: 4
#9

27 Oct 2022, 08:59

Hello Mr. Kripfganz,

Really sorry if my previous long message annoyed you (excuse me, please!). I am a beginner PhD student and recently joined Statalist.

My regression model is dynamic panel data. I have an unbalanced panel data model. My plan is to apply GMM estimation (difference-GMM estimation and system-GMM estimation) using your xtdpdgmm command (thanks for that). I am using Stata 14.2 for Windows. Please find below my questions.

1) Suppose that the time period of my research is 1985-2006, and my regression model includes the following variables:

y: is the dependent variable;

L.y: is the first lag of the dependent variable y (L.y is a regressor);

L2.y: is the second lag of the dependent variable y (L2.y is a regressor);

k: is the independent variable (k is endogenous);

L.k: is the first lag of the independent variable k;

w: is a control variable (w is predetermined);

s: is a control variable (s is exogenous);

dcb: is a dummy variable that equals 1 for the three years 1995, 1996, and 1997;

k*dcb: is the interaction between the independent variable k and the dummy variable ‘dcb’;

inddummy: is the industry dummies (9 industries);

regdummy: is the region dummies;

yeardummy: is the year dummies;

k*w: is the interaction between the independent variable k and the control variable w;

k*regdummy: is the interaction between the independent variable k and the region dummies.

Please, I ask you for the codes (in full) that I have to apply regarding the following estimations and commands:

1.1) the conventional difference-GMM estimation using your xtdpdgmm command;

1.2) the unconventional difference-GMM estimation using your xtdpdgmm command;

1.3) the conventional FOD estimation using your xtdpdgmm command;

1.4) the unconventional FOD estimation using your xtdpdgmm command;

1.5) the conventional difference-GMM estimation using ‘xtabond2’ command;

1.6) the unconventional difference-GMM estimation using ‘xtabond2’ command;

1.7) the conventional system-GMM estimation using your xtdpdgmm command;

1.8) the unconventional system-GMM estimation using your xtdpdgmm command;

1.9) the conventional system-GMM estimation using ‘xtabond2’ command;

1.10) the unconventional system-GMM estimation using ‘xtabond2’ command.

I read the help files, your slides (2019), Roodman (2009), and many posts on ‘XTDPDGMM: new Stata command for GMM estimation of linear (dynamic) panel data models’ regarding the previous set of my question, but I am not sure if I understood correctly, specifically regarding my regression model.

2) I do not know how to create dummy variables for the years/time, the industries, and the regions. Therefore, I ask you please to guide me on how to create/generate the year dummies, the industry dummies, and the region dummies and also for the dummy variable ‘dcb’ which equals 1 for the three years 1995, 1996, and 1997.

Kindly help me out!

Thanks and Regards.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#10

30 Oct 2022, 21:10

You can find several code examples in the following Statalist topic: https://www.statalist.org/forums/for...84#post1675484

Have a look at Stata's factor variable notation regarding the construction of dummy variables:

Code:

help fvvarlist
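As a minimal sketch of the dummy-variable part of the question, the following toy example creates a period dummy and a full set of year dummies by hand. The data set (one observation per year from 1985 to 2006) and the stub name yr are assumptions for illustration; with factor variable notation, i.year creates the year indicators on the fly instead.

```stata
* Sketch only: toy data with one observation per year, 1985-2006.
clear
set obs 22
gen int year = 1984 + _n
* dummy equal to 1 for the three years 1995, 1996, and 1997
gen byte dcb = inrange(year, 1995, 1997)
* one indicator variable per year: yr1 (1985), ..., yr22 (2006)
tabulate year, generate(yr)
summarize dcb yr1 yr22
```

The same tabulate approach would apply to industry or region identifiers, assuming they are stored as categorical variables.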

https://www.kripfganz.de/stata/
Adam Jacob

Join Date: Oct 2022

Posts: 4
#11

04 Nov 2022, 10:52

Dear Dr. Kripfganz,

Please, I have the following questions.

1) To apply the system-GMM estimation, should the stationarity be satisfied? Or should the non-stationarity be satisfied to apply the system-GMM estimation?

And what is the reason behind this condition/moment that makes this condition/moment required to be satisfied to apply the system-GMM estimation?

2) If the stationarity should be satisfied to apply the system-GMM estimation, do all variables have to be stationary? Or does the dependent variable y only have to be stationary? Or do the dependent variable y and the main independent variable have to be stationary?

Also, should the stationarity be satisfied for a variable at level or at difference?

3) If the non-stationarity should be satisfied to apply the system-GMM estimation, do all variables have to be non-stationary? Or does the dependent variable y only have to be non-stationary? Or do the dependent variable y and the independent variable have to be non-stationary?

Also, should the non-stationarity be satisfied for a variable at level or at difference?

4) To apply the system-GMM estimation, can the popular stationarity tests (such as the Augmented Dickey-Fuller test) be used to check for stationarity/non-stationarity (unit root)? Can the popular stationarity tests (such as the Augmented Dickey-Fuller test) be used to check for the additional assumption/condition of Blundell and Bond?

5) Can the Difference-in-Hansen test be used to check for stationarity/non-stationarity? If so, how can the Difference-in-Hansen test check for stationarity/non-stationarity? What is the null hypothesis of the Difference-in-Hansen test in terms of stationarity/non-stationarity? e.g., the null hypothesis of the popular tests for stationarity or non-stationarity is H₀: All panels contain unit roots.

6) The popular stationarity tests (such as the Augmented Dickey-Fuller test) check for stationarity/non-stationarity for each variable individually. Does the Difference-in-Hansen test check for stationarity/non-stationarity for each variable separately/individually? If so, how? Or does the Difference-in-Hansen test check for stationarity/non-stationarity for all variables together? If so, how?

Kindly help me out!

Thanks and Regards.

Last edited by Adam Jacob; 04 Nov 2022, 11:05.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#12

21 Jan 2024, 07:29

7. Lag range for GMM-style instruments in the xtabond2 level equation:
There appears to be another unexpected behavior by xtabond2 when using the following code [simplified to provide a minimum working example]:

Code:

webuse abdata
xtabond2 n w, gmm(w, lag(0 0))

This creates GMM-style instruments both for the first-differenced and the level model. The help file provides conflicting information about what to expect:

The optional laglimits(a b) suboption can override these defaults: for the transformed equation, lagged levels dated t-a to t-b are used as instruments, while for the levels equation, the first-difference dated t-a+1 is normally used.
[..]
if a<=0<=b or b<=0<=a, the first-difference dated t is used.

For the first-differenced model, we get w itself as instruments (in levels, unlagged). This is uncontroversial.

For the level model, the first quoted sentence from the help file suggests that first-differenced one-period leads F.D.w (equivalent to a negative lag L(-1).D.w) are used, since t-a+1=t+1. However, because a=b=0, the second sentence suggests that contemporaneous first differences D.w are used. To add to the confusion, the xtabond2 output below the regression table incorrectly suggests that the first-differenced lag DL.w was used. It turns out that the first statement from the help file is correct; the above code is equivalent to the following (which might come as a surprise to some people):

Code:

xtabond2 n w, gmm(w, lag(0 0) eq(diff)) gmm(w, lag(-1 -1) eq(level))

Notably, the lag() suboption should normally be equivalent to the use of time-series operators directly in the specified variable list. In this example, this should correspond to the following specification:

Code:

xtabond2 n w, gmm(w, lag(0 0) eq(diff)) gmm(F.w, lag(0 0) eq(level))

However, the results do not coincide. In the latter case, xtabond2 seems to be losing one instrument. I am not sure what it is really doing as I am unable to replicate this specification with xtdpdgmm.
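To make the dating of the differences discussed above concrete, here is a small sketch on a toy single-panel data set. The data and the variable names Dw, FDw, and LDw are hypothetical; the sketch only shows what the three time-series operator combinations compute, not what xtabond2 does internally.

```stata
* Sketch only: contemporaneous, lead, and lagged first differences.
clear
input id year w
1 1 10
1 2 12
1 3 15
1 4 19
end
xtset id year
gen Dw  = D.w      // w[t] - w[t-1], first difference dated t
gen FDw = F.D.w    // w[t+1] - w[t], first difference dated t+1
gen LDw = L.D.w    // w[t-1] - w[t-2], first difference dated t-1
list year w Dw FDw LDw
```

With lag(0 0), the rule "first difference dated t-a+1" from the help file points at the FDw column, whereas the xtabond2 output label suggests the LDw column.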

Version information:

Code:

. which xtabond2
c:\ado\plus\x\xtabond2.ado
*! xtabond2 3.7.0 22 November 2020
*! Copyright (C) 2003-20 David Roodman. May be distributed free.

https://www.kripfganz.de/stata/
Thomas Stringham (StataCorp)

StataCorp Employee

Join Date: May 2024

Posts: 4
#13

23 May 2024, 10:12

Thanks to all for raising and discussing these issues. I wanted to follow up on issues 3., 4. and 6. from the original post. The other issues appear to relate only to community-contributed commands.

3.

The dropping of time dummy variables described can be avoided by using the hascons option. Note that xtabond and xtdpdsys apply hascons automatically in their calls to xtdpd.
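As a sketch of this suggestion, the specification from point 3 of the original post can be rerun with hascons added. Whether the 1984 time dummy is then retained should be checked in the output; this example has not been verified against a particular Stata update level.

```stata
* Specification from point 3 of the original post, with hascons added.
webuse abdata, clear
xtdpd L(0/1).n yr1978-yr1984, dgmm(n, lag(2 4)) liv(yr1978-yr1984) hascons vce(robust)
```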

4.

The discrepancies between some equivalent lag specifications in xtdpd with unbalanced panels have to do with the handling of lags of already-lagged variables, and we do not currently intend to change this behavior. David Roodman discussed the issue in a post linked above, and I made some comments in another thread.

The remaining discrepancies between xtdpd and community-contributed commands were caused by a bug in the construction of instruments and have been fixed as of this week:

22. xtabond, xtdpd and xtdpdsys, when gaps were present in a subset of panels, did not account for the gaps when constructing GMM-style lagged instruments in those panels for time periods up to and including the gap, leading to incorrect results. Note that because prior lagged values were substituted for the desired lagged values in these cases, results remained asymptotically valid under commonly used assumptions. This has been fixed.

There was also a different, but related problem in gmm with option xtinstruments, which was also recently fixed.

7b. gmm with option xtinstruments() specified with multiple lags when some panels had gaps at the beginning of the time series returned incorrect results. This has been fixed.

6.

As pointed out by Sebastian, option diffvars() for xtabond no longer makes sense, and it has now been undocumented.
2 likes