XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

Prateek Bedi replied

27 Jul 2020, 12:02
Hi,

I would like to understand why R-Squared is not calculated/reported in:

1. GMM regressions?
2. System-GMM regressions?
3. Dynamic Panel Regressions?

Further, is it meaningful to interpret R-squared in instrumental-variable regressions, such as that reported by ivreg2? And is it meaningful to interpret R-squared in a random-effects model, such as that reported by xtreg, re?

Thanks!
Sebastian Kripfganz replied

27 Jul 2020, 07:28
Whether or not the coefficient of the lagged dependent variable is statistically significant should usually not be an indicator of whether to accept the model. Otherwise, such an approach would come close to p-hacking.

Among your three models, only the third would raise immediate concerns based on the given information. If there is higher-order serial correlation as indicated by the Arellano-Bond test, this would cause some of the instruments to be invalid. This could possibly be addressed by adding further lags of the dependent variable and the regressors to the model as regressors (not instruments).
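
A minimal sketch of adding further lags as regressors while leaving the instrument sets unchanged. It uses Stata's abdata example dataset in place of the poster's data; the variables n, w, k and all lag ranges are assumptions chosen only for illustration.

```stata
* Illustrative only: abdata stands in for the actual data (run from a do-file)
webuse abdata, clear

* Baseline: one lag of the dependent variable as regressor
xtdpdgmm L(0/1).n w k, model(diff) collapse ///
    gmm(n, lag(2 4)) gmm(w k, lag(1 3)) twostep vce(robust)

* Extended: second lag of n and first lags of w and k added as regressors;
* the gmm() instrument sets are the same as before
xtdpdgmm L(0/2).n L(0/1).(w k), model(diff) collapse ///
    gmm(n, lag(2 4)) gmm(w k, lag(1 3)) twostep vce(robust)

* Check whether higher-order serial correlation is still present
estat serial, ar(1/3)
```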

I recommend having a look at the section on Model Selection in my 2019 London Stata Conference presentation:
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.
Muhammad Ahmad replied

27 Jul 2020, 07:18
Thank you so much for your reply. I would also like to ask another important question:

How important is the significance of the lagged dependent variable for accepting the model?

In another study, I am estimating three models (the dependent variable has three proxies) with GMM and am facing the following problem:
1. 1st model: lag of dependent variable (significant) + Sargan-Hansen (insignificant) + Arellano-Bond serial correlation (insignificant)
2. 2nd model: lag of dependent variable (insignificant) + Sargan-Hansen (insignificant) + Arellano-Bond serial correlation (insignificant)
3. 3rd model: lag of dependent variable (significant) + Sargan-Hansen (significant) + Arellano-Bond serial correlation (significant)

However, estat endog indicates that endogeneity exists between the variables. I tried to overcome the issue by increasing or decreasing the lags of the dependent and independent variables, but the problem persists. Please advise: should I change my estimation method?
Sebastian Kripfganz replied

27 Jul 2020, 03:15
Your specification and results generally appear all right, assuming that your independent variables are treated as predetermined. If they were endogenous, you should use instruments starting from lag 2 only. If they were strictly exogenous, you could even use lag 0 as an instrument.
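
A minimal sketch of how the starting lag of the instruments for the first-differenced model changes with the assumed classification of the regressors. It uses the abdata example dataset; treating w and k as the regressors in question, and the upper lag limits, are assumptions for illustration.

```stata
* Illustrative only (run from a do-file); the three calls are alternatives
webuse abdata, clear

* w and k predetermined: instruments start at lag 1
xtdpdgmm L(0/1).n w k, model(diff) collapse ///
    gmm(n, lag(2 4)) gmm(w k, lag(1 3)) twostep vce(robust)

* w and k endogenous: instruments start at lag 2
xtdpdgmm L(0/1).n w k, model(diff) collapse ///
    gmm(n, lag(2 4)) gmm(w k, lag(2 4)) twostep vce(robust)

* w and k strictly exogenous: lag 0 can also be used
xtdpdgmm L(0/1).n w k, model(diff) collapse ///
    gmm(n, lag(2 4)) gmm(w k, lag(0 2)) twostep vce(robust)
```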

Another commonly applied specification test is the Arellano-Bond serial correlation test, estat serial. There, you would want the AR(1) test to reject the null hypothesis and the AR(2), AR(3), ... tests to not reject the null hypothesis.
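
A short sketch of running these specification tests after estimation, again on the abdata example dataset with an assumed specification:

```stata
* Illustrative only (run from a do-file)
webuse abdata, clear
xtdpdgmm L(0/1).n w k, model(diff) collapse ///
    gmm(n, lag(2 4)) gmm(w k, lag(1 3)) twostep vce(robust)

* Arellano-Bond tests for serial correlation of orders 1 to 3:
* AR(1) should reject, AR(2) and AR(3) should not
estat serial, ar(1/3)

* Sargan-Hansen test of the overidentifying restrictions
estat overid
```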
Muhammad Ahmad replied

27 Jul 2020, 02:09
Dear Sebastian,
I am estimating my research model with two-step difference GMM. I am a research student in finance and not well versed in econometrics.
At first I tried to estimate my model with xtabond2, and then learned that xtdpdgmm provides more robust results than xtabond2.
ACP is the dependent variable.
APP, CFV, LEV, CH, TAT, FS, and SG are the independent and control variables.
I am using the command below:

Code:

xtdpdgmm L(0/1).ACP APP CFV LEV CH TAT FS SG y_1-y_17, model(diff) collapse gmm(ACP, lag(2 4)) gmm(CFV LEV CH TAT FS SG, lag(1 3)) iv(y_1-y_17, diff) two vce(r)

The estimation results followed here (output not preserved), along with the results for:

Code:

estat overid

Please advise whether this is the right command and whether the results are suitable for interpretation. As far as I know, the results should show a significant lag of the dependent variable and insignificant Sargan-Hansen tests. Thank you.
Sebastian Kripfganz replied

22 Jul 2020, 09:53
A new update of xtdpdgmm to version 2.2.7 is now available both on my own website and on SSC (thanks to Kit Baum):

Code:

adoupdate xtdpdgmm, update

Besides some minor bug fixes and improvements under the hood, I reorganized and expanded the Remarks section of the help file, in part to address the feedback from Joseph L. Staats.

For those who missed the announcement: Mark Schaffer's new underid and overid commands for underidentification and overidentification tests are now on SSC as well. Both of them work as postestimation commands after xtdpdgmm, as demonstrated (for underid) in my 2019 London Stata Conference presentation.
Joseph L. Staats replied

16 Jul 2020, 22:21
Sebastian,

Once again, thank you for your reply.

To close out this conversation, I wish to give you a shout-out for your fine work in creating and updating xtdpdgmm. I really enjoy using the program and especially appreciate the flexibility and wide range of options it provides. Thanks also for the many contributions you have made to the Statalist. I have read every one of them and have gained such a better knowledge of GMM because of them.
Sebastian Kripfganz replied

16 Jul 2020, 01:57
You are right that Arellano and Bover (1995) also propose a system-GMM estimator. In addition to a transformed model equation (first differences or forward-orthogonal deviations), they add a model equation for the average levels. This is essentially just a level equation with time-invariant instruments. What is often unnoticed: A similar system-GMM estimator was already proposed by Arellano and Bond (1991). When I refer to Arellano and Bover (1995) in the xtdpdgmm help file, the focus is on their proposal to use forward-orthogonal deviations for the transformed model equation (whether or not a level equation is added).

Thank you for this valuable feedback. I believe part of the confusion stems from the fact that in xtabond2 the same option is called equation(). I deliberately chose to replace equation() with model() because eventually there is just a single equation estimated. Actually, your initial thought is correct. The model() option eventually refers to a transformation of the instruments, although not in a straightforward way. I tried to illustrate this on slide 33 of my presentation. If you specify level instruments for the first-differenced model equation, those instruments essentially become transformed instruments for the level model equation. (I do not have a good name for this transformation. You might call it transposed differencing because the instruments are not multiplied with a first-difference transformation matrix D but with its transpose D'.) I will have to think about whether I can improve the help file in that regard.
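
The "transposed differencing" idea can be written out as follows. This is only a sketch in assumed notation (not taken from the help file): Z_i is the matrix of level instruments for unit i, D is the first-difference transformation matrix, and u_i is the vector of level errors.

```latex
% Moment conditions for the first-differenced model with level instruments Z_i,
% where \Delta u_i = D u_i:
\mathrm{E}\!\left[ Z_i' \, \Delta u_i \right]
  = \mathrm{E}\!\left[ Z_i' D \, u_i \right]
  = \mathrm{E}\!\left[ (D' Z_i)' \, u_i \right] = 0
% The same conditions can therefore be read as moment conditions for the
% level model with transformed instruments D' Z_i.
```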
Joseph L. Staats replied

15 Jul 2020, 10:11
Sebastian,

Thanks again for your help.

I have two more things to bring into the conversation, if I may, neither of which relates to my specific project.

1. At the top of Slide 36 of your 2019 London Stata Conference presentation you give an example of xtdpdgmm System GMM code that contains two model equations, one where the variables are transformed to first-differences and the instruments are at levels and the other where the variables are at levels and the instruments are transformed to differences. But in the help file that accompanies xtdpdgmm, you give two examples of code, one which you say represents Arellano-Bover two-step and the other Blundell-Bond two-step. As I understand it, Arellano-Bover and Blundell-Bond are both forms of System-GMM. But I only see one equation, not two, in your help-file examples. Am I misunderstanding or misreading something here?

2. The help-file discusses specifying a model, as follows: "model(model) specifies if the instruments apply to the model in levels, model(level), in first differences, model(difference), in deviations from within-group means, model(mdev), or in forward-orthogonal deviations, model(fodev). The default is model(level) unless otherwise specified with the global option model(model)." For quite a while when I first started trying xtdpdgmm, I thought this language was referring to transforming the instruments, not the variables themselves. I arrived at a correct understanding of this definition only after reading your 2019 London Stata Conference presentation, which has, in my opinion, a much clearer discussion/definition than the help-file. I only bring this up because other users and readers of Statalist may also have struggled with understanding how to use the model specification versus the instrument sub-option.
Sebastian Kripfganz replied

15 Jul 2020, 05:03
I am afraid this is becoming a bit too application-specific. I do not see a problem with GMM in general. It is more a question of whether your underlying model is correctly specified, which is something I cannot answer.
Joseph L. Staats replied

14 Jul 2020, 12:57
Sebastian,

Thanks for your thoughtful and informative answers to my questions.

1. It seems that I may not have a stationarity issue after all. In looking over my work, I see that I made an error in doing unit-root tests for M-2 and that it actually passes the xtunitroot fisher, dfuller test. And Var-A is a dichotomous categorical variable, so, as I understand it (after looking online for an answer), a unit-root test for this variable doesn't make any sense. By the way, I have 17 years of data, which I suppose is somewhere in between small-t and large-t.

2. Thanks for the suggestion to use a DAG to show the hypothesized causal path.

3. You said you had difficulty imagining using a dynamic model in the context of mediation analysis. I am doing research on the so-called "democratic advantage" for sovereign bond ratings (Moody's and S&P). My hypothesis is that democracy does not directly affect bond ratings but instead (sometimes) sets in motion other changes in the political system that lead to an increase in bond ratings. The particular mediation path is democracy->political competition->judicial power and independence->rule of law->bond rating (Moody's or S&P). I use xtdpdgmm for GMM FOD for each step in the process, and I include a lag of each outcome variable in each step (M-1, M-2, M-3, and Var-B). Of course I have control variables in each step, although most of them only appear in the topmost step where Var-B is the outcome variable. I get good coefficient, standard-error and instrument results using GMM FOD, although I have to use a third-degree polynomial transformation of political competition when it is acting on M-2 (there is theory to support the idea that increased political competition can be good up to a point, after which it becomes detrimental). Is there anything about all of this that strikes you as inappropriate for using GMM?

Thanks again for your help.
Sebastian Kripfganz replied

14 Jul 2020, 02:15
If GMM in all of the four steps correctly identifies the respective coefficients, then in principle the product-of-coefficients method should be applicable. You would probably need to argue with the help of a DAG that this is the case. The answer depends very much on the specific application; I do not believe the question can be answered in general.
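
For readers unfamiliar with the approach under discussion, here is a minimal sketch of a Monte Carlo confidence interval for a product of four coefficients. All coefficient values and standard errors are hypothetical placeholders, and the sketch assumes that each estimate is approximately normal and that the four estimates are independent across steps. Neither assumption is confirmed anywhere in this thread.

```stata
* Hypothetical point estimates and standard errors for the four steps;
* replace them with the estimates from each xtdpdgmm regression
clear
set seed 12345
set obs 20000

* Draw each coefficient from its assumed normal sampling distribution
generate double b1 = rnormal(0.40, 0.10)
generate double b2 = rnormal(0.30, 0.08)
generate double b3 = rnormal(0.50, 0.12)
generate double b4 = rnormal(0.25, 0.07)

* Indirect effect as the product of the four coefficients
generate double indirect = b1 * b2 * b3 * b4

* Percentile-based 95% interval from the simulated distribution
_pctile indirect, percentiles(2.5 97.5)
display "95% Monte Carlo CI: [" r(r1) ", " r(r2) "]"
```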

Unit-root tests typically require a large time dimension to be reliable. If you believe in the results of your tests, this would raise questions whether a nonstationary variable can cause a stationary variable in your proposed sequence of mediator variables. Models with different integration orders of the dependent and independent variable are typically misspecified unless there are further variables in the model, e.g. lagged dependent variable and lagged independent variable, that help to obtain a stationary error term.

Are we talking about dynamic models with a lagged dependent variable? (I have some difficulties imagining such dynamic models in the context of a mediator analysis as you described in 1.) In such dynamic models, the stationarity condition for system GMM is effectively a condition on the initial observations. In static models, this initial observations problem does not occur. However, the first differences of nonstationary variables may generally be poor instruments for the levels (and vice versa). Because these system GMM estimators are discussed almost entirely in the context of dynamic models with a short time dimension, unit-root tests are usually not considered as they would require a large time dimension.

In the case of a FOD-transformed model only, we would not call it a system GMM estimator. You could possibly call it an FOD-GMM estimator. Nonstationarity could again lead to a poor performance of the estimator as lagged levels likely become weak instruments in the FOD-transformed model.
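
A minimal sketch of the distinction, using the abdata example dataset. The variables, the lag ranges, and the treatment of w and k as predetermined are assumptions for illustration only.

```stata
* Illustrative only (run from a do-file)
webuse abdata, clear

* FOD-transformed model only: no level equation, so not a system GMM estimator
xtdpdgmm L(0/1).n w k, model(fodev) collapse ///
    gmm(n, lag(1 3)) gmm(w k, lag(0 2)) twostep vce(robust)

* System GMM: the same instruments for the FOD-transformed model, plus
* first-differenced instruments for the level model
xtdpdgmm L(0/1).n w k, model(fodev) collapse ///
    gmm(n, lag(1 3)) gmm(w k, lag(0 2)) ///
    gmm(n, lag(1 1) diff model(level)) ///
    gmm(w k, lag(0 0) diff model(level)) twostep vce(robust)
```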
Joseph L. Staats replied

13 Jul 2020, 15:05
Dear Sebastian,

I am using xtdpdgmm in connection with an international political economy project and have some questions for which I don't have answers.

1. I hypothesize that Var-A has an indirect causal effect on Var-B operating successively through three mediator variables, M-1, M-2, and M-3. I have four separate levels of GMM models, the first estimating the effects of Var-A on M-1, then the effects of M-1 on M-2, then M-2 on M-3, and finally M-3 on Var-B. To estimate the coefficient for the indirect effect of Var-A on Var-B, I use the "product of coefficients method," which involves taking the product of the coefficients of interest obtained at each level of the mediation pathway. I arrive at a confidence interval for the indirect-effect coefficient using Monte Carlo simulations as suggested by Selig and Preacher (2008) and Preacher and Selig (2012). I had a manuscript reviewer question whether it was appropriate to use the product of coefficients method in connection with GMM. Can you think of anything in particular about GMM results that would make them inappropriate for using the product of coefficients method in the manner just described?

2. The reviewer also suggested I run unit-root tests on all my predictor and control variables to determine stationarity. I found two variables that failed unit-root tests. One of these is Var-A and the other is M-2. Do I need to do something in or outside of xtdpdgmm to account for this lack of stationarity?

3. I note from an earlier posting you made in Statalist https://www.statalist.org/forums/for...-arellano-bond, and in your 2019 London Stata Conference presentation (Slide 30), you say that lack of stationarity may be a sign that System GMM is not appropriate. Does that have anything to do with the lack of stationarity in my two variables as noted in No. 2 above? I ask because I could not find any recommendation by you or others who have written on stationarity and GMM, most especially Roodman (2009) and Kiviet (2012), suggesting that unit-root tests should be conducted to determine whether to use Difference or System GMM.

4. Also, there doesn't seem to be a bright line between what is meant by Difference and System GMM. I understand that System GMM comes into play if there are two equations, one with outcome and predictor variables transformed to first differences and using untransformed instruments (level) and the other using untransformed outcome and predictor variables (level) with first-differenced instruments. But what about the situation where there is only one equation, outcome and predictor variables that are transformed using the model(fod) option and instruments are untransformed? Is this considered System GMM, and does this trigger the concern with stationarity in GMM I asked about in No. 3 above?

Thanks in advance for your help on this.
Sebastian Kripfganz replied

02 Jul 2020, 05:58
1. The negative degrees of freedom indicate that the model is no longer identified after removing the moment conditions under question. To obtain a just-identified model, you would need at least 2 or 11 further moment conditions (i.e. instruments). In other words, you currently cannot test the validity of the overidentifying restrictions implied by the moment conditions 5 or 6.

2. You could possibly use the model and moment selection criteria, estat mmsc. See slides 91 and following of my 2019 London Stata Conference presentation.
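
A minimal sketch of comparing two candidate specifications with estat mmsc, using the abdata example dataset. The two instrument sets and the name m1 are assumptions for illustration.

```stata
* Illustrative only (run from a do-file)
webuse abdata, clear

* Candidate 1: fewer lags as instruments
xtdpdgmm L(0/1).n w k, model(diff) collapse ///
    gmm(n, lag(2 4)) gmm(w k, lag(1 3)) twostep vce(robust)
estimates store m1

* Candidate 2: more lags as instruments
xtdpdgmm L(0/1).n w k, model(diff) collapse ///
    gmm(n, lag(2 6)) gmm(w k, lag(1 5)) twostep vce(robust)

* Model and moment selection criteria for the current and the stored model
estat mmsc m1
```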

Prateek Bedi replied

02 Jul 2020, 05:51

Hi,

I have two questions:

1. What can we infer from the output of the difference-in-Hansen test shown below, especially with regard to the last two moment conditions, where the degrees of freedom are negative?

Code:

2-step weighting matrix from full model

                  | Excluding                   | Difference                  
Moment conditions |       chi2     df         p |        chi2     df         p
------------------+-----------------------------+-----------------------------
  1, model(fodev) |     0.4063      2    0.8162 |      0.2431      1    0.6220
  2, model(fodev) |     0.0360      1    0.8496 |      0.6134      2    0.7359
  3, model(fodev) |     0.0146      1    0.9038 |      0.6348      2    0.7281
  4, model(fodev) |     0.0654      1    0.7981 |      0.5839      2    0.7468
  5, model(level) |          .     -2         . |           .      .         .
  6, model(level) |          .    -11         . |           .      .         .

2. If there are two competing models, both of which pass the AR(2) and Sargan-Hansen tests, how do we decide which one is better? What criteria can we use to choose one model over the other?

Thanks and Regards
