XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

Zainab Mariam replied

10 Feb 2023, 05:26
Dear Professor Sebastian,

Thank you so much for your valuable response. What a great supervisor you are! I sincerely appreciate your assistance, Professor! I have the following questions, please!

1) If I have specified ‘model(diff)’ as a separate option in your xtdpdgmm command line, are the following iv() options equivalent or different?

1.1) iv(x, lag( ) diff model(level))

1.2) iv(x, lag( ) diff)

1.3) iv(x, lag( ) model(diff) diff)

1.4) iv(x, lag( ) model(level))

1.5) iv(x, lag( ))

1.6) iv(x, lag( ) model(diff))

1.7) iv(x, lag( ) level)

2) Regarding the meaning of the above iv() options, given I have specified ‘model(diff)’ as a separate option in your xtdpdgmm command line, is the following meaning of iv() options correct?

2.1) iv(x, lag( ) diff model(level)): produces differenced instruments for the level model?

2.2) iv(x, lag( ) diff): produces differenced instruments for the differenced model?

2.3) iv(x, lag( ) model(diff) diff): produces differenced instruments for the differenced model?

2.4) iv(x, lag( ) model(level)): produces level instruments for the level model?

2.5) iv(x, lag( )): produces level instruments for the differenced model?

2.6) iv(x, lag( ) model(diff)): I do not know the meaning of this iv() option, especially since 'model(diff)' has been already specified as a separate option in the command line. I think this iv() option means: produces level instruments for the differenced model? But, I do not think I can include 'model(diff)' twice in the command.

2.7) iv(x, lag( ) level): I do not know the meaning of this iv() option.

3) If I have not specified ‘model(diff)’ as a separate option in the xtdpdgmm command line, will the answers to the above questions (questions 1 and 2) be different? If so, how? Please!

4) If I have specified ‘model(FOD)’ as a separate option in the xtdpdgmm command line, will the answers to the questions above (questions 1 and 2) be different?

Thank you again for your support and effort, Professor! That made a real difference in my understanding.
Sebastian Kripfganz replied

10 Feb 2023, 03:12
1) You would just specify iv(dummy, model(fod)) without any further suboption for instrument transformation.

2) You would normally treat such an interaction the same way as any other variable. If one of the variables in the interaction term is endogenous, the interaction term itself should usually also be treated as endogenous.

3) You would use factor variable notation and specify i.year in the list of independent variables plus iv(i.year, model(level)) or iv(i.year, model(diff) diff) or iv(i.year, model(fod)), depending on whether you want to instrument the time dummies in the level model, the first-differenced model, or the FOD-transformed model.

4) This is just an illustration of what the gmm() option really does; it creates standard instruments interacted with time dummies. You would not normally do this yourself manually, but just use the gmm() option.

5) This is strictly speaking not a difference GMM estimator, because it also uses nonlinear moment conditions.
5.1) By default, option teffects always instruments time dummies in the level model, irrespective of what is specified in the model() option. This can be changed by adding option nolevel. Thus, the options model(diff) nolevel teffects would create time dummy instruments for the first-differenced model.
5.2) Option teffects always creates level instruments, irrespective of the model.

6.1) As the option nolevel is not specified, teffects creates instruments for the level model. You can easily see this yourself by looking at the list of instruments displayed below the regression output.
6.2) See 5.2).
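A minimal sketch of the three time-dummy setups discussed above: teffects with its default level-model instruments, teffects with nolevel, and explicit year dummies via factor-variable notation. It assumes the community-contributed xtdpdgmm package is installed (ssc install xtdpdgmm). It uses Stata's Arellano-Bond example data set abdata (webuse abdata), with n, w, k, id and year standing in for your own variables. The lag choices are illustrative only. The instrument list printed below each regression output shows which model the time-dummy instruments enter.

```stata
* Assumption: xtdpdgmm installed from SSC; abdata used purely for illustration
webuse abdata, clear
xtset id year

* (a) teffects: by default, the time dummies are instrumented in the level model
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
    teffects two vce(r)

* (b) teffects plus nolevel: time-dummy instruments for the first-differenced model
*     (check the instrument list below the output to confirm where they enter)
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
    teffects nolevel two vce(r)

* (c) Explicit year dummies with factor-variable notation, instrumented in the
*     level model; iv(i.year, model(diff) diff) would instead use differenced
*     dummies for the first-differenced model
xtdpdgmm L(0/1).n w k i.year, model(diff) collapse gmm(n, lag(2 4)) ///
    gmm(w k, lag(1 3)) iv(i.year, model(level)) two vce(r)
```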
Zainab Mariam replied

06 Feb 2023, 08:54
Dear Professor Sebastian,

Thank you very much for your beneficial reply. I do appreciate your cooperation, support and patience, professor! If I may follow up with your response, please!

1) Regarding post #508 point 5.1) “For the FOD-transformed equations, you do not need to transform the dummies.”. Sorry, I did not get what you mean by that.

2) My regression model includes the endogenous variable (L.x1) and the dummy variable (cf) {where this dummy variable cf takes the value of 1 for the 3 years 2008, 2009, 2010}. My regression model also includes an interaction between the endogenous variable (L.x1) and the dummy variable (cf). Here, L.x1 is an endogenous, continuous independent variable.

Thus, how do I have to type/express this interaction between the endogenous variable (L.x1) with the dummy variable (cf) in the regression code using your command xtdpdgmm?

3) If I am not using the teffects option, then how do I have to include the time dummies explicitly in my regression model? How do I have to express/type the time dummies explicitly in my regression model code using your command xtdpdgmm? Suppose the research’s time period is 2000-2020.

4) Your code on slide 22 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k, model(diff) iv(i.year#cL(2/4).n) iv(i.year#cL(1/3).w) iv(i.year#cL(0/2).k) nocons two vce(r)

Thus, what do you mean by ‘iv(i.year#cL(2/4).n) iv(i.year#cL(1/3).w) iv(i.year#cL(0/2).k)’? And why include that in the regression code?

5) The Difference GMM estimator is applied in your code on slide 80 of your 2019 London Stata Conference presentation where your code is: xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) nl(noserial) teffects igmm vce(r)

Thus, I have the following questions, please!

5.1) When you use ‘teffects’, are you instrumenting the year dummies in the differenced model or in the level model?

5.2) Also, are you using the differenced instruments or the level instruments for the year dummies?

6) The FOD estimator is applied in your code on slide 95 of your 2019 London Stata Conference presentation where your code is: xtdpdgmm L(0/3).n L(0/3).(w k ys), model(fod) collapse gmm(n, lag(1 .)) gmm(w, lag(1 .)) gmm(k, lag(1 .)) gmm(ys, lag(1 .)) teffects two vce(r) overid

Thus, I have the following questions, please!

6.1) When you use ‘teffects’, are you instrumenting the year dummies in the differenced model or in the level model?

6.2) Also, are you using the differenced instruments or the level instruments for the year dummies?

Your patience, help and effort are highly appreciated, Professor!
Sebastian Kripfganz replied

06 Feb 2023, 03:03
1) In principle, the MMSC can be used for selecting between the difference and system GMM estimator, yes. If different criteria give you different answers, I am afraid then the decision is still up to you. You will then need to weigh the benefits and shortcomings of the two estimators. As mentioned earlier, a good compromise might be the difference GMM estimator plus nonlinear moment conditions (Ahn-Schmidt).

2) gmm(y, lag(2 .)) is equivalent to gmm(L.y, lag(1 .)). As long as you choose the correct lag orders, it does not matter.

3) Yes, this is a binary dummy variable.

4.1) You need to instrument them either in the differenced or the level model. For the differenced model, you would normally also specify those dummies in differenced form and for the level model in level form, in order to maximize the correlation of the instruments with the regressors.

4.2) Yes, whether this is still called a difference GMM estimator is a different question. It is neither the traditional Arellano-Bond difference GMM estimator due to the dummies in the level model, nor the traditional Blundell-Bond system GMM estimator due to the lack of instruments for the other regressors in the level model. I would just call it a GMM estimator and then explain how it is constructed. In my opinion, the terms "difference GMM" and "system GMM" are overused and often lead to confusion. It is often an excuse for not explicitly specifying how an estimator is exactly constructed.

5.1) For the FOD-transformed equations, you do not need to transform the dummies.

5.2) Yes, same as in 4.2).

6) It does not matter; see 2).

7) No, you would need to run separate tests for the system GMM estimator. (The serial correlation test would normally be expected to still pass [although it is not guaranteed to do so], but the overidentification test may reject.) The other way round, your thinking is generally correct [although in finite samples, these tests sometimes do funny things].
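The equivalence in 2) can be checked directly. Here is a minimal sketch, again assuming xtdpdgmm from SSC and the abdata example data, with n playing the role of y. Instrumenting n from lag 2 onward and instrumenting the regressor L.n from lag 1 onward should produce the same instrument set. It should therefore also produce the same coefficient estimates, up to numerical precision.

```stata
* Assumption: xtdpdgmm installed from SSC; abdata used purely for illustration
webuse abdata, clear
xtset id year

* Instruments for the dependent variable n itself, starting at lag 2
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 .)) gmm(w k, lag(1 .)) two vce(r)
matrix b_y = e(b)

* Instruments for the regressor L.n, starting at lag 1 (again n at t-2, t-3, ...)
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(L.n, lag(1 .)) gmm(w k, lag(1 .)) two vce(r)
matrix b_Ly = e(b)

* Maximum relative difference between the two coefficient vectors (expected to be ~0)
display mreldif(b_y, b_Ly)
```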
Zainab Mariam replied

28 Jan 2023, 15:43
Dear Professor Sebastian,

I would like to express my gratitude to you for your valuable response and time. Your cooperation and support are priceless, Professor!

1) Can I use the model and moment selection criteria (MMSC) for the Difference GMM estimator and the System GMM estimator to decide which one of them is better than the other one? If so, what if none of these two estimators (the Difference GMM and the System GMM estimators) has lower values of all criteria {i.e., the Difference GMM estimator has lower values for both the Akaike (AIC) and the Bayesian (BIC), while the System GMM estimator has a lower value for the Hannan-Quinn (HQIC)}?

2) Regarding the first GMM brackets for a dynamic panel data model, to implement the Difference GMM estimator using your xtdpdgmm command, do I have to instrument the dependent variable y itself?
xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, model(diff) collapse gmm(y, lag( ))

Or

Do I have to instrument the regressor L.y (the lagged dependent variable)?
xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, model(diff) collapse gmm(L.y, lag( ))

3) My regression model includes the dummy variable cf that takes the value of 1 for the 3 years 2008, 2009, 2010. Is this dummy variable cf considered a binary dummy variable which takes only the values 1 or 0? This dummy variable cf takes the value of 1 for the 3 years 2008, 2009, 2010, while it takes the value of 0 for the years before 2008 and for the years after 2010.

4) To implement the Difference GMM estimator using your command ‘xtdpdgmm’, I have the following questions, please!

4.1) Is it necessary/required to instrument the dummies {cf, year, industry, and country} in the differenced model? If so, do I have to use the differenced instruments or the level instruments for these dummies in the differenced model?

4.2) Can I instrument the dummies (cf, year, industry, and country dummies) in the level model even though the Difference GMM estimator is applied? If so, do I have to use the differenced instruments or the level instruments for these dummies in the level model?

5) To implement the FOD estimator using your command ‘xtdpdgmm’, I have the following questions, please!

5.1) Is it necessary/required to instrument the dummies (cf, year, industry, and country dummies) in the differenced model? If so, do I have to use the differenced instruments or the level instruments for these dummies in the differenced model?

5.2) Can I instrument the dummies (cf, year, industry, and country) in the level model even though the FOD estimator is applied? If so, do I have to use the differenced instruments or the level instruments for these dummies in the level model?

6) For unbalanced panel data, is it better to instrument the dependent variable y itself or to instrument the lagged dependent variable (i.e., the regressor L.y)?

7) If the serial correlation and overidentification tests corresponding to the Difference GMM estimator passed, does it indicate that the System GMM estimator can be applied and its corresponding tests of serial correlation and overidentification will pass?

Is the other way round correct i.e., if the serial correlation and overidentification tests corresponding to the System GMM estimator passed, does it mean that the Difference GMM estimator can be applied and its corresponding tests of serial correlation and overidentification will pass?

I am very grateful to you for all your patience, help and effort, Professor!
Sebastian Kripfganz replied

28 Jan 2023, 06:08
1.1) The test in the "Excluding" column is a Hansen test for a model without the respective instruments. Here, for the last row labeled "model(level)", this would be a model without any of the instruments specified for the level model. In essence, this becomes a Hansen test for the difference GMM estimator. Passing this test is a prerequisite for conducting the difference-in-Hansen test for the additional level instruments. Thus, you can then move on to the "Difference" column.

1.2) If the "Excluding" test does not pass, then the "Difference" test becomes meaningless because it compares the results with the additional level instruments to a misspecified benchmark model. In this case, you would need to think about changing the regression model or the instruments for the differenced model before you can evaluate the instruments for the level model.

2.1) If the "Difference" test passes, assuming the "Excluding" test was passed as well, then there is no evidence of a violation of the additional system GMM assumption; in other words, there is no evidence that the additional instruments for the level model are invalid. Thus, you can go ahead and interpret the system GMM regression results.

2.2) If the "Difference" test is rejected, again assuming the "Excluding" test was passed before, then there is evidence that the additional instruments for the level model are invalid. You might have to remove some or all of them to obtain a consistent estimator.

3) Eventually, the "Difference" test is the relevant test for the Blundell-Bond assumption.

4) With the difference GMM estimator, the difference-in-Hansen test can still be useful to evaluate the validity of specific instrument sets. This could for example help to decide whether variables should be classified as endogenous, predetermined, or exogenous; see the model selection section of my presentation.

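5) The true coefficient is the unknown population value of the autoregressive coefficient, not the estimate obtained with the difference GMM estimator. The reliability of the difference GMM estimator depends on this true coefficient, but this true coefficient is unknown.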

6) First of all, ask yourself whether there are any theoretical arguments for or against the system GMM assumption. If there are no such theoretical arguments against it, you can then use the difference-in-Hansen test.
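Here is a minimal sketch of the workflow described in 1) to 3), assuming xtdpdgmm from SSC and the abdata example data. It estimates a system GMM model that adds differenced instruments for the level model. It then runs the Hansen and difference-in-Hansen tests, where the row labelled model(level) holds the "Excluding" and "Difference" statistics discussed above. It finishes with the Arellano-Bond serial-correlation tests. The specific lags are illustrative assumptions, not a recommended specification.

```stata
* Assumption: xtdpdgmm installed from SSC; abdata used purely for illustration
webuse abdata, clear
xtset id year

* System GMM: instruments for the first-differenced model plus
* differenced instruments for the level model (Blundell-Bond)
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
    gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) two vce(r)

* Hansen test and difference-in-Hansen tests: check the "Excluding" column
* of the model(level) row first, then the "Difference" column
estat overid, difference

* Arellano-Bond tests for serial correlation of the first-differenced residuals
estat serial, ar(1/3)
```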
Zainab Mariam replied

25 Jan 2023, 08:48
Dear Professor Sebastian,

I extend infinite thanks and gratitude for your valuable reply and time, professor! Please, if I may follow up with your response!

1) If we check the last row labelled “model(level)” in the “Excluding” column of the outcomes table of the difference-in-Hansen test, I have the following questions, please!

1.1) What does it mean if this test passes with a sufficiently high p-value? And what to do if this test passes?

1.2) What does it mean if this test does not pass because the p-value is small? And what to do if this test does not pass?

2) When we move on to the “Difference” column of the last row labelled “model(level)” in the outcomes table of the difference-in-Hansen test, I have the following questions, please!

2.1) What does it mean if the p-value is high? And what to do if this p-value is high?

2.2) What does it mean if the p-value is small? And what to do if this p-value is small?

3) Is the column headed “Excluding” the one which is responsible to show if the variables satisfy/violate the additional Blundell-Bond assumption (sufficient: mean stationarity)? Or is the column headed “Difference” the one which is responsible to show if the variables satisfy/violate the additional Blundell-Bond assumption (sufficient: mean stationarity)?

4) If the Difference GMM estimator is applied, do I still need to perform the Difference-in-Hansen test? If so, why? i.e., what is the implication of (the rationale behind) performing the Difference-in-Hansen test when the Difference GMM estimator is applied?

5) Regarding your post #504 point 3) “… A low estimate of the autoregressive coefficient based on the difference GMM estimator does not by itself provide confidence that the true coefficient is indeed low as well, precisely because a low estimate might be a consequence of strong bias when the true coefficient is large …”.

Thus, my question is: do you mean that the true coefficient is the lagged dependent variable’s coefficient which is obtained by running the independent variables included in the regression model using your command xtdpdgmm to apply the difference GMM estimator?

6) Regarding your post #504 point 3) “… If you are confident that the system GMM assumptions are satisfied…”.

Thus, my question is: How to check that the system GMM assumptions are satisfied?

I am very grateful to you for all your help and effort, and I do appreciate your cooperation, support and patience, professor!
Sebastian Kripfganz replied

25 Jan 2023, 05:08
1) I understand your confusion. This aspect can become quite technical when we look at the details. First of all, mean stationarity is a sufficient condition for the system GMM validity, but not a necessary one. In the simplest possible AR(1) model with no independent variables, an autoregressive coefficient equal to 1 implies nonstationarity of the dependent variable. Neither the difference nor the system GMM estimator works in this extreme case. However, for values of the autoregressive coefficient close to 1 (but below 1, and therefore in the stationary region), the difference GMM estimator still breaks down due to weak instruments, while the system GMM estimator will perform much better; see Blundell and Bond (1998, Journal of Econometrics). With additional independent variables, things become a bit more complicated because the stationarity properties of the dependent variable also depend on the stationarity of the independent variables (and not just the value of the autoregressive parameter).

2) You could use the underid command to test for underidentification, which is closely related to weak instruments; see slides 43 and following of my presentation.

3) This is a tricky part. A low estimate of the autoregressive coefficient based on the difference GMM estimator does not by itself provide confidence that the true coefficient is indeed low as well, precisely because a low estimate might be a consequence of strong bias when the true coefficient is large. If you are confident that the system GMM assumptions are satisfied (which does not depend on the specific value of the autoregressive coefficient as long as it is smaller than 1), you could check the estimate from the system GMM estimator. Alternatively, you could use the nonlinear Ahn and Schmidt (1995, Journal of Econometrics) estimator, which also mitigates the weak-instruments problem but does not require the additional system GMM assumptions.
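Here is a minimal sketch of the checks mentioned in 2) and 3). It assumes xtdpdgmm and the community-contributed underid command are both installed from SSC (ssc install underid), and it uses the abdata example data. It first runs a linear difference GMM estimator followed by underid with its default settings. It then reruns the same instruments with the Ahn-Schmidt nonlinear moment conditions added via nl(noserial). The two estimates of the coefficient on L.n are displayed side by side at the end.

```stata
* Assumption: xtdpdgmm and underid installed from SSC; abdata for illustration
webuse abdata, clear
xtset id year

* Linear difference GMM estimator
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) two vce(r)
scalar rho_dgmm = _b[L.n]

* Underidentification test, closely related to weak instruments
underid

* Same instruments plus Ahn-Schmidt nonlinear moment conditions
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
    nl(noserial) two vce(r)
scalar rho_nl = _b[L.n]

display "L.n coefficient, difference GMM:      " rho_dgmm
display "L.n coefficient, with nl(noserial):   " rho_nl
```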
Zainab Mariam replied

24 Jan 2023, 15:56
Dear Professor Sebastian,

Many thanks for your swift useful reply. Your cooperation and support are invaluable, Professor!

1) Please, correct me if I am wrong! What I know is that the Difference GMM estimator has a problem with weak instruments if the time series is persistent (i.e., the time series is near unit root/random walk) and the dataset is short. Thus, the System GMM estimator should be applied instead. In other words, the System GMM estimator should be applied when there is no stationarity (i.e., we cannot rely on the findings of the Difference GMM estimator if the time series is not stationary, and thus we should apply the System GMM estimator). Am I wrong or right?

2) Also, how can I check that the Difference GMM estimator has a problem with weak instruments?

3) How to check if the time series is persistent (i.e., the time series is near unit root/random walk)?

Your patience, cooperation and help are highly appreciated, Professor!
Sebastian Kripfganz replied

24 Jan 2023, 12:24
1) The system GMM estimator requires that the initial deviations from the long-run mean should not be systematically related to that long-run mean. In other words, groups which are initially further away from their steady state should not systematically experience higher growth rates. In practical terms, groups with large fixed effects should not systematically experience larger or smaller changes of the relevant variables over time. Mean stationarity ensures that this is not the case. In practice, it can however often be argued theoretically that this assumption is violated (even though too often in the empirical practice such theoretical concerns are ignored).

2) To use these instruments for the lagged dependent variable, this should be satisfied for all variables. But you might otherwise still be able to selectively use instruments for some independent variables if the assumption is satisfied for them (as I have done in the model selection example in my presentation). Stationarity needs to be satisfied for the levels.

3) N/A

4) Separate stationarity tests are usually not applicable due to the short time horizon in typical data sets when we use these GMM methods. However, they are also not needed. If there are no strong theoretical arguments already ruling out the mean stationarity assumption, we can simply use the Difference-in-Hansen test as a validity check.

5.1) Yes. If the required assumption (mean stationarity) is violated, the respective instruments will be invalid. This is precisely what the Difference-in-Hansen test checks. The null hypothesis is that the respective instruments are valid (which requires mean stationarity to hold). The alternative hypothesis is that they are not valid (which could be due to a violation of the mean stationarity assumption or due to any other model misspecification).

5.2) You could in principle check for each variable separately by only including the instruments for the respective variable in the level equation and leaving out all the instruments for all other variables in the level equation.

6.1) Yes, please check the model selection section in my presentation.

6.2) Again, please check the model selection section in my presentation.
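For 5.2), here is a minimal sketch (xtdpdgmm from SSC, abdata example data, illustrative lags). It adds level-model instruments only for a single variable, k, and leaves out level-model instruments for all other variables. The difference-in-Hansen statistic in the model(level) row then relates to that variable only. The second regression sketches a classification check in the spirit of 6.1). The instruments for k are split into lag 2 onward, which is valid if k is endogenous, and lag 1, which is valid only if k is predetermined. This assumes, as we understand the output, that estat overid, difference reports a separate row for each gmm() option.

```stata
* Assumption: xtdpdgmm installed from SSC; abdata used purely for illustration
webuse abdata, clear
xtset id year

* Level-model instruments only for k; n and w are instrumented only in the
* first-differenced model
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
    gmm(k, lag(0 0) diff model(level)) two vce(r)
estat overid, difference

* Classification check for k: lags 2 and 3 are valid if k is endogenous, while
* the additional lag-1 instrument is valid only if k is (at least) predetermined
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w, lag(1 3)) ///
    gmm(k, lag(2 3)) gmm(k, lag(1 1)) two vce(r)
estat overid, difference
```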
Zainab Mariam replied

24 Jan 2023, 09:37
Dear Professor Sebastian,

Thank you very much for your swift valuable reply. You are a magnificent supervisor. It is not a compliment; it is the truth. I am very grateful to you for all your support and effort, professor! Please, if I may follow up with your response!

1) To apply the System GMM estimator, should the stationarity be satisfied? Or should the non-stationarity be satisfied to apply the System GMM estimator?

And what is the reason/rationale behind this condition/moment that makes this condition/moment required to be satisfied to apply the System GMM estimator?

2) If the stationarity should be satisfied to apply the System GMM estimator, do all variables have to be stationary? Or does the dependent variable y only have to be stationary? Or do both the dependent variable y and the main independent variable have to be stationary?

Also, should the stationarity be satisfied for a variable at level or at difference?

3) If the non-stationarity should be satisfied to apply the System GMM estimator, do all variables have to be non-stationary? Or does the dependent variable y only have to be non-stationary? Or do both the dependent variable y and the independent variable have to be non-stationary?

Also, should the non-stationarity be satisfied for a variable at level or at difference?

4) To apply the System GMM estimator, can the popular stationarity tests (such as the Augmented Dickey-Fuller test) be used to check for stationarity/non-stationarity? Can the popular stationarity tests (such as the Augmented Dickey-Fuller test) be used to check for the additional assumption/condition of Blundell and Bond?

5) Regarding the Difference-in-Hansen test, I have the following questions, please!

5.1) Can the Difference-in-Hansen test be used to check for stationarity/non-stationarity? If so, how can the Difference-in-Hansen test check for stationarity/non-stationarity? What is the null hypothesis of the Difference-in-Hansen test in terms of stationarity/non-stationarity? e.g., the null hypothesis of the popular tests for stationarity/non-stationarity is H₀: All panels contain unit roots.

5.2) The popular stationarity tests (such as the Augmented Dickey-Fuller test) check for stationarity/non-stationarity for each variable individually. Thus, does the Difference-in-Hansen test check for stationarity/non-stationarity for each variable separately/individually? If so, how? Or does the Difference-in-Hansen test check for stationarity/non-stationarity for all variables together? If so, how?

6) For the classification of a variable whether it is exogenous, predetermined or endogenous, I have the following questions, please!

6.1) Can the Difference-in-Hansen test be used to check for the classification of the variables included in the regression model whether the variable is endogenous, predetermined or exogenous? If so, how? And what is the null hypothesis of the Difference-in-Hansen test in terms of the classification of the variables?

6.2) Does the Difference-in-Hansen test check for the classification of each variable individually? If so, how? Or does the Difference-in-Hansen test check for the classification of all variables together? If so, how?

Your help, patience and cooperation are highly appreciated, Professor!
Sebastian Kripfganz replied

24 Jan 2023, 03:58
1.1) If you change the estimator, coefficients can turn from insignificant to significant or from significant to insignificant. If there was no possibility of such a change, then we would not need to think about different estimators in the first place. It could possibly be that some of the additional assumptions of the system GMM estimator are violated, consequently biasing the coefficient estimates.

1.2) It is difficult to give a general answer to this question. If the tests are passed and there is no indication of potentially weak instruments, then the difference GMM estimator should be sufficient and more robust, as it does not rely on the additional assumptions for the system GMM estimator. The motivation for the system GMM estimator is typically that it may help overcome a potential weak-instruments problem of the difference GMM estimator.

1.3) The system GMM estimator relies on additional assumptions (essentially mean stationarity of all variables). If those assumptions are violated, then the estimator is biased/inconsistent and thus likely worse than the difference GMM estimator. If those assumptions are satisfied, then the system GMM estimator uses stronger instruments and typically performs better than the difference GMM estimator.

2) The Difference-in-Hansen test can help to assess the validity of the additional assumptions needed for the system GMM estimator. So, yes, it generally should be checked. A rejection of the difference-in-Hansen test comparing the difference and system GMM estimates tells us that those additional assumptions are likely violated (assuming that the difference GMM estimator used for the comparison is correctly specified in the first place).

3) There are two asymptotically equivalent ways of computing the difference-in-Hansen test for this purpose. (i) You can compute the difference GMM and system GMM estimator separately and then contrast the two; see slide 49 of my 2019 London Stata Conference presentation. (ii) Alternatively, you can just compute the system GMM estimator and compute a difference-in-Hansen statistic directly from there; see the subsequent slides in my presentation.

4) Possibly, yes. If you make stronger assumptions (say, strictly exogenous regressor), this can bias the coefficient estimates if this assumption is incorrect; if the bias is towards zero, it can mean that the estimate turns insignificant. If you make weaker assumptions (say, endogenous regressor), the respective instruments become weaker, which generally increases the standard errors and thus can again lead to less significant results. As you can see, there is no automatism that guarantees you significant results. Statistical significance of coefficients should not be used to choose a particular estimator or model specification; if we adjust our estimator until we get the desired result, then why do a statistical analysis in the first place?

5) Possibly, yes. If there are (too) many instruments, this can lead to overfitting of the endogenous regressors and difficulties with estimating the optimal weighting matrix. One of the consequences could be unreliable results from the Hansen test. Again, there is no automatism, although if you are using an extremely large number of instruments, the p-value of the Hansen test is likely to be biased towards 1.

6.1) The ARDL panel data model is simply a model in which you include a lagged dependent variable (or possibly several lags of it) and possibly lags of the independent variables. Model selection criteria could be used to choose between model specifications; see the section on model selection in my presentation. The reason for using lags of the variables is to obtain a dynamically complete model. In dynamically incomplete models, some of the dynamic adjustment processes are left unexplained, which could lead to serial correlation in the error term (which in turn might invalidate some of the instruments). Thus, if there is evidence of residual serial correlation by the Arellano-Bond test, adding lags of the variables could be a promising approach for dealing with that.

6.2) There is no general answer to this question. It depends on whether your theory suggests that there might be delayed effects of the independent variables on your dependent variable. And you can use the empirical approach suggested in 6.1).

7.1) You do not necessarily have to exclude any firms. If there are insufficient observations for a firm, the command will not use that firm automatically. However, sometimes the performance of the estimator might be better if the panel is not heavily unbalanced. Also, ask yourself whether those firms with few observations are systematically different from your other firms. If they are, then maybe they are not representative for your target population of firms, and you may want to exclude them.

7.2) You cannot selectively use some variables for some firms. If data is missing for some variables and firms, the whole observation - i.e., all variables - won't be used.

7.3) You can use the if condition in the command syntax.
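Here is a minimal sketch for 6.1) and 7.3), assuming xtdpdgmm from SSC and the abdata example data. It first estimates an ARDL specification with two lags of the dependent variable and the current and first lags of the regressors, followed by the Arellano-Bond test. It then estimates the same model on a subsample selected with an if condition. The variable nobs and the threshold of 7 observations per firm are hypothetical choices made only for illustration.

```stata
* Assumption: xtdpdgmm installed from SSC; abdata used purely for illustration
webuse abdata, clear
xtset id year

* ARDL(2,1) specification: two lags of n, current and first lags of w and k
xtdpdgmm L(0/2).n L(0/1).(w k), model(diff) collapse gmm(n, lag(2 4)) ///
    gmm(w k, lag(1 3)) two vce(r)
* Remaining residual serial correlation may indicate a dynamically incomplete model
estat serial, ar(1/3)

* Hypothetical subsample: firms with at least 7 nonmissing observations of n
bysort id (year): egen nobs = count(n)
xtdpdgmm L(0/2).n L(0/1).(w k) if nobs >= 7, model(diff) collapse ///
    gmm(n, lag(2 4)) gmm(w k, lag(1 3)) two vce(r)
```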
Zainab Mariam replied

23 Jan 2023, 16:07
Dear Professor Sebastian,

Many thanks for your beneficial response. Professor, you really deserve to be named as a supervisor. Your help is much more than my supervisors’. Your support is a main contributor to the empirical part of my thesis. I still have the following questions, please! Sorry!

1) The coefficient of the independent variable is significant if the Difference GMM estimator is applied using your command xtdpdgmm, while the coefficient of the independent variable became insignificant when the System GMM estimator is applied. Thus, I have the following questions, please!

1.1) Is there any justification for that? i.e., what is the reason behind having an insignificant coefficient of the independent variable when the System GMM estimator is applied, while that coefficient is significant when the Difference GMM estimator is applied?

1.2) Is it sufficient to apply the Difference GMM estimator and rely on the Difference GMM findings? Given that the tests corresponding to the Difference GMM estimator (i.e., the tests of serial correlation and overidentification) passed.

1.3) Is the System GMM estimator superior to the Difference GMM estimator even if the serial correlation and overidentification tests, corresponding to the Difference GMM estimator, passed? i.e., is applying the System GMM estimator better than the Difference GMM estimator (does the System GMM estimator outperform the Difference GMM estimator) even if the tests corresponding to the Difference GMM estimator (i.e., the tests of serial correlation and overidentification) passed?

2) When the System GMM estimator is applied using your command xtdpdgmm, is there any need to apply the Difference-in-Hansen test after running the regression? If so, why? Also, how do I interpret the results of the Difference-in-Hansen test after running the System GMM estimator?

3) To apply the System GMM estimator using your command xtdpdgmm, do I first have to apply the Difference GMM estimator? Or can I apply the System GMM estimator directly? Is there an order of steps to follow when applying the System GMM estimator?

4) Does classifying a variable as exogenous, predetermined, or endogenous affect the significance of that variable's coefficient? That is, is there any relation between the significance of a variable's coefficient and its classification as exogenous, predetermined, or endogenous?

5) Does the number of instruments negatively affect the results of specification tests, e.g., the Hansen test? Is there any relation between the Hansen test results and the number of instruments? When the number of instruments increases, does the probability of the test not passing increase?

6) Regarding the Autoregressive Distributed Lag (ARDL) panel data model mentioned on slide 9 of your 2019 London Stata Conference presentation, I have the following questions, please!

6.1) Why should one apply the ARDL panel data model? And how is it applied?

6.2) Do I have to apply ARDL only to the dependent variable y, to both the dependent variable y and the independent variable, separately to each variable in the regression model, or to all variables together in the same command?

7) To apply GMM estimation (the Difference GMM estimator and the System GMM estimator), I have the following questions, please!

7.1) Do I have to exclude all firms with fewer than 5 consecutive years of data? Or do I have to keep only the firms that have at least 3 consecutive time-series observations during the sample period?

7.2) Suppose that I have to keep only the firms that have at least 5 consecutive years of data. Do I then have to apply this exclusion based on all variables in the regression model, or only based on the dependent variable y and the main independent variable?

7.3) Is there a function, command, or expression in Stata to perform this exclusion of firms before applying GMM estimation?

Your patience, support and effort are highly appreciated, Professor!
Sebastian Kripfganz replied

23 Jan 2023, 09:38
1.1) That's a system GMM estimator.
1.2) N/A
1.3) I believe incremental overidentification tests may not have supported the inclusion of these instruments for the level model.

2.1) Strictly speaking, it is a system estimator, although it only uses the instruments for the dummy variables in the level model. This example demonstrates why the terms "difference GMM" and "system GMM" can be misleading. Different people might have different things in mind when using these terms. Additionally, with the nl(noserial) option, we are not looking at the traditional difference/system GMM estimator anymore anyway.
2.2) Instruments for such dummy variables achieve the strongest correlation with the instrumented dummies if the instruments are specified for the level model. Those dummies are typically assumed to be uncorrelated with the idiosyncratic and the group-specific error components, which is often innocuous given that those coefficients are not assigned a structural interpretation. Often, additional instruments for them in the first-differenced model are then redundant.
2.3) The additional assumptions for the validity of those instruments in the level model may not be satisfied.
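To make answer 2 more concrete, the slide 86 specification is reproduced below on the Arellano-Bond example data (webuse abdata), with comments marking which model each instrument set targets. This assumes xtdpdgmm is installed from SSC and the code runs from a do-file. The comments are an editorial reading of the options and are not taken from the slides. One further point helps explain why the industry dummies are instrumented only in the level model: they are time-invariant, so first-differencing removes them from the differenced model.

```stata
* Arellano-Bond example data, as in the presentation slides
webuse abdata, clear
xtset id year

* model(diff) makes the first-differenced model the default for all instruments:
*   gmm(n, lag(2 4))         levels of n, lags 2 to 4, for the differenced model
*   gmm(w k, lag(1 3))       levels of w and k, lags 1 to 3, for the differenced model
*   iv(i.ind, model(level))  industry dummies as instruments for the level model only
xtdpdgmm L(0/1).n w k i.ind, model(diff) collapse gmm(n, lag(2 4)) ///
    gmm(w k, lag(1 3)) iv(i.ind, model(level)) nl(noserial) teffects igmm vce(r)
```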

3.1) Same answer as in 2.1).
3.2) Same answer as in 2.2).
3.3) Same answer as in 2.3).

4.1) Here, it is just a difference GMM estimator.
4.2) You can also use the level time dummies; it should not matter. Generally, in the first-differenced model the correlation between first-differenced regressors and instruments is maximized if the latter are differenced as well.
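One way to check the claim in 4.2 on the example data is to estimate the slide 75 specification twice and compare the estimates. The first run uses differenced year-dummy instruments and the second uses level year-dummy instruments, both for the differenced model. This sketch assumes xtdpdgmm is installed from SSC and the code runs from a do-file. The stored-estimate names dum_diff and dum_level are hypothetical. The answer only says the choice "should not matter", so similar but not necessarily identical estimates are to be expected.

```stata
webuse abdata, clear
xtset id year

* Differenced year dummies as instruments for the differenced model (slide 75)
xtdpdgmm L(0/1).n w k yr1980-yr1982, model(diff) collapse gmm(n, lag(2 4)) ///
    gmm(w k, lag(1 3)) iv(yr1980-yr1982, diff) two vce(r)
estimates store dum_diff

* Level year dummies as instruments for the differenced model
xtdpdgmm L(0/1).n w k yr1980-yr1982, model(diff) collapse gmm(n, lag(2 4)) ///
    gmm(w k, lag(1 3)) iv(yr1980-yr1982) two vce(r)
estimates store dum_level

* Compare coefficients and standard errors side by side
estimates table dum_diff dum_level, se
```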

5) This is an example of what should not be done. In a balanced panel model, the instruments for the time dummies in the first-differenced model are redundant once the respective instruments are specified for the level model.
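Following answer 5, the slide 77 specification can be written without the redundant differenced-model instruments for the year dummies, in the same way as slide 78. This sketch assumes xtdpdgmm is installed from SSC and the code runs from a do-file. The example data are an unbalanced panel, and the redundancy argument in the answer is stated for a balanced panel, so here dropping those instruments is a simplification rather than an exact equivalence.

```stata
webuse abdata, clear
xtset id year

* Year dummies instrumented only in the level model; the extra
* iv(yr1980-yr1982, diff) for the differenced model is left out
xtdpdgmm L(0/1).n w k yr1980-yr1982, model(diff) collapse gmm(n, lag(2 4)) ///
    gmm(w k, lag(1 3)) iv(yr1980-yr1982, model(level)) two vce(r)
```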

6.1) That is a system GMM estimator.
6.2) The option model(diff) indicates that the variables are specified for the differenced model. In the absence of another transformation option, no transformation is applied (i.e., level instruments for differenced model).
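Below, the slide 38 system GMM specification is annotated with the model each instrument set targets. The annotations follow answers 6.2 and 8. Without a global model() option, instruments default to the level model, and the diff suboption only differences the instruments. The sketch assumes xtdpdgmm is installed from SSC and the code runs from a do-file.

```stata
webuse abdata, clear
xtset id year

* No global model() option, so the level model is the default:
*   gmm(n, lag(2 4) model(diff))    levels of n, lags 2 to 4, for the differenced model
*   gmm(w k, lag(1 3) model(diff))  levels of w and k, lags 1 to 3, for the differenced model
*   gmm(n, lag(1 1) diff)           first difference of n at lag 1, for the level model
*   gmm(w k, lag(0 0) diff)         current first differences of w and k, for the level model
xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4) model(diff)) ///
    gmm(w k, lag(1 3) model(diff)) gmm(n, lag(1 1) diff) gmm(w k, lag(0 0) diff) two vce(r)
```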

7) The two codes are equivalent.

8) The two codes are equivalent.
Zainab Mariam replied

10 Jan 2023, 15:10
Dear Professor Sebastian,

Many thanks for your crystal-clear answers. I do not know how to thank you, Professor! Indeed, saying "thank you very much" is not enough. Your cooperation and support are priceless. You are an invaluable source of information.
If I may, I would like to follow up on your response.

1) Your code on slide 118 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/2).n L(0/2).w k L(0/3).ys c.w#c.w c.w#c.k, model(fod) collapse gmm(n, lag(1 .)) gmm(w, lag(0 .)) gmm(k, lag(0 .)) gmm(ys, lag(1 .)) gmm(c.w#c.w, lag(0 .)) gmm(c.w#c.k, lag(0 .)) gmm(k, lag(0 0) model(md)) gmm(w k, lag(0 0) diff model(level)) teffects two vce(r) overid

I have the following questions, please!

1.1) Are you applying the Difference GMM estimator or the System GMM estimator?

1.2) If the System GMM estimator is not applied, why instrument the variables w and k in the level model?

1.3) If the System GMM estimator is applied, why not instrument the dependent variable n, the variable ys, the square of w, and the interaction between w and k in the level model?

2) Your code on slide 86 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k i.ind, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(i.ind, model(level)) nl(noserial) teffects igmm vce(r)

2.1) Are you applying the Difference GMM estimator or the System GMM estimator?

2.2) If the Difference GMM estimator is applied, why instrument the industry dummies ind in the level model? And why use level instruments for the industry dummies in the level model?

2.3) If the System GMM estimator is applied, why not instrument the dependent variable n and the variables w and k in the level model? And why use level instruments for the industry dummies in the level model, i.e., why not use differenced instruments for them?

3) Your code on slide 78 of your 2019 London Stata Conference presentation is: quietly xtdpdgmm L(0/1).n w k yr1978-yr1984, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(yr1978-yr1984, model(level)) two vce(r)

3.1) Are you applying the Difference GMM estimator or the System GMM estimator?

3.2) If the Difference GMM estimator is applied, why instrument the year dummies in the level model? And why use level instruments for the year dummies in the level model?

3.3) If the System GMM estimator is applied, why not instrument the dependent variable n and the variables w and k in the level model? And why use level instruments for the year dummies in the level model?

4) Your code on slide 75 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k yr1980-yr1982, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(yr1980-yr1982, diff) two vce(r)

4.1) Are you applying the Difference GMM estimator or the System GMM estimator?

4.2) If the Difference GMM estimator is applied, why use differenced instruments for the year dummies in the differenced model, i.e., why not use level instruments for them? As far as I know, the Difference GMM estimator uses level instruments in the differenced model.

5) Your code on slide 77 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k yr1980-yr1982, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(yr1980-yr1982, diff) iv(yr1980-yr1982, model(level)) two vce(r)

5.1) Are you applying the Difference GMM estimator or the System GMM estimator?

5.2) If the Difference GMM estimator is applied, why instrument the year dummies in the level model? And why not use level instruments for the year dummies in the differenced model?

5.3) If the System GMM estimator is applied, why instrument the year dummies in the differenced model? And why not use differenced instruments for the year dummies in the level model?

5.4) If the System GMM estimator is applied, why not instrument the dependent variable n and the variables w and k in the level model?

6) Your code on slide 38 of your 2019 London Stata Conference presentation is: xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4) model(diff)) gmm(w k, lag(1 3) model(diff)) gmm(n, lag(1 1) diff) gmm(w k, lag(0 0) diff) two vce(r)

6.1) Are you applying the Difference GMM estimator or the System GMM estimator?

6.2) When you write 'gmm(n, lag(2 4) model(diff)) gmm(w k, lag(1 3) model(diff))', are you instrumenting the variables n, w, and k in the differenced model or in the level model? Also, are you using differenced or level instruments for these variables?

7) Are the following codes equivalent or different?

xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4) model(diff)) gmm(w k, lag(1 3) model(diff)) two vce(r)

xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) two vce(r)

8) Are the following codes equivalent or different?

xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4) model(diff)) gmm(w k, lag(1 3) model(diff)) gmm(n, lag(1 1) diff) gmm(w k, lag(0 0) diff) two vce(r)

xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4) model(diff)) gmm(w k, lag(1 3) model(diff)) gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) two vce(r)

I am very grateful to you for all your support and effort, professor!