XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

Zainab Mariam

Join Date: Jul 2022

Posts: 51
#466

16 Aug 2022, 18:17

Dear Professor Sebastian,

Many thanks for your response. I still have the following questions, please!

A) You kindly mention in your third point: “You would add the dummies as additional regressors”. Thus, my question is:

1) Do you mean that I have to (or do not have to) include the dummies in the baseline regression model?

B) Also, you mention in your third point: “You would add the dummies as additional regressors and as instruments in an iv() option”. Indeed, I will include industry dummy variables in my regression model. Thus, my question is:

2) Which of the following codes is correct and I can use to implement the Difference Gmm estimator?

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 0)) ///
> iv(i.ind, model(level)) teffects two vce(r)

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 0)) ///
> iv(i.ind) teffects two vce(r)

If none of the previous codes is correct, what is the correct code I have to use in order to implement the Difference GMM estimator, given that the regression model includes industry and time dummies?

C) Moreover, you mention: “For time dummies, you can alternatively simply combine the two options teffects and nolevel”. Thus, my questions are:

3) Here, what does ‘nolevel’ stand for? i.e., what purpose does ‘nolevel’ serve here?

4) For time dummies, do you mean that I can use the following code to implement the Difference GMM estimator?

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 0)) ///
> nocons teffects nolevel two vce(r)

D) In some of your slides, the option ‘teffects’ is accompanied with ‘igmm’ in the same code, but not accompanied with ‘nocons’ in the same code. Thus, my questions are:

5) Does this mean it is wrong to use ‘teffects’ with ‘nocons’? i.e., is it better not to use ‘teffects’ with ‘nocons’?

6) Is it better to use ‘teffects’ with ‘igmm’? i.e., is it wrong not to use ‘teffects’ with ‘igmm’?

7) Is it wrong to use the four options ‘nocons’ ‘teffects’ ‘nolevel’ ‘two’ in the same code?

Your help, patience and cooperation are appreciated.
Zainab Mariam

Join Date: Jul 2022

Posts: 51
#467

19 Aug 2022, 05:19

Eagerly waiting for a response, please!

Sorry for any inconvenience.

Thank you in advance.
Hamid muili

Join Date: Aug 2020

Posts: 94
#468

19 Aug 2022, 16:19

After estimating my difference GMM model with xtdpdgmm, I was only able to run estat overid; estat serial gives the error "Non class found where class required". What is the problem?
Also, which Hansen test should be interpreted: the one with the 2-step weighting matrix or the one with the 3-step weighting matrix?
Sebastian Kripfganz

Join Date: May 2014

Posts: 2589
#469

24 Aug 2022, 05:51

Zainab Mariam
1) If you are not using the teffects option, then you do have to include the time dummies explicitly in your regression model.
2) Both codes are generally correct. In a strict sense, for a difference-GMM estimator you would use the second code (plus option nolevel). However, if the industry dummies are time-invariant, then you have to use the first code because they would otherwise be omitted.
3) For option nolevel, please see my post #374 in this thread.
4) Yes.
5) If you combine teffects and nocons, the command will add an additional time dummy instead of the intercept. This changes the interpretation of the time dummies but leaves everything else unchanged.
6) igmm specifies the iterated GMM estimator. Whether you include time effects or not is independent of this choice.
7) There is nothing wrong with it.
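
As a rough illustration of points 4) and 7), the following sketch combines teffects, nolevel, twostep, and vce(robust) in one difference-GMM call. It uses the abdata example dataset that ships with Stata instead of the variables from the question, and treating w as predetermined and k as endogenous is an assumption made only for this example.

```stata
* Illustrative sketch only: abdata ships with Stata; the classification of
* w (predetermined) and k (endogenous) is an assumption for this example.
webuse abdata, clear
xtset id year

* Difference GMM with time effects: teffects adds the time dummies and is
* combined with nolevel, as discussed above for a pure difference-GMM setup.
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) ///
    gmm(w, lag(1 3)) gmm(k, lag(2 4)) teffects nolevel twostep vce(robust)
```

Adding nocons to this call should only change how the time dummies are interpreted, in line with point 5).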

Hamid muili
Please check whether you have the latest version of the command. If you type

Code:

which xtdpdgmm

the command version should be 2.6.2. If it is not, please update the command. After the update, it is recommended to restart Stata. I hope this solves the problem.
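
One possible way to check and update the installation is sketched below. The net install source is assumed to be the website given in the signature; if the command fails, the installation instructions on that website take precedence.

```stata
* Show the installed version, then reinstall the package.
* The from() location is an assumption based on the signature URL.
which xtdpdgmm
net install xtdpdgmm, from(http://www.kripfganz.de/stata/) replace
* Restart Stata afterwards, as recommended above.
```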

It is customary to report the test with the 2-step weighting matrix.

https://www.kripfganz.de/stata/
Zainab Mariam

Join Date: Jul 2022

Posts: 51
#470

24 Aug 2022, 10:24

Dear Professor Sebastian,

Many thanks for your reply. I am very grateful to you for all your help. I still have the following questions, please! Sorry!

1) I will apply the Difference-GMM estimator. Please correct me if I am wrong. For time dummies, there is no need to generate them manually as additional regressors and as instruments in an iv() option, because your command ‘xtdpdgmm’ with the option ‘teffects’ does all that. But for other dummies (such as industry and country dummies), I need to generate them manually.

2) L.x1 is the independent variable of my regression model, this independent variable (L.x1) is endogenous. My regression model includes a dummy variable (fc, this dummy variable takes the value of 1 for the 3 years 2008, 2009, and 2010). Also, my regression model includes interaction between the endogenous variable (L.x1) with the dummy variable (fc). Thus, do I have to consider this interaction between the endogenous variable (L.x1) and the dummy variable (fc) as predetermined, exogenous, or endogenous?

3) Also, my regression model includes a dummy variable (mn for country dummies). Which is the correct code of the following in order to implement the Difference-GMM estimator, given that the regression model includes the dummy variable (fc), country dummies (mn), and industry dummies (ind)?

3.1) Do I have to put ‘diff’ in iv() option for dummies as follows?

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 0)) ///
> iv(i.ind, diff) iv(i.fc, diff) iv(i.mn, diff) two vce(r)

3.2) Or do I have to put ‘diff model(level)’ in iv() option for dummies as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 0)) ///
> iv(i.ind, diff model(level)) iv(i.fc, diff model(level)) iv(i.mn, diff model(level)) two vce(r)

3.3) Or do I have to put ‘model(level)’ in iv() option for dummies as follows?

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 0)) ///
> iv(i.ind, model(level)) iv(i.fc, model(level)) iv(i.mn, model(level)) two vce(r)

3.4) Or should I put no suboption in the iv() option for the dummies, as follows?

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 0)) ///
> iv(i.ind) iv(i.fc) iv(i.mn) two vce(r)

If none of the previous codes is correct, what is the correct code that I have to use in order to implement the Difference GMM estimator, given that the regression model includes the dummy variable (fc), the country dummies (mn), and the industry dummies (ind)?

4) The dummy variable (fc) takes the value of 1 for the 3 years 2008, 2009, and 2010. Thus, my question is: Is it correct to include the dummy variable (fc) and the time dummies in the same regression model (in the same code) as follows?

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.fc, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 0)) ///
> iv(i.fc, diff) teffects two vce(r)

5) Please correct me if I am wrong. L.x7 is a control variable.

5.A) L.x7 is the lag 0 of the control variable L.x7.

5.B) L2.x7 is the first lag (lag 1) of the control variable L.x7.

6) To implement the Difference GMM estimator, is it required/necessary to specify the ‘overid’ option in the same line of your command ‘xtdpdgmm’? if so, what is the notion behind that, and what are the outcomes? Is it a substitute for ‘estat overid’? i.e., when the ‘overid’ option is specified in the ‘xtdpdgmm’ command line, does it mean that there is no need to apply ‘estat overid’?

7) When I apply the Difference GMM estimator, do I have to instrument all the variables included in the regression model? Or do I have to instrument only the endogenous and predetermined variables (i.e., no need to instrument the exogenous)?

8) According to your post #440 “Then you assume that X1 is endogenous and you want to instrument it in the typical GMM style:
Code:

xtdpdgmm Y X1 X2 X3, model(mdev) iv(X2 X3, norescale) gmm(X1, lag(2 8) collapse model(diff)) twostep small vce(robust, dc)

Notice that I have left the instruments for X2 X3 in the same format as for the traditional fixed-effects regression. This way, you can best compare the results.” Thus, my question is: can I replicate your suggestion for dynamic panel data (unbalanced)?

I am so sorry for the long message, but I do need your help, Professor.

Your help, patience and cooperation are highly appreciated.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2589
#471

24 Aug 2022, 10:38

1) Correct.

2) You would usually treat the interaction with an endogenous variable also as endogenous.

3) For the Difference-GMM estimator, you would normally choose specification 3.1.

4) If you have time dummies, the effect of the fc dummy is not separately identifiable due to perfect multicollinearity. You will notice that either this dummy or one of the time dummies (or the constant) will be omitted.

5) Correct.

6) The overid option has no effect on the estimation of the model. It allows for the computation of additional "difference-in-Hansen" test statistics with estat overid.

7) You always need to instrument the exogenous regressors (typically by themselves).

8) This suggestion carries over to dynamic panel models, yes.
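
The multicollinearity described in point 4) can be checked directly. The sketch below is only an illustration on the abdata example dataset (years 1976 to 1984); the window 1979 to 1981 is a made-up stand-in for the 2008 to 2010 fc dummy from the question.

```stata
* Illustrative sketch: a period dummy is an exact linear combination of
* year dummies. The 1979-1981 window is hypothetical, chosen to fit abdata.
webuse abdata, clear
generate byte fc = inrange(year, 1979, 1981)

* Regress the period dummy on a full set of year dummies. Because fc equals
* the sum of three of them, the year dummies explain it completely.
regress fc i.year
```

Since the year dummies reproduce fc exactly, a model that contains both cannot identify a separate coefficient for fc, which is why one of the collinear terms gets omitted.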

https://www.kripfganz.de/stata/
Zainab Mariam

Join Date: Jul 2022

Posts: 51
#472

24 Aug 2022, 13:49

Dear Professor Sebastian,

Thank you very much for your swift valuable response. I do not know how to thank you, Professor!

Can I have your permission to include your name in the acknowledgement section of my PhD thesis? Your help is much more than my supervisors’.

I still have the following questions, please!

1) How to decide whether I have to include the second lag of the dependent variable L2.y as a regressor in my regression model?

2) Do I have to consider the second lag of the dependent variable L2.y as predetermined or exogenous?

3) I have unbalanced panel data. Thus, to implement the Difference GMM estimator using your command ‘xtdpdgmm’, do I have to (or can I) include the option “orthogonal” in the code? If not, what should I include instead of “orthogonal” to deal with unbalanced panel data?

4) Regarding forward-orthogonal deviations (FOD). I have the following questions:

4.1) Is ‘FOD’ applicable for the Difference-GMM estimator? or applicable for the System-GMM estimator? or applicable for both? or is ‘FOD’ applied separately from the Difference-GMM and the System-GMM estimators?

4.2) Is ‘FOD’ a substitute for (alternative to) the Difference-GMM estimator?

4.3) Is ‘FOD’ a part of the System-GMM estimator?

4.4) What I know is that ‘FOD’ is better than the Difference-GMM estimator when the panel data is unbalanced. Am I right?

4.5) Can I use ‘FOD’ when I am applying the System-GMM estimator? If so, is there any specific option/expression that should be used in the level model (accompanied/compatible with ‘FOD’)?

4.6) Is the following code correct to use ‘FOD’ in order to implement the Difference-GMM estimator?

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) ///
> iv(i.ind, diff) iv(i.fc, diff) iv(i.mn, diff) two vce(r)

Where:
y is the dependent variable;
L.y is the lagged dependent variable as a regressor (L.y is predetermined);
L.x1 is the independent variable (L.x1 is endogenous);
The control variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9 are predetermined.
The control variable x10 (firm age) is exogenous.
ind, mn are industry and country dummies, respectively.
fc is a dummy variable that takes the value of 1 for the 3 years 2008, 2009, and 2010.

4.6) Which is the correct code of the following to use ‘FOD’ in order to implement the System-GMM estimator?

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) iv(i.ind, model(level)) iv(i.fc, model(level)) iv(i.mn, model(level)) two vce(r)

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) iv(i.ind, diff model(level)) iv(i.fc, diff model(level)) iv(i.mn, diff model(level)) two vce(r)

If none of the previous codes is correct, what is the correct code that I have to use in order to implement the System GMM estimator using (FOD), given that the regression model includes the dummy variable (fc), the country dummies (mn), and the industry dummies (ind)?

4.7) In some of your slides, you used ‘model(md)’ ‘model(mdev)’. Thus, my questions are: is ‘model(md)’ applicable for the Difference-GMM estimator, or for the System-GMM estimator, or for both?

How many times do I have to mention ‘model(md)’ in the code for the Difference-GMM estimator and for the System-GMM estimator?

Where to place ‘model(md)’ in the code for the Difference-GMM estimator and for the System-GMM estimator?

What to use with ‘model(md)’ in the differenced model? What to use with ‘model(md)’ in the level model?

To use ‘model(md)’, is it required to use ‘FOD’ i.e., is ‘model(md)’ accompanied with/conditional to using ‘FOD’? Or can I use ‘model(md)’ to implement the Difference-GMM estimator and the System-GMM estimator without using ‘FOD’?

5) To implement the Difference-GMM estimator, do I have to include the option ‘nl(noserial)’? if so, does it matter where to place it in the code?

6) Moreover, one of your slides shows the option ‘nl(noserial)’ with ‘collapse’ as follows: nl(noserial, collapse). Thus, is it wrong not to use ‘collapse’ with the option ‘nl(noserial)’?

7) Some of your slides show ‘quietly’ before your command ‘xtdpdgmm’. Does ‘quietly’ change anything? i.e., is it better to use or not to use ‘quietly’?

8) In addition to the options ‘teffects’ ‘two’ ‘vce(robust)’, are there any other specific options I have to include in the code to implement the Difference-GMM estimator using your command ‘xtdpdgmm’? if so, are there any conditions to include them in the code?

9) Is it necessary/important to have a specific order of the code’s components and options? i.e., does the order of the code’s components and options matter?

10) Please, correct me if I am wrong regarding the order/steps of applying the Difference GMM estimator.

10.1) I should first run the regression of the Difference GMM estimator. Second, I should apply ‘estat serial, ar(1/3)’ in order to test for serial correlation of residuals. Third, I should apply ‘estat overid’ that performs the Sargan-Hansen test of the overidentifying restrictions in order to test whether the instruments (used for the differenced model) are valid.

10.2) Also, do I have to change, amend, or add anything to those steps in order to implement the Difference GMM estimator using your command 'xtdpdgmm'?

10.3) To apply the Difference GMM estimator, there is no need to apply ‘estat overid, difference’ that performs the Sargan-Hansen difference test of the overidentifying restrictions in order to test whether the additional instruments employed in the System GMM (used for the level model) are valid. Am I right?

Your help, patience and cooperation are appreciated.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2589
#473

27 Aug 2022, 05:38

Please feel free to include my name in your acknowledgement section. Please also consider directly citing my xtdpdgmm package, e.g. as follows:
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.

1) Including the second lag of the dependent variable as a regressor can be useful when there are still signs of autocorrelation after including only the first lag (e.g., when the Arellano-Bond test indicates second-order serial correlation in the first-differenced residuals). Alternatively, just include the second lag and check the statistical significance of its coefficient.

2) Any lag of the dependent variable would be treated as predetermined. Effectively, you do not need to do anything about it because the instruments for the first lag already take care of the second lag as well.

3) You do not have to necessarily use the forward-orthogonal deviations with unbalanced panel data. It is just generally more efficient in this case and therefore recommended.

4.1) You would use the model in forward-orthogonal deviations instead of the model in first differences (not usually in combination). This can be for the "difference GMM" estimator (which effectively becomes a FOD estimator) and for the "system GMM" estimator.

4.2) Basically, yes.

4.3) As with the "difference part", you can have alternatively a "FOD part" of "system GMM".

4.4) Generally, yes.

4.5) You do not have to change the level model at all when you use the system GMM estimator with the FOD part.

4.6) I would probably not include the diff suboption for iv() when using model(fod), but there is nothing wrong about it. For strictly exogenous variables and for dummy variables, I would personally use model(mdev) instead of model(fod), but note that this is not yet standard practice.
For dummy variables, one would normally assume that they are uncorrelated with the unobserved group-specific effects. In this case, the first of your two system GMM specifications would be more appropriate.

4.7) As said just above, model(mdev) is appropriate for strictly exogenous variables or dummy variables. For an estimation without a level equation, I would recommend the following instruments:

Code:

gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) iv(i.ind, model(md)) iv(i.fc, model(md)) iv(i.mn, model(md))

For an estimation with a level equation, I would recommend the following instruments:

Code:

gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) iv(i.ind, model(level)) iv(i.fc, model(level)) iv(i.mn, model(level))

Note that, for the strictly exogenous regressor x10, I have combined instruments for model(md) and model(fod). The latter are the same as those for predetermined regressors. The former are additional instruments that are available only for strictly exogenous regressors.

5) For the traditional "difference GMM" estimator, you should not include the nl(noserial) option. With this option, you can implement the Ahn-Schmidt GMM estimator with nonlinear moment conditions, which can be an alternative to the system GMM estimator.

6) There is nothing wrong about not using the collapse option with nl(noserial). The logic behind the collapse option in this context is the same as with other instruments, to avoid having too many moment conditions/instruments.

7) quietly just suppresses all the output. All postestimation commands still work normally.

8) You might want to include the small option for a small-sample degrees of freedom correction of the standard errors.

9) The options after the comma can be specified in any order. It does not matter.

10) This would be the traditional approach, yes. estat overid, difference could still be useful to check some other assumptions, e.g. the correct classification of variables as endogenous, predetermined, or exogenous. Please see the model selection section of my 2019 London Stata Conference presentation for an illustration.
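
The following sketch puts points 4.1), 8), and 10) together: an estimation in forward-orthogonal deviations followed by the traditional postestimation checks. It is only an illustration on the abdata example dataset that ships with Stata; treating w as predetermined and k as endogenous is an assumption, and the lag ranges mirror the lag(1 3) and lag(0 2) choices from the question.

```stata
* Illustrative sketch only: abdata ships with Stata; the variable
* classification (w predetermined, k endogenous) is assumed.
webuse abdata, clear
xtset id year

* FOD counterpart of difference GMM. Relative to model(diff), the lag
* ranges start one period earlier, as in the model(fod) code in the question.
xtdpdgmm L(0/1).n w k, model(fod) collapse gmm(n, lag(1 3)) ///
    gmm(w, lag(0 2)) gmm(k, lag(1 3)) teffects nolevel twostep vce(robust) small

* Traditional checks: serial correlation first, then overidentification.
estat serial, ar(1/3)
estat overid
```

The small option is included here only to show where the degrees-of-freedom correction from point 8) would go; it can be dropped without changing the coefficient estimates.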

https://www.kripfganz.de/stata/
Zainab Mariam

Join Date: Jul 2022

Posts: 51
#474

27 Aug 2022, 09:47

Dear Professor Sebastian,

Many thanks for your valuable reply. Thank you for your permission to include your name in the acknowledgement section. Of course, I will cite your 'xtdpdgmm' package as you mentioned exactly.

Please, allow me to ask the following questions. Sorry!

I will apply the System GMM estimator using your command ‘xtdpdgmm’. I will consider the lagged dependent variable L.y as predetermined (as you kindly suggested), the independent variable L.x1 as endogenous, while the control variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9 as predetermined, and the control variable x10 (firm age) as exogenous. Thus, my questions are:

1) Which is the correct code of the following that I can use to implement the System GMM estimator?

1.1) In the first code, for the differenced model, I use lag(0 2) as instruments for the predetermined variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9, and lag(0 2) for the exogenous variable x10 (firm age) as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 2)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) teffects two vce(r)

1.2) In the second code, for the differenced model, I use lag(0 2) as instruments for the predetermined variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9, and lag(0 0) for the exogenous variable x10 (firm age) as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) teffects two vce(r)

1.3) In the third code, for the differenced model, I use lag(1 3) as instruments for the predetermined variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9, and lag(0 2) for the exogenous variable x10 (firm age) as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 2)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) teffects two vce(r)

1.4) In the fourth code, for the differenced model, I use lag(1 3) as instruments for the predetermined variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9, and lag(0 0) for the exogenous variable x10 (firm age) as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) teffects two vce(r)

1.5) In the fifth code, for the level model, I put ‘model(level)’ instead of ‘diff model(level)’ as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) model(level)) gmm(L.x1, lag(1 1) model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) model(level)) teffects two vce(r)

1.6) In the sixth code, for the level model, I put ‘diff’ instead of ‘diff model(level)’ as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) diff) gmm(L.x1, lag(1 1) diff) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff) teffects two vce(r)

Where:
y is the dependent variable;
L.y is the lagged dependent variable as a regressor (L.y is predetermined);
L.x1 is the independent variable (L.x1 is endogenous);
The control variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9 are predetermined.
The control variable x10 (firm age) is exogenous.

If none of the previous codes is correct, what is the correct code that I have to use in order to implement the System GMM estimator?

2) My regression model includes a dummy variable (fc, this dummy variable takes the value of 1 for the 3 years 2008, 2009, and 2010). Also, it includes industry dummies (ind) and country dummies (mn) to examine the industry effects and the country effects. Thus, my question is: to apply the System GMM estimator, which is the correct code of the following?

2.1) In the first code, I put ‘diff’ in iv() option for dummies as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) iv(i.ind, diff) iv(i.fc, diff) iv(i.mn, diff) two vce(r)

2.2) In the second code, I put ‘diff model(level)’ in iv() option for dummies as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) iv(i.ind, diff model(level)) iv(i.fc, diff model(level)) iv(i.mn, diff model(level)) two vce(r)

2.3) In the third code, I put ‘model(level)’ without ‘diff’ in iv() option for dummies as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) iv(i.ind, model(level)) iv(i.fc, model(level)) iv(i.mn, model(level)) two vce(r)

If none of the previous codes is correct, what is the correct code that I have to use in order to implement the System GMM estimator, given that the regression model includes the dummy variable (fc), the country dummies (mn), and the industry dummies (ind)?

3) To implement the System GMM estimator using your command ‘xtdpdgmm’, I have the following questions:

3.A) Is it necessary/required to instrument the dummies (fc, year, industry, and country dummies) in the differenced model only, or in the level model only, or in both models (differenced and level)?

3.B) What is the option that should be accompanied with the dummies in the differenced model and in the level model? i.e., which option of the following ‘model (diff)’, ‘model(level)’, ‘diff’, ‘diff model(level)’ should be used for dummies in the differenced model? And which option should be used for the level model?

4) Do I have to use ‘iv’ or ‘gmm’ for the dummies (fc, year, industry, and country dummies)?

5) Is it necessary/required to mention ‘lag( )’ for the dummies (fc, year, industry, and country dummies)? If not, why?

6) The control variable x10 (firm age) is exogenous. Is it better to instrument it in the differenced model, or in the level model, or in both models? For the exogenous variable (firm age), do I have to use the same lag( ) for the differenced model and for the level model? {i.e., for the differenced model: gmm(x10, lag(0 0) model(diff) collapse). For the level model: gmm(x10, lag(0 0) diff model(level) collapse)}. Or should the lag( ) used for the differenced model differ from the lag( ) used for the level model? {i.e., for the differenced model: gmm(x10, lag(0 2) model(diff) collapse). For the level model: gmm(x10, lag(0 0) diff model(level) collapse)}.

7) When I apply the System GMM estimation, do I have to instrument all the variables included in the regression model for both models (differenced model and level model)? Or for the level model, do I have to instrument only the variables that I do not instrument for the differenced model, and vice versa?

8) Please, correct me if I am wrong regarding the order/steps of applying the System GMM estimation. I should first run the regression of the System GMM estimator using your command ‘xtdpdgmm’. Second, I should apply ‘estat serial’ in order to test for serial correlation of residuals. Third, I should apply ‘estat overid’ that performs the Sargan-Hansen test of the overidentifying restrictions in order to test whether the instruments (used for the differenced model) are valid. Fourth, I should apply ‘estat overid, difference’ that performs the Sargan-Hansen difference test of the overidentifying restrictions in order to test whether the additional instruments employed in the System GMM (used for the level model) are valid. Also, do I have to change, amend, or add anything to those steps in order to implement the System GMM estimator using your command ‘xtdpdgmm’?

I am very grateful to you for all your support and effort.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2589
#475

31 Aug 2022, 09:40

1.1 and 1.2) With model(diff), lag 0 is not a valid instrument for predetermined variables. The first admissible lag is 1. Please see the Remarks section in the xtdpdgmm help file.
1.3 and 1.4) Both codes are valid. I recommend 1.3.
1.5) This is only valid if those level instruments are uncorrelated with the unobserved group-specific effects, which effectively requires a random-effects assumption.
1.6) This does not produce any instruments for the level model because you set the default to model(diff).

2.1) iv(i.ind, diff) iv(i.fc, diff) iv(i.mn, diff) produces differenced instruments for model(diff), which was set as the default. This is valid but typically inefficient for dummy variables, which are usually assumed to be uncorrelated with the unobserved group-specific effects.
2.2) Again, this is valid but the diff option creates inefficiencies.
2.3) This is the recommended approach for dummies in system GMM. (Remember to adjust the lag order for predetermined variables in this code.)

3.A) It is generally recommended to instrument the dummies only in either the first-differenced or the level model to avoid redundancies. For system GMM, the level model is typically recommended.
3.B) For the differenced model, you would typically combine diff model(diff). For the level model, you would just use model(level). For dummy variables in system GMM, the latter is recommended.

4) For dummies, you would typically use iv().

5) When you use the iv() option for dummies, you do not need to use the lag() suboption because the default is already lag(0 0). Further lags are redundant for dummies.

6) You would normally treat the exogenous variable in the same way as a predetermined or endogenous variable, but just adjust the lag order appropriately; see your code 1.3. As an exception, when the exogenous variable is also assumed to be uncorrelated with the unobserved group-specific effects, you do not need to include the diff option with model(level).

7) You normally instrument all variables in the differenced model (possibly excluding dummy variables). If your variables satisfy the additional Blundell-Bond assumption (sufficient: mean stationarity), then you additionally instrument them in the level model.

8) I would recommend starting with a "difference GMM" estimation and running estat overid for this estimation first. If the test rejects the difference GMM specification, then there is no point in going on to the system GMM estimation.
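
The order of steps in 8) can be sketched as follows. This is only an illustration, assuming the abdata example data set used elsewhere in this topic; the lag ranges and the choice to treat w and k as predetermined are assumptions made for the example, not recommendations for any particular application.

```stata
* Requires the community-contributed package: ssc install xtdpdgmm
webuse abdata, clear

* Step 1: difference GMM (instruments for the differenced model only)
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) twostep vce(robust)
estat serial, ar(1/3)
estat overid

* Step 2 (only if step 1 is not rejected): system GMM, adding
* differenced instruments for the level model
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) twostep vce(robust) overid
estat serial, ar(1/3)
estat overid
estat overid, difference
```

The overid option in the second estimation is only there so that estat overid, difference can report the incremental tests; it does not change the estimates.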

https://www.kripfganz.de/stata/
Comment
Zainab Mariam

Join Date: Jul 2022

Posts: 51
#476

31 Aug 2022, 11:14

Dear Professor Sebastian,

Many thanks for your valuable response. Indeed, saying "thank you very much" is not enough. I am very grateful to you for all your support and effort.

I still have the following questions which might be the set of questions before the final one, please!

1) Are the following equivalent or different? Which is the one for the Difference GMM estimator (i.e., to be used for the differenced model)? Which one is to be used for the level model and the System GMM estimator?

1.1) iv(x, lag( ) diff model(level))

1.2) iv(x, lag( ) diff)

1.3) iv(x, lag( ) model(diff) diff)

1.4) iv(x, lag( ) model(level))

1.5) iv(x, lag( ) model(diff))

2) Can I use your command ‘xtdpdgmm’ to implement the Difference GMM estimator and the System GMM estimator for a static model? If so, is the only thing I should do (to implement the Difference GMM estimator and the System GMM estimator for the static model) just not to include the lagged dependent variable in the regression model, and use the same codes I used when implementing the Difference GMM estimator and the System GMM estimator for the dynamic model?

3) Regarding post #471 point 6) “The overid option has no effect on the estimation of the model. It allows for the computation of additional "difference-in-Hansen" test statistics with estat overid.”

Thus, I have the following questions:

3.A) Does it mean that the difference-in-Hansen test (the Sargan-Hansen difference test of the overidentifying restrictions) cannot be performed without specifying the option ‘overid’ in the ‘xtdpdgmm’ command line?

3.B) To perform the difference-in-Hansen test (the Sargan-Hansen difference test of the overidentifying restrictions), should ‘estat overid’ be used? Or should ‘estat overid, difference’ be used?

3.C) What I know is that ‘estat overid’ performs the Sargan-Hansen test of the overidentifying restrictions, whereas ‘estat overid, difference’ performs the difference-in-Hansen test (the incremental Hansen test/the Sargan-Hansen difference test of the overidentifying restrictions). Am I right?

4) When applying the Difference GMM estimator and the System GMM estimator, do I have to apply the ‘under-identification test’? If so, at which step, i.e., before or after ‘running xtdpdgmm regression’; ‘estat serial, ar(1/3)’; ‘estat overid’; ‘estat overid, difference’?

5) I have unbalanced panel data. To implement the Difference GMM estimator, can I include the option ‘orthogonal’ in the code of your ‘xtdpdgmm’ command? If no, what to include instead of ‘orthogonal’ to deal with unbalanced panel data in order to implement the Difference GMM estimator using your command ‘xtdpdgmm’? Also, can I include the option ‘orthogonal’ in the code of your ‘xtdpdgmm’ command to implement the System GMM estimator?

6) To use ‘model(md)’ ‘model(mdev)’, is it required to apply the ‘FOD’ estimator? i.e., is ‘model(md)’ accompanied with/conditional to applying the ‘FOD’ estimator? Can I use ‘model(md)’ without using the ‘FOD’ estimator? Can I use ‘model(md)’ when applying the Difference GMM estimator ‘model(diff)’? Can I use ‘model(md)’ when applying the System GMM estimator?

7) How could ‘estat overid, difference’ check the correct classification of variables as endogenous, predetermined, or exogenous? I kindly ask you please to give me an example. I read your PDF file presentation, but I feel that I am not able to understand it fully.

8) How to decide/determine whether a variable is either strictly exogenous, endogenous, or predetermined? Is there a test to classify whether a variable is either strictly exogenous, endogenous, or predetermined?

9) Regarding post #473 point 5) “For the traditional "difference GMM" estimator, you should not include the nl(noserial) option. With this option, you can implement the Ahn-Schmidt GMM estimator with nonlinear moment conditions, which can be an alternative to the system GMM estimator.”. Thus, I have the following questions:

9.1) What do you mean by the traditional "difference GMM" estimator?

9.2) What is the Difference GMM estimator that the ‘nl(noserial)’ option can be included in? i.e., for which Difference GMM estimator can the ‘nl(noserial)’ option be included?

9.3) Can the ‘nl(noserial)’ option be included in the System GMM estimator?

On page 58 of your PDF file presentation “These nonlinear moment conditions are redundant when added to the sys-GMM moment conditions (Blundell and Bond, 1998) but improve efficiency when added to the diff-GMM moment conditions. Furthermore, they may provide identification when the diff-GMM estimator does not”

10) Are the iterated GMM estimator and the Ahn-Schmidt GMM estimator different from the Difference GMM estimator and the System GMM estimator? If so, in which aspects? and which one is better?

Your patience, cooperation, and support are highly appreciated.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2589
#477

02 Sep 2022, 03:40

1) 1.1 and 1.2 are equivalent only if you have not specified model(diff) as a separate option in your command line, because such a separate option would override the model(level) default. In contrast, 1.2 and 1.3 are equivalent only if you have changed the default by specifying model(diff) as a separate option. All other specifications differ from each other. For a difference GMM estimator, one would typically use 1.3 to instrument dummy variables.

2) Yes, you would just remove the lagged dependent variable and the instruments for the lagged dependent variable.
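
As a rough sketch of point 2), again assuming the abdata example data set and illustrative lag ranges (both are assumptions, not part of the original question), a dynamic specification and its static counterpart could look like this:

```stata
* Requires the community-contributed package: ssc install xtdpdgmm
webuse abdata, clear

* Dynamic model: lagged dependent variable L.n included and instrumented
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) twostep vce(robust)

* Static model: L.n dropped from the regressors, and the gmm(n, ...)
* instruments for the lagged dependent variable dropped as well
xtdpdgmm n w k, model(diff) collapse gmm(w k, lag(1 3)) twostep vce(robust)
```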

3) No, you can compute a difference-in-Hansen test by separately estimating the two models which you want to compare, e.g.

Code:

. webuse abdata
. xtdpdgmm L(0/1).n w k, gmm(L.n w k, l(1 4)) m(d) c two vce(r)
. estimates store ab
. xtdpdgmm L(0/1).n w k, gmm(L.n w k, l(1 4) m(d)) iv(L.n w k, d) two c vce(r)
. estat overid ab

While the test result differs numerically from the one obtained without separate estimations, the two versions are asymptotically equivalent:

Code:

. xtdpdgmm L(0/1).n w k, gmm(L.n w k, l(1 4) m(d)) iv(L.n w k, d) two c vce(r) overid
. estat overid, difference

4) You would normally test for underidentification before you run the overidentification tests.

5) The xtdpdgmm command does not have an orthogonal option; the command xtdpdgmmfe does. With xtdpdgmm, you would specify model(fod) instead of model(diff), but you also need to adjust the lag order for the instruments, e.g. gmm(x, lag(1 3) model(diff)) would become gmm(x, lag(0 2) model(fod)). For a system GMM estimator, you would leave the model(level) instruments unchanged.
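
The lag shift described in 5) can be illustrated with a pair of commands. This is a sketch under assumptions: the abdata example data set, w and k treated as predetermined, and lag ranges chosen only for illustration.

```stata
* Requires the community-contributed package: ssc install xtdpdgmm
webuse abdata, clear

* Difference GMM: instruments for the first-differenced model
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) twostep vce(robust)

* FOD counterpart: same variables, every lag range shifted down by one
xtdpdgmm L(0/1).n w k, model(fod) collapse gmm(n, lag(1 3)) gmm(w k, lag(0 2)) twostep vce(robust)
```

In a system GMM version of the second command, any model(level) instruments would be carried over unchanged.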

6) model(md) can also be used for strictly exogenous variables in combination with model(diff) for other variables. This works both for a difference GMM and a system GMM estimator. There is full flexibility.

7) On slide 108 of my 2019 London Stata Conference presentation, the instruments gmm(w, lag(1 .)) are valid for an endogenous variable with model(fod). I then add gmm(w, lag(0 0)), which adds the extra instrument valid for a predetermined variable. The difference-in-Hansen test on the next slide (line 7 of the results table) then checks the validity of that additional instrument. Here, the p-value of 0.3 appears sufficiently high to maintain the assumption that this variable is predetermined. You can then proceed in the next step by adding gmm(w, lag(0 0) model(md)), which would be valid under strict exogeneity.
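
A sketch of the procedure in 7), not a reproduction of the slides: it assumes the abdata example data set, and the instrument sets for n and k are assumptions made for the example. Only the treatment of w follows the description above.

```stata
* Requires the community-contributed package: ssc install xtdpdgmm
webuse abdata, clear

* gmm(w, lag(1 .)) is valid if w is endogenous; gmm(w, lag(0 0)) adds the
* extra instrument that is valid only if w is predetermined
xtdpdgmm L(0/1).n w k, model(fod) collapse gmm(n, lag(1 .)) gmm(w, lag(1 .)) gmm(w, lag(0 0)) gmm(k, lag(1 .)) twostep vce(robust) overid

* Incremental tests by instrument set; the row for gmm(w, lag(0 0))
* checks the predeterminedness assumption
estat overid, difference

* Next step: add the instrument that is valid under strict exogeneity
xtdpdgmm L(0/1).n w k, model(fod) collapse gmm(n, lag(1 .)) gmm(w, lag(1 .)) gmm(w, lag(0 0)) gmm(w, lag(0 0) model(md)) gmm(k, lag(1 .)) twostep vce(robust) overid
estat overid, difference
```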

8) You would just use the procedure described in 7 to do this test.

9) By "traditional" difference GMM estimator, I mean the one proposed by Arellano and Bond (1991) with linear moment conditions only. With nonlinear moment conditions, people would not call it the "difference GMM" estimator anymore. It is just the Ahn-Schmidt estimator. In practice, nl(noserial) can be added almost without cost because the underlying assumptions are basically the same as with the Arellano-Bond estimator. Thus, the added efficiency from the nonlinear moment conditions comes for free. nl(noserial) is not typically used with system GMM because the nonlinear moment conditions become redundant once the extra instruments for model(level) are added.
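
For illustration, the two estimators in 9) differ by a single option. The sketch assumes the abdata example data set and illustrative lag ranges:

```stata
* Requires the community-contributed package: ssc install xtdpdgmm
webuse abdata, clear

* Arellano-Bond difference GMM: linear moment conditions only
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) twostep vce(robust)
estat overid

* Ahn-Schmidt estimator: the same linear moment conditions plus the
* nonlinear moment conditions valid under no serial correlation
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) nl(noserial) twostep vce(robust)
estat overid
```

The second command requires numerical optimization and can therefore take noticeably longer.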

10) The Ahn-Schmidt estimator differs from the Arellano-Bond difference GMM estimator and the Blundell-Bond system GMM estimator as just described. Iterated GMM refers to the estimation technique. You can have an iterated difference GMM or an iterated system GMM estimator as an alternative to corresponding one-step or two-step GMM estimators.
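
To make the distinction in 10) concrete, the sketch below keeps the moment conditions fixed and changes only the estimation technique. It assumes the abdata example data set and illustrative lag ranges.

```stata
* Requires the community-contributed package: ssc install xtdpdgmm
webuse abdata, clear

* Two-step difference GMM
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) twostep vce(robust)

* Iterated difference GMM: same instruments, weighting matrix updated
* repeatedly until convergence
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) igmm vce(robust)
```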

https://www.kripfganz.de/stata/
Comment
Zainab Mariam

Join Date: Jul 2022

Posts: 51
#478

02 Sep 2022, 06:44

Dear Professor Sebastian,

Thank you very much for your valuable reply. I do not know how to thank you, professor! I am very grateful to you for all your support and effort.

I still have the following questions, please! Hopefully, this will be the final set of questions.

1) What if the coefficient becomes statistically insignificant when applying GMM even though the coefficient was significant when applying ‘xtreg’? Is there any reason for this issue?

2) When the option “teffects” is included in the code of the 'xtdpdgmm' command, the first three years do not appear in the findings. Is it normal? Is it due to including the second lag of the dependent variable as a regressor?

3) I have 8 industries. The findings show that the last industry is omitted because of collinearity. Is it normal?

4) Regarding the ‘Instruments corresponding to the linear moment conditions:’ that appear under the regression outputs table, what if nothing in terms of the industry dummies appears there?

5) The independent variable of my regression model is L.x1 (L.x1 is endogenous). Also, my regression model includes L2.x1. Thus, for the (FOD) estimator, the instruments for L.x1 should start from the first lag of L.x1 i.e., the first instrument for L.x1 is L2.x1. Thus, my question is: Is it right to use L2.x1 as an instrument for L.x1, given that L2.x1 is already included in the regression model as a regressor?

6) When the option ‘nl(noserial)’ is included in the code, it takes a long time to perform the regression. Is it normal? Also, what if the message (not concave) appears for Iteration?

7) Regarding post #471 point 8) “This suggestion carries over to dynamic panel models, yes.”

Thus, my question is: What does the following code apply? i.e., which estimator does this code perform? Does it apply the Differenced GMM estimator, the System GMM estimator, or something else?

“Code:
xtdpdgmm Y X1 X2 X3, model(mdev) iv(X2 X3, norescale) gmm(X1, lag(2 8) collapse model(diff)) twostep small vce(robust, dc)”

8) To apply your command ‘xtdpdgmmfe’, do I have to just type ‘xtdpdgmmfe’ instead of ‘xtdpdgmm’ and keep everything else the same in the code of ‘xtdpdgmm’? i.e., to apply your command ‘xtdpdgmmfe’, is the only thing I need to change in the code of your ‘xtdpdgmm’ command to replace ‘xtdpdgmm’ with ‘xtdpdgmmfe’? If not, could you please tell me what I have to do?

9) Is your command ‘xtdpdgmmfe’ applicable in a static panel model and in a dynamic panel model with an endogenous variable?

10) Is the Chudik-Pesaran estimator applicable in a panel model with an endogenous variable? If no, does ‘xtdpdgmmfe’ command solve this issue?

11) Is the Hayakawa, Qi, and Breitung estimator applicable in a panel model with an endogenous variable?

Your patience, cooperation, and effort are highly appreciated.

Last edited by Zainab Mariam; 02 Sep 2022, 07:04.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2589
#479

02 Sep 2022, 07:49

1) There are 2 main reasons why this can happen:
a) xtreg uses stronger assumptions which are violated, e.g. because some regressors are predetermined or endogenous. As a result, coefficient estimates could be closer to 0 with GMM and eventually insignificant.
b) Standard errors are generally much larger with GMM due to the possible weakness of the instruments. This raises the chance of getting statistically insignificant results.

2) Yes, and yes.

3) Yes, a full set of dummies generally leads to perfect collinearity.

4) If industry dummies are specified as instruments but they do not appear in that list, then the respective instruments are likely omitted due to some perfect collinearity. For difference GMM, this might be the case if those industry dummies are time-invariant.
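
As a sketch of how time-invariant industry dummies can enter as regressors and as level-model instruments (assuming the abdata example data set, where ind is the industry identifier, and illustrative lag ranges), factor-variable notation also takes care of omitting a base category:

```stata
* Requires the community-contributed package: ssc install xtdpdgmm
webuse abdata, clear

* i.ind as regressors; instruments for them only in the level model,
* because time-invariant dummies vanish after first differencing
xtdpdgmm L(0/1).n w k i.ind, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(i.ind, model(level)) teffects twostep vce(robust)
```

With this specification, the industry dummies should be listed among the instruments for the level model below the coefficient table, with one category dropped as the base.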

5) Yes, the inclusion of L2.x1 as a regressor does not invalidate it as an instrument.

6) Yes, with nonlinear moment conditions a numerical optimization procedure is required, which can take much longer, especially when the data set is relatively large. The message "not concave" can be ignored if it only appears for intermediate iterations. If it appears for the final iteration, then there are numerical difficulties and the algorithm did not converge to a proper solution. In such a case, a simplification of the model is required, which usually involves fewer instruments or even the abandonment of nonlinear moment conditions.

7) This is similar in spirit to a difference GMM estimator, where X2 X3 are strictly exogenous and X1 is endogenous. People probably would not call it a "difference GMM" estimator because it does not exclusively use model(diff). Some people may even call it a "system GMM" estimator because it is based on a system of two models, model(mdev) and model(diff), but this could lead to confusion with the traditional system GMM estimator, which uses model(diff) (or model(fod)) and model(level). There is no commonly accepted name for this kind of estimator.

8) No, xtdpdgmmfe uses a different syntax which some users might find easier. It then translates this syntax into the syntax required for xtdpdgmm. The computations are still performed with the latter command. Please see the help file for xtdpdgmmfe and my earlier post #450 in this Statalist topic. If needed, you can subsequently modify the xtdpdgmm command line displayed by xtdpdgmmfe.

9) Yes.

10) No, the Chudik-Pesaran estimator requires all variables to be either strictly exogenous or predetermined. xtdpdgmmfe "solves" this issue by switching to a specific version of a difference GMM estimator when endogenous variables are present.

11) Yes.

https://www.kripfganz.de/stata/
Comment
Zainab Mariam

Join Date: Jul 2022

Posts: 51
#480

10 Sep 2022, 08:29

Dear Professor Sebastian,

Thank you very much for your swift valuable reply. I am very grateful to you for all your support and effort, professor! Please, if I may follow up with your response!

1) Regarding post #479 point 4) “If industry dummies are specified as instruments but they do not appear in that list, then the respective instruments are likely omitted due to some perfect collinearity. For difference GMM, this might be the case if those industry dummies are time-invariant.”.

I have 8 industries. Thus, I generated 8 industry dummies using ‘tabulate var, gen(Industry)’. On page 86 of your PDF file presentation, the list ‘Instruments corresponding to the linear moment conditions:’ that appear under the regression outputs table shows industry dummies instruments as follows “2bn.ind 3.ind 4.ind 5.ind 6.ind 7.ind 8.ind 9.ind”. Thus, I have the following questions:

1.A) Which code of the following can I apply in order for the list ‘Instruments corresponding to the linear moment conditions:’ that appear under the regression outputs table to show industry dummies instruments similar to yours on page 86 of your PDF file presentation?

In the following four codes, I apply the (FOD) estimator. Where:

y is the dependent variable;
L.y is the lagged dependent variable as a regressor (L.y is predetermined);
L.x1 is the independent variable (L.x1 is endogenous);
The control variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9 are predetermined.
The control variable x10 (firm age) is exogenous.

1.1) In the first code, I typed in the regression all the industries (8 industries) as regressors. Then, to instrument the industry dummies, I put in the iv( ) option all the industries (8 industries) with ‘model(md)’ as follows:

. xtdpdgmm L.(0/1) y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) iv(Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, model(md)) teffects small two vce(r)

1.2) In the second code, I typed in the regression all the industries (8 industries) as regressors. Then, to instrument the industry dummies, I put in the iv( ) option all the industries (8 industries) with ‘model(level)’ as follows:

. xtdpdgmm L.(0/1) y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) iv(Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, model(level)) teffects small two vce(r)

1.3) In the third code, I typed in the regression all the industries (8 industries) as regressors. Then, to instrument the industry dummies, I put in the iv( ) option all the industries (8 industries) with ‘diff model(diff)’ as follows:

. xtdpdgmm L.(0/1) y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) iv(Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, diff model(diff)) teffects small two vce(r)

1.4) In the fourth code, I typed in the regression all the industries (8 industries) as regressors. Then, to instrument the industry dummies, I put in the iv( ) option all the industries (8 industries) with ‘diff’ and I put in the iv( ) option all the industries (8 industries) with ‘model(level)’ as follows:

. xtdpdgmm L.(0/1) y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) iv(Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, diff) iv(Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, model(level)) teffects small two vce(r)

If none of the above codes can show industry dummies instruments in the list ‘Instruments corresponding to the linear moment conditions:’ that appear under the regression outputs table similar to yours on page 86 of your PDF file presentation, I kindly ask you please for the code I should use to show industry dummies instruments in the list.

1.B) Is it correct to put in the regression code all the industry dummies (8 industries) included in the regression model as regressors? Or do I have to exclude one of them manually?

1.C) To instrument the industry dummies, is it correct to put in the iv( ) option all the industries (8 industries) included as regressors in the regression model? i.e., do I have to instrument all the industry dummies included in the regression model?

1.D) I include the options: ‘teffects’ ‘small’ ‘two’ ‘vce(r)’. Is it better/required to add the option ‘nolevel’?

2) Regarding post #479 point 10) “No, the Chudik-Pesaran estimator requires all variables to be either strictly exogenous or predetermined. xtdpdgmmfe "solves" this issue by switching to a specific version of a difference GMM estimator when endogenous variables are present.”

Thus, my questions are:

2.1) Can your command ‘xtdpdgmmfe’ perform by itself the switching to a specific version of a difference GMM estimator when endogenous variables are present, or do I have to perform the switching manually? If so, how?

2.2) Can the previous command ‘xtdpdgmm’ perform the Chudik-Pesaran estimator when the regression model includes endogenous variables? If so, how?

3) To apply the Hayakawa, Qi, and Breitung estimator in a panel model with an endogenous variable, can the command ‘xtdpdgmm’, the command ‘xtdpdgmmfe’, or both commands perform the Hayakawa, Qi, and Breitung estimator in a panel model with an endogenous variable?

4) When using your command ‘xtdpdgmmfe’, can the regression model include three lags of each regressor?

5) When using your command ‘xtdpdgmmfe’, is it required to follow the ‘Sequential model selection process’?

6) Is it better/required to add the option ‘dc’ (doubly-corrected robust standard errors) when implementing the different estimators (such as the difference-GMM estimator and the system-GMM estimator, …)?

7) Is there any difference between gmmiv(var, lagrange(1 .)) and gmm(var, lag(1 .))?

8) As your commands apply the fixed effects estimator (FE), can I use your commands to apply the Instrumental Variable Tobit (IVTobit) method for panel data? (y is a limited dependent variable).

I do appreciate your patience, support and effort.
Comment
