XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

Sebastian Kripfganz replied

02 Dec 2022, 02:28
Comparing those dummy coefficients could possibly be a reasonable strategy. Certainly, those dummy variables have to be treated as endogenous variables because they are effectively functions of the dependent variable. But you can possibly still use lags of those dummies as instruments, assuming that positive/negative rating changes are autocorrelated over time.

Last edited by Sebastian Kripfganz; 02 Dec 2022, 02:33.
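
One way to make the comparison formal is a Wald test on a linear combination of the two dummy coefficients. This test uses both estimates and their estimated covariance; in Stata, postestimation commands such as test or lincom do this after estimation. The Python sketch below shows only the arithmetic.

The coefficient, variance, and covariance values are hypothetical. The sketch also assumes the rating is coded so that higher values mean better ratings. Under that coding, the downgrade coefficient is expected to be negative and the upgrade coefficient positive, so "more severe downgrades than generous upgrades" corresponds to b_down + b_up < 0. That hypothesis is tested one-sided with a normal approximation.

```python
import math

def one_sided_asymmetry_test(b_down, b_up, var_down, var_up, cov_down_up):
    """Test H0: b_down + b_up = 0 against H1: b_down + b_up < 0.

    Assumes higher rating values mean better ratings, so b_down < 0 and b_up > 0;
    a negative sum means downgrades move the rating by more than upgrades do.
    """
    estimate = b_down + b_up
    # Var(b_down + b_up) = Var(b_down) + Var(b_up) + 2 Cov(b_down, b_up)
    se = math.sqrt(var_down + var_up + 2.0 * cov_down_up)
    z = estimate / se
    p_lower = 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF at z
    return estimate, se, z, p_lower

# Hypothetical numbers for illustration only (not from the thread).
est, se, z, p = one_sided_asymmetry_test(-0.80, 0.35, 0.04, 0.03, -0.01)
print(f"b_down + b_up = {est:.2f}, se = {se:.4f}, z = {z:.3f}, one-sided p = {p:.4f}")
```

The covariance term matters: both dummies are estimated from the same model, so their estimates are generally correlated. Comparing two separate confidence intervals would ignore that correlation.
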
Joseph L. Staats replied

01 Dec 2022, 14:30
Sebastian,

Thanks to your prior help, I am successfully using xtdpdgmm for a project in which the dependent variable is the rating given to government bonds in various countries. As a small part of this project, I want to show that, all else being equal, raters are more severe when downgrading a bond's rating than they are generous when upgrading it. To test this, I have created two dummy variables. The first is one (1) if, in a given year, there is a downgrade, and zero (0) if there is no change or an upgrade. The second is one (1) if there is an upgrade, and zero (0) if there is no change or a downgrade. I include a number of control variables that are standard for studying bond ratings. After running system GMM with both dummy variables included, I compare the coefficients of the two dummies to see whether they are statistically different in the direction just stated. In your mind, is this method proper? One of my concerns is that I am using information from the dependent variable on both sides of the equation: (1) the bond rating as the formal dependent variable, and (2) whether the bond rating goes up or down on the independent-variable side. I would appreciate your thoughts on this.
Zainab Mariam replied

27 Sep 2022, 16:00
Dear Professor Sebastian,

Many thanks for your beneficial response. I do not know how to thank you, Professor! Indeed, saying "thank you very much" is not enough. Your cooperation and support are priceless. You are an invaluable source of information.

I am very grateful to you for all your support and effort.

I have no further questions. I wish you the best of luck and success in your teaching.

Your patience, support and effort are highly appreciated, Professor!
Sebastian Kripfganz replied

27 Sep 2022, 04:44
1) You can normally do this, yes.

2) The coefficients are always interpreted for the level model, no matter whether you use the difference GMM or the system GMM estimator.

3.1) It is difficult to separately interpret those short-run adjustment coefficients if there are multiple lags. The first lag's coefficient measures the strength of the adjustment to a shock in the previous period, all else equal. For the second lag, you cannot simply extend this argument because the response to a shock 2 periods before also depends on the cumulative one-period responses. It is an additional delayed impulse on top of the first-order response. For the long-run adjustment, the sum of the two coefficients tells you how quickly the process reverts back to its equilibrium. If the coefficients sum up to 1, the shocks have a permanent effect. If they sum up to 0, the equilibrium is restored instantly because the initial response would be fully counteracted immediately afterwards.

3.2) This coefficient has the standard interpretation. Depending on whether your variables are measured in logs, it could be a (semi-)elasticity. It is a "short-run" effect, i.e. it tells you the instant response of the dependent variable to a change in that independent variable.

3.3) That would be a delayed effect after accounting for the instantaneous effect in 3.2.

4) It is not required for them to have opposite signs. For stability of the dynamic system, they should normally sum up to a value between 0 and 1. Opposite signs indicate that the initial response "overshoots" and is corrected by the delayed response.

5) There is no requirement here, not even on the sum of those coefficients. Again, opposite signs can indicate an "overshooting" effect.

6) No. The coefficients in the model are interpreted independently of the chosen estimator.

7) Dots mean that the respective test cannot be computed due to insufficient degrees of freedom. In the first table, you would look at the last row labelled "model(level)". First, you check the "Excluding" column, which is a Hansen test for the model excluding those level instruments; i.e. it effectively is a Hansen test for a difference GMM estimator. If this test passes with a sufficiently high p-value, then you move on to the "Difference" column. The latter is the actual Difference-in-Hansen test, which compares the system GMM estimator to the difference GMM estimator (which is why we need to check the Excluding test first; otherwise this would not be a valid comparison). Here, we would reject the validity of the level instruments because the p-value is too small. With the other tables, you would proceed similarly. In tables 2 and 4, you would check the 5th and 8th row, respectively, because those are the only instruments for the level model. In table 3, you would check again the last row because you are interested in testing all level instruments jointly.
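
The short-run and long-run arithmetic in points 3.1 to 5 can be traced with the numbers from the question: 0.5 and 0.02 on L.y and L2.y, and 0.001 and -0.0009 on L.x1 and L2.x1. The Python sketch below does two things.

First, it computes the impulse response of y to a one-time shock. This shows how the second lag adds a delayed impulse on top of the cumulated first-order responses.

Second, it computes the long-run effect of a permanent change in x1 using the standard ARDL formula: the sum of the x1 coefficients divided by one minus the sum of the y-lag coefficients. That formula is the usual textbook result rather than something stated in the answer, and the sketch ignores sampling uncertainty.

```python
def impulse_response(a1, a2, horizon):
    """Response of y to a one-unit shock at t = 0 in y_t = a1*y_{t-1} + a2*y_{t-2} + e_t."""
    psi = [1.0]
    for h in range(1, horizon + 1):
        prev1 = psi[h - 1]
        prev2 = psi[h - 2] if h >= 2 else 0.0
        # second term: delayed impulse from the shock two periods earlier
        psi.append(a1 * prev1 + a2 * prev2)
    return psi

def long_run_effect(b_x, a_y):
    """Long-run effect of a permanent change in x: sum(b_x) / (1 - sum(a_y))."""
    return sum(b_x) / (1.0 - sum(a_y))

a1, a2 = 0.5, 0.02        # coefficients on L.y and L2.y from the question
b = [0.001, -0.0009]      # coefficients on L.x1 and L2.x1 from the question

irf = impulse_response(a1, a2, 10)
print([round(v, 4) for v in irf[:5]])           # [1.0, 0.5, 0.27, 0.145, 0.0779]
print(round(long_run_effect(b, [a1, a2]), 6))   # 0.000208
```

Since a1 + a2 = 0.52, the cumulated response to a one-time shock converges to 1 / (1 - 0.52), about 2.08. As the sum approaches 1, this multiplier grows without bound, which matches the statement that shocks then have a permanent effect.
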

Please note that I will be unlikely to respond to further questions over the next weeks due to heavy teaching loads.
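
The reading sequence in point 7 can be reproduced from the first table in the question. The full system GMM Hansen statistic can be read off the table itself: it is 16.1962 with 9 degrees of freedom, the Difference entry in the model(diff) row, whose Excluding test has 0 degrees of freedom. The Difference entry in the model(level) row is therefore the full statistic minus the Excluding statistic, tested on the difference in degrees of freedom.

The Python sketch below recomputes this with a closed-form chi-squared tail probability for integer degrees of freedom. The 5% level is an illustrative assumption, not a threshold given in the thread.

```python
import math

def chi2_sf(x, df):
    """P(Chi2(df) > x) for a positive integer df, via the closed-form series."""
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    total = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total

def difference_in_hansen(full_stat, full_df, excl_stat, excl_df):
    """Difference statistic, its degrees of freedom, and p-value."""
    diff, ddf = full_stat - excl_stat, full_df - excl_df
    return diff, ddf, chi2_sf(diff, ddf)

ALPHA = 0.05  # illustrative level (an assumption, not from the thread)

# First table, row model(level): Excluding chi2 = 8.0920 with 6 df;
# full system GMM Hansen statistic = 16.1962 with 9 df.
p_excl = chi2_sf(8.0920, 6)
diff, ddf, p_diff = difference_in_hansen(16.1962, 9, 8.0920, 6)

# Step 1: the difference GMM benchmark must pass its own Hansen test.
if p_excl < ALPHA:
    print("Excluding test fails: the difference-GMM benchmark is itself rejected")
# Step 2: only then is the Difference test a valid comparison.
elif p_diff < ALPHA:
    print(f"Reject level instruments: chi2({ddf}) = {diff:.4f}, p = {p_diff:.4f}")
else:
    print(f"Level instruments not rejected: p = {p_diff:.4f}")
```

With the table values, the Excluding p-value is about 0.2314, so the comparison is valid. The sketch then prints the rejection branch with chi2(3) = 8.1042 and p = 0.0439, matching the Difference column.
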

Zainab Mariam replied

22 Sep 2022, 16:38

Dear Professor Sebastian,

Many thanks for your useful reply. I am very grateful to you for all your support and effort, professor! If I may, I would like to follow up on your response.

1) According to posts #373 and #473 point 2) “The lagged dependent variable L.Y should normally be treated as predetermined (equivalently, the dependent variable Y itself is endogenous).” “Any lag of the dependent variable would be treated as predetermined.”

Thus, my question is: can I classify the first and second lags of L.x1 as predetermined, given that L.x1 is endogenous (L.x1 is the independent variable of my research)?

2) When using your command ‘xtdpdgmm’ to implement the System GMM estimator, do the corresponding findings show the coefficients of the differenced/transformed variables (variables at differences i.e., ∆) or the coefficients of the variables at level?

3) Regarding using your command ‘xtdpdgmm’ to implement the Difference GMM estimator, I have the following questions:

3.1) How to interpret the coefficients of the first lag and the second lag of the dependent variable y (i.e., how to interpret the coefficients of L.y and L2.y)? e.g., the coefficients of the first lag and the second lag of the dependent variable y are 0.5 and 0.02 for L.y and L2.y, respectively.

3.2) L.x1 is the independent variable of my regression model. Thus, how to interpret the coefficient of the independent variable L.x1? e.g., the coefficient of the independent variable L.x1 = 0.001

3.3) Also, my regression model includes the first lag of the independent variable L.x1. Thus, how to interpret the coefficient of L2.x1 (where L2.x1 is the first lag of the independent variable L.x1)? e.g., the coefficient of L2.x1 = -0.0009

4) Is it required for the coefficients of the first and second lags of the dependent variable y (L.y and L2.y) to have opposite signs? if so, why? and what if the coefficients of L.y and L2.y have the same sign?

5) L.x1 is the independent variable of my regression model. Also, my regression model includes the first lag of the independent variable L.x1. Thus, my question is: is it required for the coefficients of L.x1 and L2.x1 to have opposite signs? if so, why? and what if their coefficients (i.e., the coefficients of L.x1 and L2.x1) have the same sign?

6) Is the interpretation of the coefficients obtained by the System GMM estimator different from the interpretation of the coefficients obtained by the Difference GMM estimator?

7) Regarding point 13) of post #483, I kindly ask you to explain how the results of a difference-in-Hansen test show whether my variables satisfy the additional Blundell-Bond assumption (sufficient: mean stationarity). Suppose we have the following outcomes of the difference-in-Hansen test.

	Excluding			Difference
Moment conditions	chi2	df	p	chi2	df	p
1, model(diff)	14.6666	6	0.0230	1.5296	3	0.6754
2, model(diff)	4.0234	3	0.2590	12.1728	6	0.0582
3, model(level)	15.8404	8	0.0447	0.3558	1	0.5509
4, model(level)	12.0861	7	0.0978	4.1102	2	0.1281
model(diff)	0.0000	0	.	16.1962	9	0.0629
model(level)	8.0920	6	0.2314	8.1042	3	0.0439

	Excluding			Difference
Moment conditions	chi2	df	p	chi2	df	p
1, model(fodev)	8.9323	6	0.1774	3.7500	7	0.8081
2, model(fodev)	9.8897	6	0.1294	2.7926	7	0.9035
3, model(fodev)	9.2784	6	0.1585	3.4039	7	0.8453
4, model(fodev)	6.2261	6	0.3983	6.4561	7	0.4876
5, model(level)	9.6163	8	0.2930	3.0659	5	0.6898
model(fodev)	.	-15	.	.	.	.

	Excluding			Difference
Moment conditions	chi2	df	p	chi2	df	p
1, model(fodev)	30.5644	30	0.4370	1.0296	7	0.9943
2, model(fodev)	25.8607	29	0.6329	5.7333	8	0.6771
3, model(fodev)	26.6376	29	0.5913	4.9564	8	0.7622
4, model(fodev)	27.3258	30	0.6061	4.2682	7	0.7484
5, model(fodev)	25.8421	29	0.6339	5.7518	8	0.6750
6, model(fodev)	27.0201	29	0.5706	4.5739	8	0.8020
7, model(mdev)	31.5847	36	0.6786	0.0093	1	0.9233
8, model(level)	31.3841	35	0.6434	0.2099	2	0.9004
9, model(level)	28.2006	32	0.6594	3.3934	5	0.6396
model(fodev)	.	-9	.	.	.	.
model(level)	28.1268	30	0.5637	3.4672	7	0.8387

	Excluding			Difference
Moment conditions	chi2	df	p	chi2	df	p
1, model(fodev)	25.4072	29	0.6570	2.3428	7	0.9385
2, model(fodev)	23.1059	28	0.7277	4.6440	8	0.7949
3, model(fodev)	22.3165	28	0.7664	5.4334	8	0.7104
4, model(fodev)	26.3066	29	0.6091	1.4433	7	0.9842
5, model(fodev)	23.2937	28	0.7182	4.4563	8	0.8138
6, model(fodev)	22.9352	28	0.7363	4.8147	8	0.7772
7, model(mdev)	27.4318	35	0.8154	0.3181	1	0.5727
8, model(level)	25.3010	31	0.7541	2.4489	5	0.7842
nl(noserial)	27.1247	35	0.8268	0.6253	1	0.4291
model(fodev)	.	-10	.	.	.	.

Also, what do the dots ‘.’ in the difference-in-Hansen test’s findings refer to?

Your patience, support and effort are highly appreciated.


Sebastian Kripfganz replied

20 Sep 2022, 04:18
1) Yes, you can use xtdpdgmmfe for such a model. You only need to specify each variable once as exogenous/predetermined/endogenous, not separately for each lag. Make sure you allow for sufficient lags to be used as instruments, i.e. a minimum of 3 if you are using lags 0 to 2 as regressors.

Regarding dummy variables, you would need to specify them as exogenous variables. I am afraid this may not deliver the desired specification, especially if those dummies are time-invariant. I have to think about adding another option to xtdpdgmmfe, but this will not happen very soon.

2) See 1.

3) An "unconventional" diff-GMM estimator might be one where all instruments for exogenous/predetermined/endogenous variables refer to the first-differenced model but you still want the time dummies (or other dummies) to be instrumented in the level model.

4) You can use it, but only for estimating a linear probability model.

5) Time-invariant variables are typically classified as exogenous with respect to the idiosyncratic error term. However, to identify their coefficients you would really need that they (or appropriate instruments for them) are exogenous with respect to the firm-specific effects. This is not what the exogenous() option of xtdpdgmmfe is doing. Currently, you would need to use xtdpdgmm and specify iv(time-invariant instrument, model(level)).

6) One could do this, yes. Yet, one would probably specify those dummies in the same way as if they were time-invariant, unless you only want them instrumented in the first-differenced model.

7) I cannot answer that as it is application specific. There could possibly be reasons to treat it as endogenous, at least with respect to the firm-specific effect.

8) You can check with the xtsum command whether your dummies have zero or nonzero within variation.

9) The interpretation of the coefficients is always for the untransformed/level variables.

10.1) You do not need to use anything in addition to model(fod). You could use bodev as in Hayakawa, Qi, and Breitung (2019), but personally I am not a fan of it, as you would lose an additional observation for each firm.

10.2) I am not entirely sure anymore what I meant by that sentence. For a strictly exogenous variable I would use the combination of the following two options: gmm(x, lag(0 .) model(fodev)) gmm(x, lag(0 0) model(mdev)), possibly restricting the maximum lag in the first option. For time-varying dummy variables, I would simply use iv(x, model(mdev)). That's basically the same as in your quoted example 11.

11.1) and 11.2) look alright. 11.3) only works if all regressors are exogenous with respect to the firm-specific effects, which is typically not reasonable to assume.

12) Please see help fvvarlist.

13) You would conduct a difference-in-Hansen test between the system GMM and a difference GMM estimator (where the latter just leaves out all of the instruments for the level model). If you use the overid option of xtdpdgmm, you could find this test in estat overid, difference in the row labelled model(level).
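
The transformations discussed in this answer can be illustrated on a single firm's series. The Python sketch below uses the textbook definitions of within-group mean deviations, as in model(mdev), and Arellano-Bover forward orthogonal deviations, as in model(fod). It is not xtdpdgmm's internal code, and details such as scaling or missing-value handling there may differ.

It also shows why a dummy with zero within variation carries no information once the firm-specific effect is removed. This is the property that xtsum's "within" row reveals: both transformations map such a dummy to zeros.

```python
import math
from statistics import mean

def mean_deviations(x):
    """Within-group transformation: each value minus the firm's time average."""
    m = mean(x)
    return [v - m for v in x]

def forward_orthogonal_deviations(x):
    """Forward orthogonal deviations (Arellano and Bover, 1995).

    For t = 1, ..., T-1: sqrt((T-t)/(T-t+1)) * (x_t - mean(x_{t+1}, ..., x_T)).
    The last period has no future values, so it is lost for each firm.
    """
    out = []
    for t in range(len(x) - 1):
        future = x[t + 1:]
        scale = math.sqrt(len(future) / (len(future) + 1))
        out.append(scale * (x[t] - mean(future)))
    return out

def has_within_variation(x, tol=1e-12):
    """True if the series varies over time within the firm."""
    return any(abs(d) > tol for d in mean_deviations(x))

industry_dummy = [1, 1, 1, 1]   # hypothetical time-invariant dummy
crisis_dummy = [0, 1, 1, 0]     # hypothetical time-varying dummy

print(has_within_variation(industry_dummy))            # False
print(forward_orthogonal_deviations(industry_dummy))   # [0.0, 0.0, 0.0]
print(has_within_variation(crisis_dummy))              # True
```

Adding a constant firm effect to a series leaves both transformed series unchanged. For the same reason, a time-invariant regressor is wiped out in these models and can only be identified through instruments for the level model.
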
Zainab Mariam replied

19 Sep 2022, 09:50
Dear Professor Sebastian,

Thank you very much for your valuable response. Your cooperation and support are priceless.

1) Regarding my question 4 post #480 “When using your command ‘xtdpdgmmfe’, can the regression model include three lags of each regressor?”.
I did not mean three lags as instruments. I asked whether I can include three lags of any variable as regressors, i.e., the three lags are regressors. For instance, suppose that my regression model (the right-hand side) includes the following regressors: L(0/1).L.y; L(0/2).L.x1; L(0/2).L.x2; L(0/2).L.x3; L(0/2).L.x4; L(0/2).L.x5; L(0/2).L.x6; L(0/2).L.x7; L(0/2).L.x8; L(0/2).L.x9; x10. Thus, my question is: can I use your command ‘xtdpdgmmfe’ to run such a regression model? If so, do I have to classify each lag as exogenous, predetermined, or endogenous? e.g., for L(0/2).L.x1, do I have to specify L.x1, L2.x1, and L3.x1 and classify each of them as exogenous, predetermined, or endogenous when using your command ‘xtdpdgmmfe’?

Also, when using your command ‘xtdpdgmmfe’, do I have to classify the dummies?

2) How should I classify the lag of an endogenous variable? Can I classify the lag of the endogenous variable as predetermined? For instance, L.x1 is the independent variable of my regression model (L.x1 is endogenous). My regression model also includes the first and second lags of L.x1 as regressors. Thus, how should I classify the first and second lags of L.x1, given that L.x1 is endogenous?

3) Regarding post #481 point 1.D) “Option nolevel is recommended if you do not have any instruments for the level model and you want a conventional difference/FOD estimator.”.
Thus, I kindly ask you please to give an example on how to use the option ‘nolevel’ for an unconventional difference/FOD estimator.

4) Regarding post #481 point 8) “These commands do not support nonlinear models for limited dependent variables, only the linear probability model.”.
Do you mean that I cannot use your commands in my research as the dependent variable y is limited and its values lie between 0 and 1? If so, I kindly ask you please for your advice.

5) Can every time-invariant variable be classified as an exogenous variable?

6) If dummies vary over time, i.e., they are time-variant, can I classify them as exogenous? If not, how should the time-variant dummies be classified?

7) Is it normal to classify ‘firm age’ as exogenous?

8) How to decide whether dummies are time-invariant or time-variant?

9) When using your command 'xtdpdgmm' to implement the Difference GMM estimator, do the corresponding findings show the coefficients of the differenced variables (variables at differences i.e., ∆) or the coefficients of the variables at level?

10) Regarding post #473 point 4.6) “I would probably not include the diff suboption for iv() when using model(fod), but there is nothing wrong about it. For strictly exogenous variables and for dummy variables, I would personally use model(mdev) instead of model(fod), but note that this is not yet standard practice.”

Thus, my questions are:

10.1) As you would probably not include the diff suboption for iv() when using model(fod), thus, what to include instead?

10.2) Sorry! I did not get what you mean by “For strictly exogenous variables and for dummy variables, I would personally use model(mdev) instead of model(fod)”. Would you please give an example (the entire code) on how to use model(mdev) instead of model(fod)?

11) Regarding post #473 point 4.7) “… model(mdev) is appropriate for strictly exogenous variables or dummy variables. For an estimation without a level equation, I would recommend the following instruments:
Code:
gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) iv(i.ind, model(md)) iv(i.fc, model(md)) iv(i.mn, model(md))

For an estimation with a level equation, I would recommend the following instruments:
Code:
gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) iv(i.ind, model(level)) iv(i.fc, model(level)) iv(i.mn, model(level))…”

Thus, given that the variable x10 'firm age' is exogenous, my questions are:

11.1) For your code when the estimation is without a level equation, what should the entire code of the ‘xtdpdgmm’ command include also? i.e., is it correct if I type x10 as a regressor {before specifying model(fod) in the code} and then to instrument x10, I type gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) as follows:

xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) iv(i.ind, model(md)) iv(i.fc, model(md)) iv(i.mn, model(md)) two vce(r)

11.2) For your code when the estimation is with a level equation, what should the entire code of the ‘xtdpdgmm’ command include also? i.e., is it correct if I type x10 as a regressor {before specifying model(fod) in the code} and then to instrument x10, I type gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) and I also type gmm(x10, diff model(level) lag(0 0)) as follows:

xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) iv(i.ind, model(level)) iv(i.fc, model(level)) iv(i.mn, model(level)) gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, diff model(level) lag(0 0)) two vce(r)

11.3) Also, regarding your code when the estimation is with a level equation, what should the entire code of the ‘xtdpdgmm’ command include also? i.e., is it correct if I type x10 as a regressor {before specifying model(fod) in the code} and then to instrument x10, I type gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) and I also type gmm(x10, model(level) lag(0 0)) as follows:

xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, model(md) lag(0 0)) gmm(x10, model(fod) lag(0 2)) iv(i.ind, model(level)) iv(i.fc, model(level)) iv(i.mn, model(level)) gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, model(level) lag(0 0)) two vce(r)

12) For dummies, is it required to type ‘i.’ before the industry (ind), year (fc), and country (mn) dummies? If so, why?

13) Regarding post #475 point 7) “You normally instrument all variables in the differenced model (possibly excluding dummy variables). If your variables satisfy the additional Blundell-Bond assumption (sufficient: mean stationarity), then you additionally instrument them in the level model.”

Thus, my question is: How to check whether my variables satisfy the additional Blundell-Bond assumption (sufficient: mean stationarity)?

Sorry for the long message, professor!

Your patience, support and effort are highly appreciated.
Sebastian Kripfganz replied

13 Sep 2022, 09:57
1.A) If the industry dummies are time-invariant, then only code 1.2 would be appropriate. One of the industry dummies should normally be omitted due to the "dummy trap": taken together, all 8 industry dummies are perfectly collinear with the intercept. If more dummies are omitted (either from the regressor list or the instrument list), then this indicates that there might be other multicollinearity problems as well, which I cannot tell from the available information.
If the industry dummies vary over time, then 1.1 and 1.3 would also be appropriate codes. A similar qualification as before applies: At least one dummy will be omitted due to perfect collinearity.

1.B) As I said in 1.A, at least one dummy will be omitted. Stata will automatically omit one of the dummies, which need not be the one you would choose. If you want to omit a specific dummy, which shall serve as a reference industry, then you need to omit it manually.

1.C) You can include all industry dummies as instruments. Stata will automatically omit at least one due to perfect collinearity. It does not matter which dummy is omitted in the list of instruments.

1.D) Option nolevel is recommended if you do not have any instruments for the level model and you want a conventional difference/FOD estimator.

2.1) xtdpdgmmfe automatically selects the appropriate instruments / moment conditions (and therefore the relevant estimator) corresponding to the chosen assumptions.

2.2) xtdpdgmm can estimate a model with the Chudik-Pesaran nonlinear moment conditions even if some regressors are treated as endogenous. However, the resulting estimator would be inconsistent. xtdpdgmm does not check whether you have chosen the options in a consistent way. Therefore, xtdpdgmmfe is less prone to such errors.

3) Both commands can be specified accordingly; see post #450 for an example.

4) Option curtail() of xtdpdgmmfe can be used to set a maximum lag depth of 3 for all sets of instruments. For an endogenous variable, this would use lags 2 and 3. For a predetermined variable, this would use lags 1 to 3. The command is less flexible regarding individual lag orders for different variables. You also cannot easily specify lags 2 to 4 for endogenous but lags 1 to 3 for predetermined variables. This is intentional to reduce the temptation for researchers to search for the "nicest" model. Keeping the maximum lag order constant is the least arbitrary approach. It will give predetermined variables one more instrument than endogenous variables. Again, this is intentional as it utilizes the additional overidentifying restriction from making the stronger predeterminedness assumption.

5) The sequential model selection process is not required. It is merely a suggestion to reduce the arbitrariness of the modeling choice.

6) The doubly-corrected robust standard errors are generally recommended.

7) No, lag() is an abbreviation of lagrange().

8) These commands do not support nonlinear models for limited dependent variables, only the linear probability model.
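
The common lag depth described in point 4 can be written down explicitly. The Python helper below is hypothetical and not part of xtdpdgmmfe. It only restates the rule given in the answer for a maximum lag depth of 3: lags 2 to 3 for endogenous variables and lags 1 to 3 for predetermined variables.

Assuming collapsed instruments, it also counts one instrument per variable and lag for the transformed model. Level instruments, time dummies, and nonlinear moment conditions are ignored.

```python
# Hypothetical helper restating the rule in point 4; not part of xtdpdgmmfe.
FIRST_LAG = {"endogenous": 2, "predetermined": 1}

def curtailed_lags(var_type, max_lag=3):
    """Instrument lags used for one variable when every set is cut at max_lag."""
    return list(range(FIRST_LAG[var_type], max_lag + 1))

def collapsed_instrument_count(spec, max_lag=3):
    """Assuming collapsed instruments: one instrument per variable and lag."""
    return sum(len(curtailed_lags(t, max_lag)) for t in spec.values())

# Hypothetical specification: one endogenous and two predetermined regressors.
spec = {"L.x1": "endogenous", "L.x2": "predetermined", "L.x3": "predetermined"}

print(curtailed_lags("endogenous"))       # [2, 3]
print(curtailed_lags("predetermined"))    # [1, 2, 3]
print(collapsed_instrument_count(spec))   # 8 (2 + 3 + 3)
```

Because the maximum lag is the same for every variable, the extra instrument for a predetermined variable comes only from its earlier starting lag. This is the additional overidentifying restriction mentioned in the answer.
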
Zainab Mariam replied

10 Sep 2022, 08:29
Dear Professor Sebastian,

Thank you very much for your swift and valuable reply. I am very grateful to you for all your support and effort, professor! If I may, I would like to follow up on your response.

1) Regarding post #479 point 4) “If industry dummies are specified as instruments but they do not appear in that list, then the respective instruments are likely omitted due to some perfect collinearity. For difference GMM, this might be the case if those industry dummies are time-invariant.”.

I have 8 industries. Thus, I generated 8 industry dummies using ‘tabulate var, gen(Industry)’. On page 86 of your PDF file presentation, the list ‘Instruments corresponding to the linear moment conditions:’ that appear under the regression outputs table shows industry dummies instruments as follows “2bn.ind 3.ind 4.ind 5.ind 6.ind 7.ind 8.ind 9.ind”. Thus, I have the following questions:

1.A) Which code of the following can I apply in order for the list ‘Instruments corresponding to the linear moment conditions:’ that appear under the regression outputs table to show industry dummies instruments similar to yours on page 86 of your PDF file presentation?

In the following four codes, I apply the (FOD) estimator. Where:

y is the dependent variable;
L.y is the lagged dependent variable as a regressor (L.y is predetermined);
L.x1 is the independent variable (L.x1 is endogenous);
The control variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9 are predetermined.
The control variable x10 (firm age) is exogenous.

1.1) In the first code, I typed in the regression all the industries (8 industries) as regressors. Then, to instrument the industry dummies, I put in the iv( ) option all the industries (8 industries) with ‘model(md)’ as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) iv(Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, model(md)) teffects small two vce(r)

1.2) In the second code, I typed in the regression all the industries (8 industries) as regressors. Then, to instrument the industry dummies, I put in the iv( ) option all the industries (8 industries) with ‘model(level)’ as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) iv(Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, model(level)) teffects small two vce(r)

1.3) In the third code, I typed in the regression all the industries (8 industries) as regressors. Then, to instrument the industry dummies, I put in the iv( ) option all the industries (8 industries) with ‘diff model(diff)’ as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) iv(Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, diff model(diff)) teffects small two vce(r)

1.4) In the fourth code, I typed in the regression all the industries (8 industries) as regressors. Then, to instrument the industry dummies, I put in the iv( ) option all the industries (8 industries) with ‘diff’ and I put in the iv( ) option all the industries (8 industries) with ‘model(level)’ as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, model(fod) collapse gmm(y, lag(1 3)) gmm(L.x1, lag(1 3)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) iv(Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, diff) iv(Industry1 Industry2 Industry3 Industry4 Industry5 Industry6 Industry7 Industry8, model(level)) teffects small two vce(r)

If none of the above codes can show industry dummies instruments in the list ‘Instruments corresponding to the linear moment conditions:’ that appear under the regression outputs table similar to yours on page 86 of your PDF file presentation, I kindly ask you please for the code I should use to show industry dummies instruments in the list.

1.B) Is it correct to put in the regression code all the industry dummies (8 industries) included in the regression model as regressors? or do I have to exclude manually one of them?

1.C) To instrument the industry dummies, is it correct to put in the iv( ) option all the industries (8 industries) included as regressors in the regression model? i.e., do I have to instrument all the industry dummies included in the regression model?

1.D) I include the options: ‘teffects’ ‘small’ ‘two’ ‘vce(r)’. Is it better/required to add the option ‘nolevel’?

2) Regarding post #479 point 10) “No, the Chudik-Pesaran estimator requires all variables to be either strictly exogenous or predetermined. xtdpdgmmfe "solves" this issue by switching to a specific version of a difference GMM estimator when endogenous variables are present.”

Thus, my questions are:

2.1) Can your command ‘xtdpdgmmfe’ perform by itself the switching to a specific version of a difference GMM estimator when endogenous variables are present, or do I have to perform the switching manually? If so, how?

2.2) Can the previous command ‘xtdpdgmm’ perform the Chudik-Pesaran estimator when the regression model includes endogenous variables? If so, how?

3) To apply the Hayakawa, Qi, and Breitung estimator in a panel model with an endogenous variable, can I use the command ‘xtdpdgmm’, the command ‘xtdpdgmmfe’, or both?

4) When using your command ‘xtdpdgmmfe’, can the regression model include three lags of each regressor?

5) When using your command ‘xtdpdgmmfe’, is it required to follow the ‘Sequential model selection process’?

6) Is it better/required to add the option ‘dc’ (doubly-corrected robust standard errors) when implementing the different estimators (such as the difference-GMM estimator and the system-GMM estimator, …)?

7) Is there any difference between gmmiv(var, lagrange(1 .)) and gmm(var, lag(1 .))?

8) As your commands apply the fixed effects estimator (FE), can I use your commands to apply the Instrumental Variable Tobit (IVTobit) method for panel data? (y is a limited dependent variable).

I do appreciate your patience, support and effort.
Sebastian Kripfganz replied

02 Sep 2022, 07:49
1) There are 2 main reasons why this can happen:
a) xtreg uses stronger assumptions which are violated, e.g. because some regressors are predetermined or endogenous. As a result, coefficient estimates could be closer to 0 with GMM and eventually insignificant.
b) Standard errors are generally much larger with GMM due to the possible weakness of the instruments. This raises the chance of getting statistically insignificant results.

2) Yes, and yes.

3) Yes, a full set of dummies generally leads to perfect collinearity.

4) If industry dummies are specified as instruments but they do not appear in that list, then the respective instruments are likely omitted due to some perfect collinearity. For difference GMM, this might be the case if those industry dummies are time-invariant.

5) Yes, the inclusion of L2.x1 as a regressor does not invalidate it as an instrument.

6) Yes, with nonlinear moment conditions a numerical optimization procedure is required, which can take much longer, especially when the data set is relatively large. The message "not concave" can be ignored if it only appears for intermediate iterations. If it appears for the final iteration, then there are numerical difficulties and the algorithm did not converge to a proper solution. In such a case, a simplification of the model is required, which usually involves fewer instruments or even the abandonment of nonlinear moment conditions.

7) This is similar in spirit to a difference GMM estimator, where X2 and X3 are strictly exogenous and X1 is endogenous. People probably would not call it a "difference GMM" estimator because it does not exclusively use model(diff). Some people may even call it a "system GMM" estimator because it is based on a system of two models, model(mdev) and model(diff), but this could lead to confusion with the traditional system GMM estimator, which uses model(diff) (or model(fod)) and model(level). There is no commonly accepted name for this kind of estimator.

8) No, xtdpdgmmfe uses a different syntax which some users might find easier. It then translates this syntax into the syntax required for xtdpdgmm. The computations are still performed with the latter command. Please see the help file for xtdpdgmmfe and my earlier post #450 in this Statalist topic. If needed, you can subsequently modify the xtdpdgmm command line displayed by xtdpdgmmfe.

9) Yes.

10) No, the Chudik-Pesaran estimator requires all variables to be either strictly exogenous or predetermined. xtdpdgmmfe "solves" this issue by switching to a specific version of a difference GMM estimator when endogenous variables are present.

11) Yes.
Zainab Mariam replied

02 Sep 2022, 06:44
Dear Professor Sebastian,

Thank you very much for your valuable reply. I do not know how to thank you, professor! I am very grateful to you for all your support and effort.

I still have the following questions, please! Hopefully, this will be the final set of questions.

1) What if the coefficient becomes statistically insignificant when applying GMM even though the coefficient was significant when applying ‘xtreg’. Is there any reason for this issue?

2) When the option “teffects” is included in the code of the 'xtdpdgmm' command, the first three years do not appear in the findings. Is it normal? Is it due to including the second lag of the dependent variable as a regressor?

3) I have 8 industries. The findings show that the last industry is omitted because of collinearity. Is it normal?

4) Regarding the ‘Instruments corresponding to the linear moment conditions:’ that appear under the regression outputs table, what if nothing in terms of the industry dummies appears there?

5) The independent variable of my regression model is L.x1 (L.x1 is endogenous). Also, my regression model includes L2.x1. Thus, for the (FOD) estimator, the instruments for L.x1 should start from the first lag of L.x1 i.e., the first instrument for L.x1 is L2.x1. Thus, my question is: Is it right to use L2.x1 as an instrument for L.x1, given that L2.x1 is already included in the regression model as a regressor?

6) When the option ‘nl(noserial)’ is included in the code, the regression takes a long time to run. Is that normal? Also, what if the message "not concave" appears during the iterations?

7) Regarding post #471 point 8) “This suggestion carries over to dynamic panel models, yes.”

Thus, my question is: What does the following code apply? i.e., which estimator does this code perform? Does it apply the Differenced GMM estimator, the System GMM estimator, or something else?

“Code:
xtdpdgmm Y X1 X2 X3, model(mdev) iv(X2 X3, norescale) gmm(X1, lag(2 8) collapse model(diff)) twostep small vce(robust, dc)”

8) To apply your command ‘xtdpdgmmfe’, do I have to just type ‘xtdpdgmmfe’ instead of ‘xtdpdgmm’ and keep everything else the same in the code of ‘xtdpdgmm’? i.e., to apply your command ‘xtdpdgmmfe’, is the only thing I need to work on in the code of your ‘xtdpdgmm’ command to replace ‘xtdpdgmm’ with your command ‘xtdpdgmmfe’? if no, I kindly ask you please what I have to do.

9) Is your command ‘xtdpdgmmfe’ applicable in a static panel model and in a dynamic panel model with an endogenous variable?

10) Is the Chudik-Pesaran estimator applicable in a panel model with an endogenous variable? If not, does the ‘xtdpdgmmfe’ command solve this issue?

11) Is the Hayakawa, Qi, and Breitung estimator applicable in a panel model with an endogenous variable?

Your patience, cooperation, and effort are highly appreciated.

Last edited by Zainab Mariam; 02 Sep 2022, 07:04.
Sebastian Kripfganz replied

02 Sep 2022, 03:40
1) 1.1 and 1.2 are equivalent only if you have not specified model(diff) as a separate option in your command line, because the latter would override the model(level) default. In contrast, 1.2 and 1.3 are equivalent only if you have changed the default by specifying model(diff) as a separate option. All other specifications differ from each other. For a difference GMM estimator, one would typically use 1.3 to instrument dummy variables.

2) Yes, you would just remove the lagged dependent variable and the instruments for the lagged dependent variable.
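
A minimal sketch of such a static specification with webuse abdata, assuming for illustration that w and k are predetermined:

Code:

webuse abdata, clear
xtset id year
* static difference GMM: no lagged dependent variable and no instruments for it
xtdpdgmm n w k, model(diff) collapse gmm(w k, lag(1 3)) teffects twostep vce(robust)
* static system GMM: additionally instrument w and k in the level model
xtdpdgmm n w k, model(diff) collapse gmm(w k, lag(1 3)) gmm(w k, lag(0 0) diff model(level)) teffects twostep vce(robust)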

3) No, you can compute a difference-in-Hansen test by separately estimating the two models which you want to compare, e.g.

Code:

. webuse abdata
. xtdpdgmm L(0/1).n w k, gmm(L.n w k, l(1 4)) m(d) c two vce(r)
. estimates store ab
. xtdpdgmm L(0/1).n w k, gmm(L.n w k, l(1 4) m(d)) iv(L.n w k, d) two c vce(r)
. estat overid ab

While the test result differs numerically from the one obtained without separate estimations, the two versions are asymptotically equivalent:

Code:

. xtdpdgmm L(0/1).n w k, gmm(L.n w k, l(1 4) m(d)) iv(L.n w k, d) two c vce(r) overid
. estat overid, difference

4) You would normally test for underidentification before you run the overidentification tests.
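
Here is one possible ordering of the postestimation checks, sketched with webuse abdata. It assumes that your installed version of xtdpdgmm provides estat underid (check help xtdpdgmm). If it does not, skip that line:

Code:

webuse abdata, clear
xtset id year
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) teffects twostep vce(robust)
* 1) underidentification test (assumed to be available in your version)
estat underid
* 2) tests for serial correlation of the residuals
estat serial, ar(1/3)
* 3) Hansen test of the overidentifying restrictions
estat overid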

5) The xtdpdgmm command does not have an orthogonal option; the command xtdpdgmmfe does. With xtdpdgmm, you would specify model(fod) instead of model(diff), but you also need to adjust the lag order for the instruments, e.g. gmm(x, lag(1 3) model(diff)) would become gmm(x, lag(0 2) model(fod)). For a system GMM estimator, you would leave the model(level) instruments unchanged.
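
To illustrate the lag shift, the following sketch uses webuse abdata with assumed variable roles (L.n, w, and k predetermined). It estimates a difference GMM specification with model(diff) and then the analogous specification with model(fod), where every lag range moves down by one:

Code:

webuse abdata, clear
xtset id year
* first differences: L.n instrumented from lag 2, predetermined w and k from lag 1
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) teffects twostep vce(robust)
* forward-orthogonal deviations: the same ranges shifted down by one
xtdpdgmm L(0/1).n w k, model(fod) collapse gmm(n, lag(1 3)) gmm(w k, lag(0 2)) teffects twostep vce(robust)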

6) model(md) can also be used for strictly exogenous variables in combination with model(diff) for other variables. This works both for a difference GMM and a system GMM estimator. There is full flexibility.
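
A sketch of that combination with webuse abdata, using assumed roles: L.n, w, and k are predetermined and instrumented in the first-differenced model, while ys is strictly exogenous and instrumented in the mean-deviations model. The norescale suboption is copied from the command quoted in question 7 of the post from 02 Sep 2022, 06:44:

Code:

webuse abdata, clear
xtset id year
* difference GMM with model(mdev) instruments for the strictly exogenous ys
xtdpdgmm L(0/1).n w k ys, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(ys, norescale model(mdev)) teffects twostep vce(robust)
* system GMM: the same, plus level-model instruments for n, w, and k
xtdpdgmm L(0/1).n w k ys, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) iv(ys, norescale model(mdev)) gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) teffects twostep vce(robust)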

7) On slide 108 of my 2019 London Stata Conference presentation, the instruments gmm(w, lag(1 .)) are valid for an endogenous variable with model(fod). I then add gmm(w, lag(0 0)), which adds the extra instrument valid for a predetermined variable. The difference-in-Hansen test on the next slide (line 7 of the results table) then checks the validity of that additional instrument. Here, the p-value of 0.3 appears sufficiently high to maintain the assumption that this variable is predetermined. You can then proceed in the next step by adding gmm(w, lag(0 0) model(md)), which would be valid under strict exogeneity.
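
Here is a sketch of that two-step procedure with webuse abdata. The variable w plays the role of the variable being classified. The lag ranges (lags 1 to 3 rather than all available lags) and the treatment of n and k are assumptions for the example:

Code:

webuse abdata, clear
xtset id year
* step 1: w endogenous (lags 1-3 under model(fod)) plus a separate lag-0 set,
* valid only if w is predetermined; overid enables the incremental tests
xtdpdgmm L(0/1).n w k, model(fod) collapse gmm(n, lag(1 3)) gmm(k, lag(1 3)) gmm(w, lag(1 3)) gmm(w, lag(0 0)) teffects twostep vce(robust) overid
estat overid, difference
* step 2, only if the lag-0 set was not rejected: w predetermined, plus
* lag 0 in the mean-deviations model, valid only if w is strictly exogenous
xtdpdgmm L(0/1).n w k, model(fod) collapse gmm(n, lag(1 3)) gmm(k, lag(1 3)) gmm(w, lag(0 2)) gmm(w, lag(0 0) model(mdev)) teffects twostep vce(robust) overid
estat overid, difference

In each output, the relevant statistic is the difference-in-Hansen test reported for the last gmm() set for w.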

8) You would just use the procedure described in 7 to do this test.

9) By "traditional" difference GMM estimator, I mean the one proposed by Arellano and Bond (1991) with linear moment conditions only. With nonlinear moment conditions, people would not call it the "difference GMM" estimator anymore. It is just the Ahn-Schmidt estimator. In practice, nl(noserial) can be added almost without cost because the underlying assumptions are basically the same as with the Arellano-Bond estimator. Thus, the added efficiency from the nonlinear moment conditions comes for free. nl(noserial) is not typically used with system GMM because the nonlinear moment conditions become redundant once the extra instruments for model(level) are added.

10) The Ahn-Schmidt estimator differs from the Arellano-Bond difference GMM estimator and the Blundell-Bond system GMM estimator as just described. Iterated GMM refers to the estimation technique. You can have an iterated difference GMM or an iterated system GMM estimator as an alternative to corresponding one-step or two-step GMM estimators.
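
A sketch of the distinction between the estimator (defined by its moment conditions) and the estimation technique, using webuse abdata with assumed variable roles. The same system GMM moment conditions are estimated once by two-step GMM and once by iterated GMM (option igmm):

Code:

webuse abdata, clear
xtset id year
* two-step system GMM
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) teffects twostep vce(robust)
* iterated system GMM: identical moment conditions, only the technique changes
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) teffects igmm vce(robust)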
Zainab Mariam replied

31 Aug 2022, 11:14
Dear Professor Sebastian,

Many thanks for your valuable response. Indeed, saying "thank you very much" is not enough. I am very grateful to you for all your support and effort.

I still have the following questions, which might be the second-to-last set, please!

1) Are the following equivalent or different? Which is the one for the Difference GMM estimator (i.e., to be used for the differenced model)? Which one is to be used for the level model and the System GMM estimator?

1.1) iv(x, lag( ) diff model(level))

1.2) iv(x, lag( ) diff)

1.3) iv(x, lag( ) model(diff) diff)

1.4) iv(x, lag( ) model(level))

1.5) iv(x, lag( ) model(diff))

2) Can I use your command ‘xtdpdgmm’ to implement the Difference GMM estimator and the System GMM estimator for a static model? If so, is the only thing I should do (to implement the Difference GMM estimator and the System GMM estimator for the static model) just not to include the lagged dependent variable in the regression model, and use the same codes I used when implementing the Difference GMM estimator and the System GMM estimator for the dynamic model?

3) Regarding post #471 point 6) “The overid option has no effect on the estimation of the model. It allows for the computation of additional "difference-in-Hansen" test statistics with estat overid.”

Thus, I have the following questions:

3.A) Does it mean that the difference-in-Hansen test (the Sargan-Hansen difference test of the overidentifying restrictions) cannot be performed without specifying the option ‘overid’ in the ‘xtdpdgmm’ command line?

3.B) To perform the difference-in-Hansen test (the Sargan-Hansen difference test of the overidentifying restrictions), should ‘estat overid’ be used? Or should ‘estat overid, difference’ be used?

3.C) What I know is that ‘estat overid’ performs the Sargan-Hansen test of the overidentifying restrictions, whereas ‘estat overid, difference’ performs the difference-in-Hansen test (the incremental Hansen test/the Sargan-Hansen difference test of the overidentifying restrictions). Am I right?

4) When applying the Difference GMM estimator and the System GMM estimator, do I have to apply the ‘under-identification test’? If so, at which step, i.e., after/before ‘running the xtdpdgmm regression’, ‘estat serial, ar(1/3)’, ‘estat overid’, or ‘estat overid, difference’?

5) I have unbalanced panel data. To implement the Difference GMM estimator, can I include the option ‘orthogonal’ in the code of your ‘xtdpdgmm’ command? If not, what should I include instead of ‘orthogonal’ to deal with unbalanced panel data when implementing the Difference GMM estimator with your command ‘xtdpdgmm’? Also, can I include the option ‘orthogonal’ in the code of your ‘xtdpdgmm’ command to implement the System GMM estimator?

6) To use ‘model(md)’/‘model(mdev)’, is it required to apply the ‘FOD’ estimator? i.e., is ‘model(md)’ conditional on applying the ‘FOD’ estimator? Can I use ‘model(md)’ without using the ‘FOD’ estimator? Can I use ‘model(md)’ when applying the Difference GMM estimator ‘model(diff)’? Can I use ‘model(md)’ when applying the System GMM estimator?

7) How could ‘estat overid, difference’ check the correct classification of variables as endogenous, predetermined, or exogenous? I kindly ask you please to give me an example. I read your PDF file presentation, but I feel that I am not able to understand it fully.

8) How to decide/determine whether a variable is either strictly exogenous, endogenous, or predetermined? Is there a test to classify whether a variable is either strictly exogenous, endogenous, or predetermined?

9) Regarding post #473 point 5) “For the traditional "difference GMM" estimator, you should not include the nl(noserial) option. With this option, you can implement the Ahn-Schmidt GMM estimator with nonlinear moment conditions, which can be an alternative to the system GMM estimator.”. Thus, I have the following questions:

9.1) What do you mean by the traditional "difference GMM" estimator?

9.2) What is the Difference GMM estimator that the ‘nl(noserial)’ option can be included in? i.e., for which Difference GMM estimator can the ‘nl(noserial)’ option be included?

9.3) Can the ‘nl(noserial)’ option be included in the System GMM estimator?

On page 58 of your PDF file presentation “These nonlinear moment conditions are redundant when added to the sys-GMM moment conditions (Blundell and Bond, 1998) but improve efficiency when added to the diff-GMM moment conditions. Furthermore, they may provide identification when the diff-GMM estimator does not”

10) Are the iterated GMM estimator and the Ahn-Schmidt GMM estimator different from the Difference GMM estimator and the System GMM estimator? If so, in which aspects? and which one is better?

Your patience, cooperation, and support are highly appreciated.
Sebastian Kripfganz replied

31 Aug 2022, 09:40
1.1 and 1.2) With model(diff), lag 0 is not a valid instrument for predetermined variables. The first admissible lag is 1. Please see the Remarks section in the xtdpdgmm help file.
1.3 and 1.4) Both codes are valid. I recommend 1.3.
1.5) This is only valid if those level instruments are uncorrelated with the unobserved group-specific effects, which effectively requires a random-effects assumption.
1.6) This does not produce any instruments for the level model because you set the default to model(diff).

2.1) iv(i.ind, diff) iv(i.fc, diff) iv(i.mn, diff) produces differenced instruments for model(diff), which was set as the default. This is valid but typically inefficient for dummy variables, which are usually assumed to be uncorrelated with the unobserved group-specific effects.
2.2) Again, this is valid but the diff option creates inefficiencies.
2.3) This is the recommended approach for dummies in system GMM. (Remember to adjust the lag order for predetermined variables in this code.)

3.A) It is generally recommended to instrument the dummies only in either the first-differenced or the level model to avoid redundancies. For system GMM, the level model is typically recommended.
3.B) For the differenced model, you would typically combine diff model(diff). For the level model, you would just use model(level). For dummy variables in system GMM, the latter is recommended.

4) For dummies, you would typically use iv().

5) When you use the iv() option for dummies, you do not need to use the lag() suboption because the default is already lag(0 0). Further lags are redundant for dummies.

6) You would normally treat the exogenous variable in the same way as a predetermined or endogenous variable, but just adjust the lag order appropriately; see your code 1.3. As an exception, when the exogenous variable is also assumed to be uncorrelated with the unobserved group-specific effects, you do not need to include the diff option with model(level).

7) You normally instrument all variables in the differenced model (possibly excluding dummy variables). If your variables satisfy the additional Blundell-Bond assumption (sufficient: mean stationarity), then you additionally instrument them in the level model.

8) I would recommend starting with a "difference GMM" estimation and running estat overid for this estimation first. If the test rejects the difference GMM estimation, then there is no point going on to the system GMM estimation.
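
The following sketch combines the workflow in point 8 with the recommendations in points 1 to 7, using webuse abdata as a stand-in for the data in the question. The mapping is an assumption for illustration: n plays the role of y, L.w the endogenous L.x1, k the predetermined controls, ys the exogenous x10, and abdata's industry identifier ind the dummies:

Code:

webuse abdata, clear
xtset id year
* step 1: difference GMM; time-invariant industry dummies drop out of the differenced model
xtdpdgmm L(0/1).n L.w k ys, model(diff) collapse gmm(n, lag(2 4)) gmm(L.w, lag(2 4)) gmm(k, lag(1 3)) gmm(ys, lag(0 2)) teffects twostep vce(robust)
estat serial, ar(1/3)
estat overid
* step 2 (only if step 1 is not rejected): system GMM, dummies instrumented in the level model only
xtdpdgmm L(0/1).n L.w k ys i.ind, model(diff) collapse gmm(n, lag(2 4)) gmm(L.w, lag(2 4)) gmm(k, lag(1 3)) gmm(ys, lag(0 2)) ///
    gmm(n, lag(1 1) diff model(level)) gmm(L.w, lag(1 1) diff model(level)) gmm(k ys, lag(0 0) diff model(level)) ///
    iv(i.ind, model(level)) teffects twostep vce(robust) overid
estat serial, ar(1/3)
estat overid
* incremental (difference-in-Hansen) tests for each instrument set
estat overid, difference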
Zainab Mariam replied

27 Aug 2022, 09:47
Dear Professor Sebastian,

Many thanks for your valuable reply. Thank you for your permission to include your name in the acknowledgement section. Of course, I will cite your 'xtdpdgmm' package as you mentioned exactly.

Please, allow me to ask the following questions. Sorry!

I will apply the System GMM estimator using your command ‘xtdpdgmm’. I will consider the lagged dependent variable L.y as predetermined (as you kindly suggested), the independent variable L.x1 as endogenous, while the control variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9 as predetermined, and the control variable x10 (firm age) as exogenous. Thus, my questions are:

1) Which is the correct code of the following that I can use to implement the System GMM estimator?

1.1) In the first code, for the differenced model, I use lag(0 2) as instruments for the predetermined variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9, and lag(0 2) for the exogenous variable x10 (firm age) as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 2)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) teffects two vce(r)

1.2) In the second code, for the differenced model, I use lag(0 2) as instruments for the predetermined variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9, and lag(0 0) for the exogenous variable x10 (firm age) as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) teffects two vce(r)

1.3) In the third code, for the differenced model, I use lag(1 3) as instruments for the predetermined variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9, and lag(0 2) for the exogenous variable x10 (firm age) as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 2)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) teffects two vce(r)

1.4) In the fourth code, for the differenced model, I use lag(1 3) as instruments for the predetermined variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9, and lag(0 0) for the exogenous variable x10 (firm age) as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) teffects two vce(r)

1.5) In the fifth code, for the level model, I put ‘model(level)’ instead of ‘diff model(level)’ as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) model(level)) gmm(L.x1, lag(1 1) model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) model(level)) teffects two vce(r)

1.6) In the sixth code, for the level model, I put ‘diff’ instead of ‘diff model(level)’ as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) diff) gmm(L.x1, lag(1 1) diff) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff) teffects two vce(r)

Where:
y is the dependent variable;
L.y is the lagged dependent variable as a regressor (L.y is predetermined);
L.x1 is the independent variable (L.x1 is endogenous);
The control variables L.x2, L.x3, L.x4, L.x5, L.x6, L.x7, L.x8, L.x9 are predetermined.
The control variable x10 (firm age) is exogenous.

If none of the previous codes is correct, what is the correct code that I have to use in order to implement the System GMM estimator?

2) My regression model includes a dummy variable (fc, this dummy variable takes the value of 1 for the 3 years 2008, 2009, and 2010). Also, it includes industry dummies (ind) and country dummies (mn) to examine the industry effects and the country effects. Thus, my question is: to apply the System GMM estimator, which is the correct code of the following?

2.1) In the first code, I put ‘diff’ in iv() option for dummies as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) iv(i.ind, diff) iv(i.fc, diff) iv(i.mn, diff) two vce(r)

2.2) In the second code, I put ‘diff model(level)’ in iv() option for dummies as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(1 3)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) iv(i.ind, diff model(level)) iv(i.fc, diff model(level)) iv(i.mn, diff model(level)) two vce(r)

2.3) In the third code, I put ‘model(level)’ without ‘diff’ in iv() option for dummies as follows:

. xtdpdgmm L(0/1).y L.(x1 x2 x3 x4 x5 x6 x7 x8 x9) x10 i.ind i.fc i.mn, model(diff) collapse gmm(y, lag(2 4)) gmm(L.x1, lag(2 4)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9, lag(0 2)) gmm(x10, lag(0 0)) ///
> gmm(y, lag(1 1) diff model(level)) gmm(L.x1, lag(1 1) diff model(level)) gmm(L.x2 L.x3 L.x4 L.x5 L.x6 L.x7 L.x8 L.x9 x10, lag(0 0) diff model(level)) iv(i.ind, model(level)) iv(i.fc, model(level)) iv(i.mn, model(level)) two vce(r)

If none of the previous codes is correct, what is the correct code that I have to use in order to implement the System GMM estimator, given that the regression model includes the dummy variable (fc), the country dummies (mn), and the industry dummies (ind)?

3) To implement the System GMM estimator using your command ‘xtdpdgmm’, I have the following questions:

3.A) Is it necessary/required to instrument the dummies (fc, year, industry, and country dummies) in the differenced model only, in the level model only, or in both models (differenced and level)?

3.B) What is the option that should be accompanied with the dummies in the differenced model and in the level model? i.e., which option of the following ‘model (diff)’, ‘model(level)’, ‘diff’, ‘diff model(level)’ should be used for dummies in the differenced model? And which option should be used for the level model?

4) Do I have to use ‘iv’ or ‘gmm’ for the dummies (fc, year, industry, and country dummies)?

5) Is it necessary/required to specify ‘lag( )’ for the dummies (fc, year, industry, and country dummies)? If not, why not?

6) The control variable x10 (firm age) is exogenous. Is it better to instrument it in the differenced model, in the level model, or in both models? For the exogenous variable (firm age), do I have to use the same lag( ) for the differenced model and for the level model? {i.e., for the differenced model: gmm(x10, lag(0 0) model(diff) collapse). For the level model: gmm(x10, lag(0 0) diff model(level) collapse)}. Or should the lag( ) used for the differenced model be different from the lag( ) used for the level model? {i.e., for the differenced model: gmm(x10, lag(0 2) model(diff) collapse). For the level model: gmm(x10, lag(0 0) diff model(level) collapse)}.

7) When I apply the System GMM estimation, do I have to instrument all the variables included in the regression model for both models (differenced model and level model)? Or for the level model, do I have to instrument only the variables that I do not instrument for the differenced model, and vice versa?

8) Please, correct me if I am wrong regarding the order/steps of applying the System GMM estimation. I should first run the regression of the System GMM estimator using your command ‘xtdpdgmm’. Second, I should apply ‘estat serial’ in order to test for serial correlation of residuals. Third, I should apply ‘estat overid’ that performs the Sargan-Hansen test of the overidentifying restrictions in order to test whether the instruments (used for the differenced model) are valid. Fourth, I should apply ‘estat overid, difference’ that performs the Sargan-Hansen difference test of the overidentifying restrictions in order to test whether the additional instruments employed in the System GMM (used for the level model) are valid. Also, do I have to change, amend, or add anything to those steps in order to implement the System GMM estimator using your command ‘xtdpdgmm’?

I am very grateful to you for all your support and effort.