XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

Prateek Bedi

Join Date: Sep 2018

Posts: 199
#286

21 May 2021, 07:16

Alright, Prof. Kripfganz. In light of your response, please consider the following doubts:

1. Is there any specific reason why the maximum admissible lag order has been set at T-1 in xtdpdgmm?
2. Conceptually (not in relation to xtdpdgmm), when we use the first-difference transformation (the model(diff) option), the maximum admissible lag order should be T-1, because the lagged dependent variable and the first-difference transformation both lead to the omission of the same year, i.e. only the first year of the data set. Is this understanding correct?
3. Conceptually (not in relation to xtdpdgmm), when we use forward-orthogonal deviations, the maximum admissible lag order should be T-2, because the lagged dependent variable leads to the omission of the first year of the data set and the forward-orthogonal deviations lead to the loss of the last year. Is this understanding correct?

Thanks!
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#287

21 May 2021, 07:22

1. If you have T observations in your data set, then you can have at most T-1 lags. Everything beyond that is outside of the range of the data set.

2. I think my earlier statement in that regard was misleading. Apologies. For the maximum lag order, it does not matter whether there is a lagged dependent variable in the model or whether the model is transformed. For the first-differenced model, you are still typically using instruments in levels; so the maximum lag order T-1 applies again (see point 1).

3. It is again T-1. Same argument as in point 2.

https://www.kripfganz.de/stata/
1 like
Comment
Prateek Bedi

Join Date: Sep 2018

Posts: 199
#288

21 May 2021, 11:03

Prof. Sebastian:

Thanks for a quick response. I have these follow-up queries:

1. If the maximum lag order is not affected by the presence of a lagged dependent variable in the model or by the model transformation, why does xtdpdgmm return an error when I try to specify a maximum lag order of more than T-2 in a model that has a lagged dependent variable as an explanatory variable and employs forward-orthogonal deviations?

2. I agree that for the first-differenced model, there are indeed T-1 lagged instruments available in levels. However, for the level model, shouldn't there be only T-2 first-differenced instruments available? The first-differenced values begin from the second year onwards (so one value is lost there), and the contemporaneous value cannot be used as an instrument. Hence, only T-2 lagged first-differenced values are left to be used as instruments for the level model.

Please pardon me if my queries sound silly to you.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#289

21 May 2021, 14:01

Your questions are perfectly legit. In fact, my previous answer was incorrect. My sincere apology.

You are correct: For the model in forward-orthogonal deviations, the maximum lag order is only T-2 instead of T-1 because the last observation is effectively removed. Moreover, when the instruments are first differenced, the maximum lag order is likewise only T-2, as you correctly expected.

https://www.kripfganz.de/stata/
1 like
Comment
Prateek Bedi

Join Date: Sep 2018

Posts: 199
#290

21 May 2021, 14:59

Prof. Kripfganz:

Thanks a lot for your prompt response. So, it seems clear that maximum lag order should be T-2 in the case of forward orthogonal deviations as well as first-differenced transformation. But, I noticed that xtdpdgmm allows the use of T-1 as the maximum lag order in case of first-differenced transformation. Could you please help me understand the reason for this?

Thanks!!!
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#291

22 May 2021, 04:03

My previous statement was about the first-difference transformation of the instruments with the suboption difference, not the first-difference transformation of the regressors with the suboption model(difference). The question is always how many lags are in the data set relative to the last effective observation. For the first-differenced model, the last effective observation is also the last actual observation in the data. For the model with forward-orthogonal deviations, the last effective observation is the second-last actual observation.

Note that this does not constitute an advantage for the first-differenced model over forward-orthogonal deviations, because with the latter you would start already with a smaller lag. Say, if you start with lag 1 for the first-differenced model, then you would start with lag 0 for the model with forward-orthogonal deviations. Therefore, you would use the same number of lagged instruments in both cases.
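The counting argument above can be sketched in a few lines of Python. This is only an illustration and not part of xtdpdgmm; the function name `usable_lags` and the choice T=6 are hypothetical, and the sketch assumes the last effective observation is period T for the first-differenced model and period T-1 for forward-orthogonal deviations, as described in the post.

```python
def usable_lags(T, model, first_lag):
    """Return the lag orders available at the last effective observation.

    Assumption (from the discussion above): the last effective observation
    is period T for model "diff" and period T-1 for model "fodev".
    """
    if model == "diff":
        last_effective = T
    elif model == "fodev":
        last_effective = T - 1
    else:
        raise ValueError("model must be 'diff' or 'fodev'")
    # The oldest instrument comes from period 1, so the largest lag is
    # the distance between the last effective observation and period 1.
    max_lag = last_effective - 1
    return list(range(first_lag, max_lag + 1))


T = 6  # illustrative panel length
diff_lags = usable_lags(T, "diff", first_lag=1)    # lags 1, ..., T-1
fodev_lags = usable_lags(T, "fodev", first_lag=0)  # lags 0, ..., T-2

print("diff :", diff_lags)
print("fodev:", fodev_lags)
print("same number of lags:", len(diff_lags) == len(fodev_lags))
```

Both lists have T-1 elements: the forward-orthogonal model gives up the largest lag but gains lag 0.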

https://www.kripfganz.de/stata/
1 like
Comment
Prateek Bedi

Join Date: Sep 2018

Posts: 199
#292

23 May 2021, 11:31

Alright, Prof. Kripfganz. If we talk about the transformation of the regressors with the suboption model(), should the maximum lag order be T-2 for both first difference (i.e. model(diff)) and forward orthogonal deviations (i.e. model(fodev))? (assuming that I do not specify any transformation of the instruments in my command)

Last edited by Prateek Bedi; 23 May 2021, 11:35.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#293

23 May 2021, 11:42

No. Let's look at a simple example. Suppose, you have (untransformed) data for T=4 time periods, t=1,2,3,4. In first differences, the last available observation is for t=4. The largest lag would be to use an instrument from the observation t=1. Thus, the maximum lag order is 4-1=3 (i.e. T-1). In forward-orthogonal deviations, the last available observation is for t=3. The largest lag would still be to use an instrument from the observation t=1. Thus, the maximum lag order is 3-1=2 (i.e. T-2).

https://www.kripfganz.de/stata/
1 like
Comment
Prateek Bedi

Join Date: Sep 2018

Posts: 199
#294

26 May 2021, 15:38

Originally posted by Sebastian Kripfganz

No. Let's look at a simple example. Suppose, you have (untransformed) data for T=4 time periods, t=1,2,3,4. In first differences, the last available observation is for t=4. The largest lag would be to use an instrument from the observation t=1. Thus, the maximum lag order is 4-1=3 (i.e. T-1). In forward-orthogonal deviations, the last available observation is for t=3. The largest lag would still be to use an instrument from the observation t=1. Thus, the maximum lag order is 3-1=2 (i.e. T-2).

Thanks, Prof. Kripfganz. Now, I would like to ask the following questions to enhance my understanding. As per my limited knowledge a system-GMM estimator has two equations - level equation and transformed equation. The transformed equation may be transformed using first-difference transformation or forward orthogonal deviations. Also, the lags of level values serve as instruments for the transformed equation and lags of the transformed values serve as instruments for the level equation. I have the following queries from the perspective of the level equation.

1. Suppose the transformed equation has been obtained using the first-difference transformation. Now for the level equation, the first-differenced values have to serve as instruments. Assuming T=4 time periods, the first observation of the transformed equation is lost by construction. Hence, the first-differenced values begin from t=2. Now, for the level equation, should not the maximum lag order be T-2? For instance, for the level value at t=4, the lags start from t=2 (because the first-differenced values begin from t=2) and end at t=3 (because the first-differenced value at t=4 cannot be used as an instrument for the level value at t=4).

2. Suppose the transformed equation has been obtained using the forward-orthogonal transformation. Now for the level equation, the forward-orthogonal values have to serve as instruments. Assuming T=4 time periods, the first observation of the transformed equation in this case begins from t=1. Now, for the level equation, should not the maximum lag order be T-1? For instance, for the level value at t=4, the lags start from t=1 and end at t=3 (because the forward-orthogonal value at t=4 cannot be used as an instrument for the level value at t=4).

Please correct me if my understanding is wrong.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#295

27 May 2021, 03:10

1. If you had strictly exogenous variables, you could also use the first-differenced value from t=4 as an instrument for the level regressors at t=4. Other than that, you are correct. The largest possible lag would be the first-differenced value from t=2 as an instrument for the level regressors at t=4. Thus, the maximum lag order is 4-2=2 (i.e. T-2).

2. You usually cannot use the forward-orthogonally transformed values as instruments for the level model. Due to the subtraction of future information (as opposed to past observations when using first differences), the forward-orthogonal transformation would make these instruments invalid unless the variables are strictly exogenous. xtdpdgmm does not offer a forward-orthogonal transformation of the instruments. For the level model, you would typically still first difference the instruments, even if you used forward-orthogonal deviations for the transformed model/equation.
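For readers who want to check the T=4 arithmetic for first-differenced instruments in the level model, here is a minimal Python sketch. It is not xtdpdgmm code; the function name `diff_instrument_lags` and its `strictly_exogenous` flag are hypothetical, and it only enumerates which first-differenced values can serve as instruments for the level equation at the last period.

```python
def diff_instrument_lags(T, strictly_exogenous=False):
    """Lags of the first-differenced variable usable at the last period t = T.

    Assumptions (taken from the discussion above):
      * first differences exist only from t = 2 onwards;
      * the contemporaneous difference (lag 0) is usable only if the
        variable is strictly exogenous.
    """
    first_diff_period = 2
    newest_period = T if strictly_exogenous else T - 1
    # Walk from the newest usable period back to the first differenced one
    # and record the distance to the level observation at t = T.
    return [T - s for s in range(newest_period, first_diff_period - 1, -1)]


T = 4
lags_default = diff_instrument_lags(T)
lags_strict = diff_instrument_lags(T, strictly_exogenous=True)

print("not strictly exogenous:", lags_default)
print("strictly exogenous    :", lags_strict)
print("maximum lag order     :", max(lags_default), "= T-2")
```

In both cases the largest lag is T-2, because the oldest first-differenced value comes from t=2.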

https://www.kripfganz.de/stata/
1 like
Comment
Prateek Bedi

Join Date: Sep 2018

Posts: 199
#296

28 May 2021, 15:17

Thanks, Prof. Kripfganz. In relation to your response, I have the following doubts.

1. When do we need to transform the instruments? How do we decide which transformation to go for? How do we transform the instruments in xtdpdgmm?

2. I suppose that the lag order which we specify in the suboption lag() applies to the instruments used for both the level and the transformed model. Is this correct?

3. If the answer to query #2 is yes, the maximum lag order in the suboption lag() should allow for a value which is permitted to be used as an instrument in either of the models (level or transformed). Is this correct?

4. You mentioned in post #295 that for the level model, we need to first difference the instruments, even if we use forward-orthogonal deviations for the transformed model/equation. So, in a model which employs forward-orthogonal deviations for the transformed model, the maximum lag order allowed shall be T-2 only whether we look at it from the perspective of level model or transformed model. Is this understanding correct?

5. In post #293, you mentioned that for a model in which transformed equation has been obtained by taking first-difference, the maximum lag order allowed is T-1. Is this because for the first-differenced model, we can use lags (in levels) starting from t=1 and go on till T-1? (However, we can still only use T-2 lags at max for the level model as you mentioned in point #1 of post #295).

Thanks for your invaluable guidance!!
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#297

29 May 2021, 05:25

1. For the model in first differences, it can be beneficial to also first difference the instrument if the variable is strictly exogenous. You cannot get a stronger instrument for D.x than using D.x itself. With xtdpdgmm, you simply use the option iv(x, diff model(diff)) to transform the instrument for the first-differenced model. For a strictly exogenous variable, you could alternatively use the untransformed x as an instrument for the model in mean deviations, i.e. iv(x, model(mdev)), which would also maximize the correlation between the instrument and the regressor. When you have predetermined or endogenous variables, you need to use lags of those variables as instruments for the transformed model (i.e. first differences or forward-orthogonal deviations), and then it is less obvious whether transforming those instruments is beneficial or not. I tend to recommend not transforming the instruments for the transformed model in those cases. Forward-orthogonal deviations are useful when the data have gaps because they retain more information in that case.

2. If the variables are not strictly exogenous, you typically need to specify a lag order with suboption lag() for the level and the transformed models. Please see my 2019 London Stata Conference presentation for applicable lag orders.

3. The maximum lag order depends on the suboptions you choose:

Suboptions                      Maximum lag order
model(level)                    T-1
model(level) difference         T-2
model(difference)               T-1
model(difference) difference    T-2
model(fodev)                    T-2
model(fodev) bodev              T-3

4. The maximum lag order is always specific to any particular iv() or gmm() option; see point 3.

5. Yes; see again point 3.
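The list in point 3 can be written down as a small lookup, which may help when checking a lag() specification against a given panel length. The Python sketch below is purely illustrative and not part of xtdpdgmm; the names `MAX_LAG_OFFSET` and `max_lag` are hypothetical, and the offsets are copied from point 3 without any further validation.

```python
# Offsets k such that the maximum lag order is T - k, keyed by
# (model suboption, instrument transformation), as listed in point 3.
MAX_LAG_OFFSET = {
    ("level", None): 1,
    ("level", "difference"): 2,
    ("difference", None): 1,
    ("difference", "difference"): 2,
    ("fodev", None): 2,
    ("fodev", "bodev"): 3,
}


def max_lag(T, model, instrument_transform=None):
    """Maximum admissible lag order for a panel with T time periods."""
    key = (model, instrument_transform)
    if key not in MAX_LAG_OFFSET:
        raise KeyError(f"combination not covered by point 3: {key}")
    return T - MAX_LAG_OFFSET[key]


# Example with T = 4, matching the worked example earlier in the thread.
for (model, transform), _ in MAX_LAG_OFFSET.items():
    label = f"model({model})" + (f" {transform}" if transform else "")
    print(f"{label:32s} {max_lag(4, model, transform)}")
```

Because the maximum is specific to each iv() or gmm() option (point 4), the lookup has to be applied per option rather than once per model.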

https://www.kripfganz.de/stata/
1 like
Comment
Prateek Bedi

Join Date: Sep 2018

Posts: 199
#298

29 May 2021, 15:12

Thank you so much, Prof. Kripfganz for such clear and precise answers. Your responses have provided me a better understanding of xtdpdgmm. I have further follow-up questions.

1. For a strictly exogenous variable (x), what does the use of iv(x, model(level)) signify? Does it mean that the level values of the exogenous variable are being used as instrument for it? Is this type of treatment of exogenous variable recommended? In case of this exogenous variable, is there any need to specify the sub-option lag() here i.e. iv(x, model(level) lag(1 2))?

2. In case of an endogenous/predetermined variable (say, p), you said that we need to use the lags of those variables as instruments for the transformed model (say, first-difference). As a sample, is this the correct way to do it in xtdpdgmm: gmmiv(p, lag(a b) model(diff))?
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#299

01 Jun 2021, 06:30

1. iv(x, model(level)) requires an even stronger assumption than just strict exogeneity. It also requires that x is uncorrelated with the unobserved unit-specific effects ("fixed effects"), because the level values of x are used as instruments for the level model. The suboption lag() is generally only needed if x is not strictly exogenous. For the level model, usually no lags are used. Even if it is a predetermined or endogenous variable, the conventional use is iv(x, model(level) lag(1 1)) (or equivalently with the gmm() option).

2. Yes, a would typically be 1 for a predetermined and 2 for an endogenous variable with model(diff) or 0 for a predetermined and 1 for an endogenous variable with model(fodev), and provided the idiosyncratic error term (in levels) is serially uncorrelated. b could be the same or a higher number, or a missing value (.) for the maximum possible lag.
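The rule of thumb in point 2 for the first lag a can be tabulated as follows. This Python sketch only assembles an option string for illustration; the helper name `gmm_option` and the dictionary `FIRST_LAG` are hypothetical, the starting lags assume a serially uncorrelated idiosyncratic error term (as stated above), and the resulting string should be checked against the xtdpdgmm help file before use.

```python
# Typical first lag a, keyed by (model suboption, variable type).
# Assumption: the idiosyncratic error term in levels is serially uncorrelated.
FIRST_LAG = {
    ("diff", "predetermined"): 1,
    ("diff", "endogenous"): 2,
    ("fodev", "predetermined"): 0,
    ("fodev", "endogenous"): 1,
}


def gmm_option(var, model, kind, last_lag="."):
    """Build a gmm() option string; '.' requests the maximum possible lag."""
    a = FIRST_LAG[(model, kind)]
    return f"gmm({var}, lag({a} {last_lag}) model({model}))"


print(gmm_option("p", "diff", "endogenous"))
print(gmm_option("p", "fodev", "predetermined", last_lag=3))
```

Passing an explicit `last_lag` instead of "." restricts the instrument set to lags a through b, which is one way to keep the instrument count down.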

https://www.kripfganz.de/stata/
Comment
Prateek Bedi

Join Date: Sep 2018

Posts: 199
#300

01 Jun 2021, 10:30

Originally posted by Sebastian Kripfganz

1. iv(x, model(level)) requires an even stronger assumption than just strict exogeneity. It also requires that x is uncorrelated with the unobserved unit-specific effects ("fixed effects"), because the level values of x are used as instruments for the level model. The suboption lag() is generally only needed if x is not strictly exogenous. For the level model, usually no lags are used. Even if it is a predetermined or endogenous variable, the conventional use is iv(x, model(level) lag(1 1)) (or equivalently with the gmm() option).

2. Yes, a would typically be 1 for a predetermined and 2 for an endogenous variable with model(diff) or 0 for a predetermined and 1 for an endogenous variable with model(fodev), and provided the idiosyncratic error term (in levels) is serially uncorrelated. b could be the same or a higher number, or a missing value (.) for the maximum possible lag.

Thanks a lot, Prof. Kripfganz. Your answers are very helpful. Please allow me to ask the following questions.

1. You mention in point #1 that iv(x, model(level)) requires an even stronger assumption than strict exogeneity. However, this assumption is what makes an exogenous variable exogenous. Is this understanding correct? Also, in what other ways can we define the iv() option for an exogenous variable? Could you please provide an example?

2. You also mention in point #1 - "For the level model, usually no lags are used.". Does this statement correspond to exogenous variables only?

3. You also mention in point #1 - "Even if it is a predetermined or endogenous variable, the conventional use is iv(x, model(level) lag(1 1))". Why are lags kept at (1 1) here?

4. Considering that system GMM has a level equation and a transformed equation (each of which effectively involves all variables), how do we decide which model() suboption (i.e. level/diff/fodev/bodev/mdev) to use for an exogenous, an endogenous, and a predetermined variable? I ask this because if, say, I specify an endogenous variable x as gmmiv(x, model(diff) lag(a b)), does this mean that I am only using the transformed equation for x (what about the level model, then)? Or if, say, I specify an exogenous variable p as iv(p, model(level) lag(c d)), does this mean that I am only using the level equation for p (what about the transformed model, then)?
Comment
