System GMM Replication using, e.g., xtdpdgmm

Carl Roth

Join Date: Sep 2024

Posts: 4
#1


16 Sep 2024, 03:08

I am trying to replicate a system GMM model from a paper (free version), e.g., using xtdpdgmm. Since I am new to this kind of model, I want to make sure that my implementation and my understanding of the syntax are correct. The following additional information is given:
Panel B (below) reports the results of the System GMM estimates [...]. The reported number of observations refers to the level equation.

The independent variables are not lagged in the command, as it is assumed that the dataset has already been merged with the lagged independent variables.

They use time effects, so I have specified teffects.

They set the lag order of the dependent and independent variables based on the BIC.

For the independent variables, it was determined that they should enter with a one-year lag (i.e., I set the lag to (1 1)).

For the dependent variable, I keep all lags (i.e., lag(2 .)), as this would depend on the BIC.

Code:

xtset id time
xtdpdgmm L(0/1).nplratio imr irb roae c_i e_ta gl_ta gdp un hpi, ///
    collapse ///
    gmm(nplratio, lag(2 .) model(diff)) ///
    gmm(nplratio, lag(2 .) model(level)) ///
    gmm(imr irb roae c_i e_ta gl_ta gdp un hpi, lag(1 1) model(diff)) ///
    gmm(imr irb roae c_i e_ta gl_ta gdp un hpi, lag(1 1) model(level)) ///
    teffects two vce(robust)

The model looks in principle like this: [equation image not preserved]
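As a hedged reconstruction from the command syntax (not the paper's own equation), the specification being estimated is roughly the following. Here y is nplratio, the nine x_k are imr through hpi, which the poster has already merged into the dataset with a one-year lag, lambda_t are the time effects added by teffects, and u_i is the unobserved group-specific effect:

```latex
y_{it} = \alpha\, y_{i,t-1} + \sum_{k=1}^{9} \beta_k\, x_{k,i,t-1} + \lambda_t + u_i + \varepsilon_{it}
```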

Is my understanding of this procedure and the corresponding syntax correct?
Tags: dynamic panel data, gmm estimation, GMM panel data, System GMM - two step, xtdpdgmm
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#2

16 Sep 2024, 12:39

The referenced paper is another unfortunate example of an article that does not provide sufficient information to replicate its analysis. The authors do not mention which lags were used for the instruments, and they do not report the total number of instruments. They also do not state whether the regressors are assumed to be strictly exogenous or predetermined, or whether any collapsing was applied. The BIC is only used to select the lag order of the regressors, not of the instruments. You would need to reach out to the authors and ask for their replication code.
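To see why the exogeneity classification matters for the instrument lags, here is a standard sketch for the first-differenced equation, assuming serially uncorrelated idiosyncratic errors. Differencing removes u_i, and the earliest valid lag s of an instrument depends on the assumption made about each variable (x_it denotes the regressors as they enter the model):

```latex
\begin{aligned}
\Delta y_{it} &= \alpha\,\Delta y_{i,t-1} + \Delta x_{it}'\beta + \Delta\lambda_t + \Delta\varepsilon_{it} \\
E[\,y_{i,t-s}\,\Delta\varepsilon_{it}\,] &= 0, \quad s \ge 2 \quad \text{(lagged dependent variable)} \\
E[\,x_{i,t-s}\,\Delta\varepsilon_{it}\,] &= 0, \quad s \ge 2 \quad \text{(endogenous } x) \\
E[\,x_{i,t-s}\,\Delta\varepsilon_{it}\,] &= 0, \quad s \ge 1 \quad \text{(predetermined } x) \\
E[\,x_{i,t-s}\,\Delta\varepsilon_{it}\,] &= 0, \quad s \ge 0 \quad \text{(strictly exogenous } x)
\end{aligned}
```

In xtdpdgmm terms, the smallest valid s corresponds to the first argument of lag() in a model(diff) instrument, e.g. lag(2 .) for the dependent variable, lag(1 .) for predetermined regressors, and lag(0 .) for strictly exogenous ones.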

https://www.kripfganz.de/stata/
Carl Roth

Join Date: Sep 2024

Posts: 4
#3

16 Sep 2024, 14:19

Dear Sebastian Kripfganz, thank you very much for your insights; this is very useful. Let's assume that the BIC selected one lag for both the dependent and the independent variables, that the regressors are predetermined, and that collapsing is applied. That leaves two questions:
First, is the code I provided correct in the sense that I specify both a 'diff' and a 'level' model inside gmm()? Second, the remaining issue is how many lags of the instruments to include. Is there any guiding principle for choosing an adequate lag range when including all available instruments is not an option because of the sample size?
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#4

17 Sep 2024, 00:57

For the level model, you want to use first-differenced instruments by adding the difference suboption, i.e.

Code:

gmm(nplratio, lag(2 .) model(level) difference)
gmm(imr irb roae c_i e_ta gl_ta gdp un hpi, lag(1 .) model(level) difference)
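The reason for the difference suboption can be sketched with the standard Blundell-Bond type moment conditions, assuming that first differences of the instruments are uncorrelated with u_i (e.g. under mean stationarity) and that the idiosyncratic errors are serially uncorrelated. Variables in levels are correlated with u_i, which remains in the level-equation error, whereas their first differences can serve as valid instruments:

```latex
\begin{aligned}
y_{it} &= \alpha\, y_{i,t-1} + x_{it}'\beta + \lambda_t + u_i + \varepsilon_{it} \\
E[\,\Delta y_{i,t-s}\,(u_i + \varepsilon_{it})\,] &= 0, \quad s \ge 1 \\
E[\,\Delta x_{i,t-s}\,(u_i + \varepsilon_{it})\,] &= 0, \quad s \ge 0 \quad \text{(predetermined } x)
\end{aligned}
```

The lag() ranges inside the gmm() options select which of these differenced lags are actually used.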

There is only limited guidance about how to restrict the lag length for the instruments. You do not want to use too few because you would lose identification strength. You do not want to use too many because of overfitting concerns and because long lags become weak instruments. I would usually try going back to lag 4 or 5, or so. If you can show that the results are fairly stable for different choices, that would be ideal.
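A minimal sketch of such a stability check, assuming the poster's panel is in memory, xtdpdgmm is installed from SSC, and the same lag range is applied in both the differenced and the level equation (an assumption; the thread does not state this explicitly). The range of maximum lags (3 to 5) and the stored-estimate names maxlag3 to maxlag5 are hypothetical choices:

```stata
* Sketch: re-estimate the system GMM model with different maximum
* instrument lags and compare the coefficients side by side.
* Assumes the poster's data are loaded and xtdpdgmm is installed
* (ssc install xtdpdgmm).
xtset id time
local x imr irb roae c_i e_ta gl_ta gdp un hpi

forvalues maxlag = 3/5 {
    // Same maximum lag for the DV (from lag 2) and the regressors (from lag 1)
    xtdpdgmm L(0/1).nplratio `x', collapse teffects twostep vce(robust) ///
        gmm(nplratio, lag(2 `maxlag') model(diff)) ///
        gmm(`x', lag(1 `maxlag') model(diff)) ///
        gmm(nplratio, lag(2 `maxlag') model(level) difference) ///
        gmm(`x', lag(1 `maxlag') model(level) difference)
    estimates store maxlag`maxlag'
}

* Coefficients and standard errors across the three specifications
estimates table maxlag3 maxlag4 maxlag5, b se
```

If the coefficient on L.nplratio and the main regressors change little across the columns, that supports the chosen lag range.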

Carl Roth

Join Date: Sep 2024

Posts: 4
#5

17 Sep 2024, 10:52

Thank you very much, Sebastian Kripfganz, for pointing out this suboption to correctly specify the model. One last thought on the instruments: am I right in assuming that the lag structure should generally be the same for the dependent variable and the independent variables? Consequently, I would specify lag(2 5) for the DV and lag(1 4) for the IVs, to get four lags each. If this is not the case, I would assume that the choice would need to be guided by some clear theoretical or economic reasoning.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#6

18 Sep 2024, 03:29

Using the same number of lags is a reasonable approach. Similarly, it would be reasonable to use the same maximum lag; i.e. lag(2 4) for the DV and lag(1 4) for the IV. Any other approach would smell like p-hacking and require more careful justification.
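To gauge the overfitting concern, a rough instrument count helps. With collapse, each variable instrumented with lag(a b) contributes up to b - a + 1 columns per equation. Assuming (hypothetically) that the same ranges are used in both the differenced and the level equation, with the nine regressors imr through hpi:

```latex
\begin{aligned}
\text{lag(2 4) DV, lag(1 4) IV:}\quad & 2 \times (3 + 9 \times 4) = 2 \times 39 = 78 \\
\text{lag(2 5) DV, lag(1 4) IV:}\quad & 2 \times (4 + 9 \times 4) = 2 \times 40 = 80
\end{aligned}
```

These counts exclude the constant and the time dummies added by teffects; a common diagnostic is to compare the total number of instruments with the number of panel units.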
