XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

Sebastian Kripfganz replied

07 Feb 2024, 04:08
What you are describing is a data set with repeated cross sections. xtdpdgmm requires the data to be declared as panel data; in particular, a panel identifier variable needs to be declared with xtset. This may not be possible with the type of data you have.
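As a minimal sketch of what declaring panel data involves, the following uses the abdata example data set that appears later in this thread (an assumption for illustration; any data in which the same units are observed repeatedly would do). With repeated cross sections there is no variable that tracks the same unit across years, so the xtset step has nothing to work with.

```stata
* Load the example panel used elsewhere in this thread
webuse abdata, clear

* Confirm that id and year jointly identify each observation
isid id year

* Declare the panel structure: id identifies the firms, year is the time variable
xtset id year

* Summarize the participation pattern of the firms over time
xtdescribe
```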
Tugrul Cinar replied

07 Feb 2024, 04:04
Dear Sebastian,

I am going to use a micro dataset for an upcoming study. However, this dataset consists of random samples for each year. Essentially, it's a pooled dataset rather than panel data. Moreover, I suspect an endogeneity issue between the dependent and independent variables in the model I'm aiming to estimate. Additionally, the dataset encompasses roughly 100,000 units per year, spanning seven years.

Given that the xtdpdgmm command is designed for linear (dynamic) panel data, do you recommend it for analyzing a pooled dataset?
Arkangel Cordero replied

22 Jan 2024, 10:55
Dear Professor @Sebastian Kripfganz

Understood. This is helpful. Thank you!
Sebastian Kripfganz replied

22 Jan 2024, 07:16
If the variables w k are strictly exogenous (with respect to the idiosyncratic error component), then any serial error correlation does not affect the validity of them (or any of their lags) as instruments. If there is serial error correlation due to the omission of relevant lags of w k as regressors, e.g. due to delayed direct effects of L2.(w k), then w k would not be strictly exogenous in the first place in a model with those omitted lags. Thus, saying that w k are strictly exogenous effectively is also a statement about the correct specification of the model dynamics.

In this regard, I wonder what your motivation is for including L.(w k) as regressors instead of w k. Sometimes, people do this to avert simultaneous feedback from the dependent variable. In that case, however, L.(w k) may not be endogenous any more, but they cannot be strictly exogenous either. At best, they would be predetermined (weakly exogenous). For predetermined variables, serial error correlation does matter for the validity of the instruments. Probably even more important, simply lagging the regressors for this argument typically creates model misspecification, which then puts the whole analysis in jeopardy.
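To illustrate the distinction, here is a sketch of how the instruments might be set up if L.(w k) were treated as predetermined rather than strictly exogenous. The lag ranges and the collapse suboption are assumptions for illustration, not a recommendation for this particular model. They follow the usual convention that, absent serial correlation in the level errors, a predetermined variable is instrumented by its own lags 1 and deeper in the first-differenced model and by its contemporaneous first difference in the level model.

```stata
webuse abdata, clear

* L.(w k) treated as predetermined (illustrative lag choices):
* lags 1 to 3 for the first-differenced model,
* contemporaneous first differences for the level model
xtdpdgmm L(0/1).n L.(w k), model(level) ///
    gmm(L.n, lag(1 2) model(difference)) ///
    gmm(L.n, lag(0 0) difference model(level)) ///
    gmm(L.(w k), lag(1 3) collapse model(difference)) ///
    gmm(L.(w k), lag(0 0) difference model(level)) ///
    two vce(cluster id) teffects

* Serial correlation matters for these instruments, so test for it
estat serial, ar(1/3)
```

If the test pointed to serial correlation beyond first order in the first-differenced residuals, these lag choices would have to be revisited.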
Arkangel Cordero replied

21 Jan 2024, 16:34
Dear Professor @Sebastian Kripfganz

That all makes sense! I am grateful for your insights. These exchanges have been quite enlightening! I can clearly see the flexibility of xtdpdgmm. Thank you for this command!

To close this series of exchanges, I just want to confirm that, assuming that L(w k) are exogenous, your comment that

So, yes, you would need to adjust both lag orders if there is serial correlation.

would not apply to the iv(L(w k), model(difference)) iv(L(w k), difference model(level)) part of m3 below. My logic is that if L(w k) are assumed exogenous, then (and once the fixed-effects are expunged), iv(L(w k), model(difference)) iv(L(w k), difference model(level)) cannot possibly be causing any autocorrelation in the residuals by assumption? Is that correct? Do you have any practical guidance on this?

Code:

xtdpdgmm L(0/1).n L(w k), model(level) ///
gmm(L(n), lag(2 3) model(difference)) gmm(L(n), lag(1 1) difference model(level)) ///
iv(L(w k), model(difference)) iv(L(w k), difference model(level)) ///
two vce(cluster id) teffects
estimate store m3

Thank you again.
Sebastian Kripfganz replied

21 Jan 2024, 07:32
Btw: I elaborate a bit on the lag(0 0) issue with xtabond2 in this other topic: https://www.statalist.org/forums/for...32#post1740632
Sebastian Kripfganz replied

21 Jan 2024, 06:29
1) Yes, m1 still relies on the random-effects assumption.

2) m2 looks fine; just keep in mind that a system GMM estimator generally requires stronger assumptions about the initial observations than a difference GMM estimator.

3) - 5) If higher-order serial correlation in the first-differenced residuals is detected, this would generally invalidate the lags of the dependent variable as instruments. Note that your specification is a bit odd in the sense that the instruments for the lagged dependent variable in the first-differenced model allow for second-order serial correlation in the first-differenced errors (because the first instrument is effectively only the third lag of n, that is, the second lag of L.n), while the respective instruments for the level model do not allow for any serial correlation in the level errors (which would correspond to at most first-order serial correlation in the first-differenced errors). In that regard, m3 makes more sense. So, yes, you would need to adjust both lag orders if there is serial correlation. In the basic case without serial correlation in levels (but with first-order serial correlation detected by estat serial in first differences), you would specify gmm(L(n), lag(1 2) model(difference)) gmm(L(n), lag(0 0) difference model(level)).
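For concreteness, the basic case could be written out as follows. This is a sketch that combines the two gmm() options just mentioned with the iv() options from m2 and m3 in the question; the ar(1/3) range for the test is an arbitrary choice for illustration.

```stata
webuse abdata, clear

* Basic case: no serial correlation in the level errors
xtdpdgmm L(0/1).n L.(w k), model(level) ///
    gmm(L.n, lag(1 2) model(difference)) ///
    gmm(L.n, lag(0 0) difference model(level)) ///
    iv(L.(w k), model(difference)) ///
    iv(L.(w k), difference model(level)) ///
    two vce(cluster id) teffects

* Serial correlation tests for the first-differenced residuals
estat serial, ar(1/3)

* Overidentification test for the full instrument set
estat overid
```

First-order serial correlation in first differences is expected here; it is evidence of higher orders that would call for shifting both lag ranges.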
Arkangel Cordero replied

20 Jan 2024, 13:36
Dear Professor @Sebastian Kripfganz

Got it. Thank you so much for your helpful insights. I think I'll stick with xtdpdgmm.

Some final questions in response to your comment at the end of #624 that:

As an aside, note that iv(L(w k)) for the level model makes the strong assumption that w and k are uncorrelated with the unobserved group-specific effects (akin to a random-effects assumption), which may not be desired.

Regarding the "iv-style" instruments with exogenous L(w k):

1) Assuming that i) L(w k) are exogenous but ii) we don't want to rely on the random-effects assumption while iii) still wanting to instrument w and k in both equations of a system GMM model, would m2 below be preferable to m1 below?

2) Do you see anything inherently problematic with m2 below regarding the "iv-style" instruments for exogenous L(w k)?

3) Does the fact that "estat serial, ar()" test after xtdpdgmm reveals ar(1)--or higher-- are statistically significant have any implications for the lags of the "iv-style" instruments?

Regarding the "gmm-style" instruments with endogenous L(n):

4) Assuming that L(n) is endogenous and that the "estat serial, ar()" test after xtdpdgmm reveals that only ar(1) is statistically significant, would m2 be preferable to m3? The difference between these specifications is the lag(0 0) or lag(1 1) in the gmm(L(n), lag(x x) difference model(level)).

5) I guess my more general question is on the appropriate lags for the first-difference instruments in the level equation for the gmm-style instruments. Do we need to adjust those lags depending on the results of the "estat serial, ar()" test after xtdpdgmm, just as we have to adjust the lags in the difference model? If so, in the most basic case where L(n) is endogenous but only ar(1) is statistically significant, would the lags have to be lag(1 1) so that the instruments for the level equation would be D.L2(n)?

Code:

webuse abdata

xtdpdgmm L(0/1).n L(w k), model(level) ///
gmm(L(n), lag(2 3) model(difference)) gmm(L(n), lag(0 0) difference model(level)) ///
iv(L(w k)) iv(L(w k), difference model(difference)) ///
two vce(cluster id) teffects
estimate store m1

xtdpdgmm L(0/1).n L(w k), model(level) ///
gmm(L(n), lag(2 3) model(difference)) gmm(L(n), lag(0 0) difference model(level)) ///
iv(L(w k), model(difference)) iv(L(w k), difference model(level)) ///
two vce(cluster id) teffects
estimate store m2

xtdpdgmm L(0/1).n L(w k), model(level) ///
gmm(L(n), lag(2 3) model(difference)) gmm(L(n), lag(1 1) difference model(level)) ///
iv(L(w k), model(difference)) iv(L(w k), difference model(level)) ///
two vce(cluster id) teffects
estimate store m3
Last edited by Arkangel Cordero; 20 Jan 2024, 14:23.
Sebastian Kripfganz replied

20 Jan 2024, 10:04
In general, iv() is just a collapsed version of gmm(); see also the help file for xtdpdgmm:

gmmiv(varlist, lagrange(#_1 #_2) collapse) is equivalent to iv(varlist, lagrange(#_1 #_2))
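A quick way to check this equivalence on the abdata example is sketched below. The specification is borrowed from m2 in the question, with only the instruments for L.n in the first-differenced model written in the two alternative ways; the lag range and the stored names are assumptions for illustration.

```stata
webuse abdata, clear

* Version 1: collapsed GMM-type instruments for L.n in the differenced model
xtdpdgmm L(0/1).n L.(w k), model(level) ///
    gmm(L.n, lagrange(1 2) collapse model(difference)) ///
    gmm(L.n, lagrange(0 0) difference model(level)) ///
    iv(L.(w k), model(difference)) iv(L.(w k), difference model(level)) ///
    two vce(cluster id) teffects
estimates store gmm_collapsed

* Version 2: the same lag range written with iv()
xtdpdgmm L(0/1).n L.(w k), model(level) ///
    iv(L.n, lagrange(1 2) model(difference)) ///
    gmm(L.n, lagrange(0 0) difference model(level)) ///
    iv(L.(w k), model(difference)) iv(L.(w k), difference model(level)) ///
    two vce(cluster id) teffects
estimates store iv_style

* Put the two sets of estimates side by side
estimates table gmm_collapsed iv_style, b(%9.4f) se(%9.4f)
```

If the equivalence holds as stated in the help file, the two columns should coincide.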

For xtabond2, I again recommend to specify instruments separately for the level and differenced model, and to explicitly specify lag orders. The default settings can be very confusing. In your example, I do not even understand what is going on with xtabond2. The command appears to create the first lag of the first-differenced instruments (which in your case are therefore lagged twice, because you specified the lag operator in the variable list) for the level model. This is inconsistent with the help file. It should not lag those instruments when you specify lag(0 0); this appears to be another bug. Why it does not replicate xtdpdgmm is another mystery to me. If you explicitly specify the instruments separately for each equation, you should be able to replicate the results.

The ivreg2 standard errors do not apply the Windmeijer correction.

Arkangel Cordero replied

19 Jan 2024, 18:04

Dear Professor @Sebastian Kripfganz

Understood! Thank you.

I have a question regarding your comment that

As an aside, note that iv(L(w k)) for the level model makes the strong assumption that w and k are uncorrelated with the unobserved group-specific effects (akin to a random-effects assumption), which may not be desired.

I understand your point. Assuming that L(w k) are exogenous, would the following xtabond2 specification make more sense if we want to have instruments for L(w k) in both equations? gmm(L(w k), lag(0 0)) Or would it have to be gmm(L(w k), lag(1 1))?

Relatedly, can you please provide some guidance as to why I can't seem to reproduce the xtabond2 model below with xtdpdgmm.

Code:

webuse abdata

xtabond2 L(0/1).n L(w k) (yr1978 - yr1984),  iv(yr1978 - yr1984, eq(level)) ///
gmm(L(n), lag(2 3)) ///
gmm(L(w k), lag(0 0))  ///
two cluster(id)
estimate store m1


xtdpdgmm L(0/1).n L(w k), teffects model(level) ///
gmm(L(n), lag(2 3) model(difference)) gmm(L(n), lag(1 1) difference model(level)) ///
gmm(L(w k),   lag(0 0) model(difference)) gmm(L(w k),   lag(1 1) difference model(level)) ///
two vce(cluster id) 
estimate store m2

quietly predict iv*, iv

ivreg2 n  (L1.(n w k) yr1978 - yr1984 = iv*),     gmm2s  cluster(id) nocollin
estimate store m3

esttab m1 m2 m3, b(2) se(3)


-----------------------------------------------------------
                      (1)             (2)             (3)   
                        n               n               n   
------------------------------------------------------------
L.n                  0.88***         0.97***         0.97***
                  (0.060)         (0.052)         (0.024)   

L.w                  0.04           -0.20           -0.20***
                  (0.078)         (0.107)         (0.049)   

L.k                  0.10*           0.05            0.05** 
                  (0.046)         (0.044)         (0.017)   

yr1978              -0.01                           -0.03** 
                  (0.015)                         (0.012)   

yr1979              -0.01                           -0.05***
                  (0.017)                         (0.014)   

yr1980              -0.06**                         -0.09***
                  (0.019)                         (0.014)   

yr1981              -0.15***                        -0.17***
                  (0.022)                         (0.016)   

yr1982              -0.14***                        -0.15***
                  (0.020)                         (0.014)   

yr1983              -0.10***                        -0.10***
                  (0.026)                         (0.016)   

yr1984              -0.09*                          -0.07***
                  (0.034)                         (0.019)   

1978.year                           -0.03                   
                                  (0.019)                   

1979.year                           -0.05*                  
                                  (0.024)                   

1980.year                           -0.09***                
                                  (0.024)                   

1981.year                           -0.17***                
                                  (0.027)                   

1982.year                           -0.15***                
                                  (0.023)                   

1983.year                           -0.10***                
                                  (0.026)                   

1984.year                           -0.07*                  
                                  (0.030)                   

_cons                0.07            0.73*           0.73***
                  (0.263)         (0.347)         (0.158)   
------------------------------------------------------------
N                     891             891             891   
------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Finally, and I am sure this is a dumb question on my part, why are the standard errors so different between xtdpdgmm and ivreg2? Is there a way to adjust them in ivreg2 to make them match xtdpdgmm?

Sebastian Kripfganz replied

19 Jan 2024, 03:21
Originally posted by Arkangel Cordero

However, for the purposes of reproducing the results of xtabond2 (point estimates and their standard errors) with xtdpdgmm for the system gmm, I could only do so by including iv(yr1978 - yr1984) and iv(L(w k)) in both equations for xtdpdgmm?

I am not sure what you mean by that. The following codes without time dummies as instruments for the differenced model still yield the same results, and similarly if I drop corresponding instruments for w and k:

Code:

xtabond2 L(0/1).n L(w k) (yr1978 - yr1984), ///
gmm(L(n), lag(2 3)) iv(yr1978 - yr1984, eq(level)) iv(L(w k), eq(diff)) iv(L(w k), eq(level)) two cluster(id)

xtdpdgmm L(0/1).n L(w k), model(level) gmm(L(n), lag(2 3) model(difference)) ///
gmm(L(n), lag(1 1) difference model(level)) iv(L(w k)) iv(L(w k), difference model(difference)) two vce(cluster id) teffects

As an aside, note that iv(L(w k)) for the level model makes the strong assumption that w and k are uncorrelated with the unobserved group-specific effects (akin to a random-effects assumption), which may not be desired.

Arkangel Cordero replied

18 Jan 2024, 19:45

Dear Professor @Sebastian Kripfganz

Thank you very much for your guidance and thorough explanation. Following your response, I was able to reproduce the results of xtabond2 with xtdpdgmm and ivreg2 for the system gmm estimator.

I know that you have explained the redundancy of using the time dummies in both the level and first-difference equations. However, I understand that both xtabond2 and xtdpdgmm take care of this redundancy by dropping the collinear instruments. I am also aware that this may (or at least used to) lead to xtabond2 reporting the wrong degrees of freedom and p-values for some diagnostics regarding the instruments. However, for the purposes of reproducing the results of xtabond2 (point estimates and their standard errors) with xtdpdgmm for the system gmm, I could only do so by including iv(yr1978 - yr1984) and iv(L(w k)) in both equations for xtdpdgmm? Thank you in advance for your guidance.

Code:

webuse abdata

xtabond2 L(0/1).n L(w k) (yr1978 - yr1984), ///
gmm(L(n), lag(2 3)) iv(yr1978 - yr1984, eq(diff)) iv(yr1978 - yr1984, eq(level)) iv(L(w k), eq(diff)) iv(L(w k), eq(level))  two cluster(id)
estimate store m1

xtdpdgmm L(0/1).n L(w k), model(level) gmm(L(n), lag(2 3) model(difference)) ///
gmm(L(n), lag(1 1) difference model(level)) iv(L(w k)) iv(yr1978 - yr1984, difference model(difference)) iv(L(w k), difference model(difference)) two vce(cluster id) teffects 
estimate store m2

esttab m1 m2

xtdpdgmm L(0/1).n L(w k) yr1978 - yr1984, model(level) gmm(L(n), lag(2 3) model(difference)) ///
gmm(L(n), lag(1 1) difference model(level)) iv(yr1978 - yr1984) iv(L(w k)) iv(yr1978 - yr1984, difference model(difference)) iv(L(w k), difference model(difference)) two vce(cluster id) 
estimate store m3

quietly predict iv*, iv

ivreg2 n  (L1.(n w k) yr1978 - yr1984 = iv*),     gmm2s  cluster(id) nocollin
estimate store m4

esttab m1 m2 m3 m4


----------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)   
                        n               n               n               n   
----------------------------------------------------------------------------
L.n                 0.968***        0.968***        0.968***        0.968***
                  (11.09)         (11.09)         (11.09)         (30.63)   

L.w               -0.0610         -0.0610         -0.0610         -0.0610** 
                  (-1.56)         (-1.56)         (-1.56)         (-2.85)   

L.k                0.0254          0.0254          0.0254          0.0254   
                   (0.34)          (0.34)          (0.34)          (0.98)   

yr1978            -0.0139                         -0.0139         -0.0139   
                  (-0.83)                         (-0.83)         (-1.06)   

yr1979            -0.0172                         -0.0172         -0.0172   
                  (-0.83)                         (-0.83)         (-1.15)   

yr1980            -0.0650***                      -0.0650***      -0.0650***
                  (-3.44)                         (-3.44)         (-4.46)   

yr1981             -0.155***                       -0.155***       -0.155***
                  (-6.15)                         (-6.15)         (-8.51)   

yr1982             -0.131***                       -0.131***       -0.131***
                  (-5.10)                         (-5.10)         (-8.44)   

yr1983            -0.0795*                        -0.0795*        -0.0795***
                  (-2.54)                         (-2.54)         (-4.54)   

yr1984            -0.0579                         -0.0579         -0.0579** 
                  (-1.61)                         (-1.61)         (-3.23)   

1978.year                         -0.0139                                   
                                  (-0.83)                                   

1979.year                         -0.0172                                   
                                  (-0.83)                                   

1980.year                         -0.0650***                                
                                  (-3.44)                                   

1981.year                          -0.155***                                
                                  (-6.15)                                   

1982.year                          -0.131***                                
                                  (-5.10)                                   

1983.year                         -0.0795*                                  
                                  (-2.54)                                   

1984.year                         -0.0579                                   
                                  (-1.61)                                   

_cons               0.262           0.262           0.262           0.262*  
                   (1.09)          (1.09)          (1.09)          (2.54)   
----------------------------------------------------------------------------
N                     891             891             891             891   
----------------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

Sebastian Kripfganz replied

18 Jan 2024, 04:09
This is a specification of xtabond2 that cannot be replicated with either xtdpdgmm or ivreg2. The iv(L(w k)) option of xtabond2 without the eq() suboption is not equivalent to the joint specification of iv(L(w k), eq(diff)) iv(L(w k), eq(level)). The same applies to iv(yr1978 - yr1984). Effectively, xtabond2 uses a moment condition which is the sum of a moment condition for the model in levels and a moment condition for the model in first differences. Technically, this sum is a valid moment condition if both moment conditions are individually valid. However, intuitively it does not make much sense. It is not compatible with the way xtdpdgmm works. The latter transforms all moment conditions into moment conditions for the level model with appropriately transformed instruments; see slide 33 of my 2019 London Stata Conference presentation. This is not possible with the summed moment condition utilized by xtabond2. When in doubt, always explicitly specify the eq() suboption.
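The difference can be written out in a notation that is assumed here purely for illustration (it is not taken from either help file). Let $u_{it}$ be the level error, and let $z^{D}_{it}$ and $z^{L}_{it}$ denote the instrument as it enters the first-differenced and the level model, respectively. Specifying eq(diff) and eq(level) separately imposes the first line below (two restrictions), whereas omitting eq() imposes only the second line (one restriction):

```latex
\begin{align}
  E\left[ z^{D}_{it} \, \Delta u_{it} \right] &= 0
  \quad \text{and} \quad
  E\left[ z^{L}_{it} \, u_{it} \right] = 0 , \\
  E\left[ z^{D}_{it} \, \Delta u_{it} \right] + E\left[ z^{L}_{it} \, u_{it} \right] &= 0 .
\end{align}
```

The second line follows from the first but not the other way around, so the summed version is a single, weaker restriction.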

The reason why ivreg2 does not replicate xtdpdgmm lies in the choice of the weighting matrix. Replication can only be achieved with the xtdpdgmm default weighting matrix w(unadjusted).

Actually, having said that, there is a slightly complicated way of replicating xtdpdgmm with ivreg2 when a different weighting matrix is used:

Code:

webuse abdata
xtdpdgmm L(0/1).n L.(w k) yr1978 - yr1984, model(level) gmm(L.(n), lag(2 3) model(difference)) ///
gmm(L.(n), lag(1 1) difference model(level)) iv(yr1978 - yr1984) iv(L.(w k)) w(ind) two vce(cluster id)
predict iv*, iv
matrix W = e(W)
matrix coleq W = ""
matrix roweq W = ""
mat colnames W = `r(iv)' _cons
mat rownames W = `r(iv)' _cons
ivreg2 n (L1.(n w k) yr1978 - yr1984 = iv*), cluster(id) nocollin wmatrix(W)

Notice that I have removed the gmm2s option from ivreg2 because it now already starts with the optimal weighting matrix obtained from xtdpdgmm. (Alternatively, you could run xtdpdgmm with the onestep option, feed the respective weighting matrix into ivreg2 and use the gmm2s option again.)
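The alternative in parentheses might look as follows. This is an untested sketch: it assumes that e(W) after one-step estimation holds the initial weighting matrix and that ivreg2 uses a matrix supplied through wmatrix() as the first-step weighting matrix when gmm2s is specified.

```stata
webuse abdata, clear

* One-step xtdpdgmm; e(W) is assumed to hold the initial weighting matrix
xtdpdgmm L(0/1).n L.(w k) yr1978 - yr1984, model(level) ///
    gmm(L.(n), lag(2 3) model(difference)) ///
    gmm(L.(n), lag(1 1) difference model(level)) ///
    iv(yr1978 - yr1984) iv(L.(w k)) w(ind) onestep vce(cluster id)

* Generate the transformed instruments and label the weighting matrix accordingly
predict iv*, iv
matrix W = e(W)
matrix coleq W = ""
matrix roweq W = ""
mat colnames W = `r(iv)' _cons
mat rownames W = `r(iv)' _cons

* Let ivreg2 carry out the second step itself, starting from W
ivreg2 n (L1.(n w k) yr1978 - yr1984 = iv*), gmm2s cluster(id) nocollin wmatrix(W)
```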

Arkangel Cordero replied

17 Jan 2024, 19:02

Dear Professor @Sebastian Kripfganz

I have a follow-up question, but this time for the system gmm estimator. In what follows, I assume that L(w k) are exogenous, and therefore can instrument for themselves. I am unable to 1) reproduce the results from xtabond2 with xtdpdgmm and 2) reproduce the results of xtdpdgmm with ivreg2. I would appreciate any guidance.

I tried to reproduce these being as careful as possible and apologize in advance for my ignorance.

Code:

webuse abdata

xtabond2 L(0/1).n L(w k) (yr1978 - yr1984), ///
gmm(L(n), lag(2 3)) iv(yr1978 - yr1984) iv(L(w k)) h(2) two cluster(id)
estimate store m1

xtdpdgmm L(0/1).n L(w k), model(level) gmm(L(n), lag(2 3) model(difference)) ///
gmm(L(n), lag(1 1) difference model(level)) iv(L(w k)) w(ind) two vce(cluster id) teffects
estimate store m2

xtdpdgmm L(0/1).n L(w k) yr1978 - yr1984, model(level) gmm(L(n), lag(2 3) model(difference)) ///
gmm(L(n), lag(1 1) difference model(level)) iv(yr1978 - yr1984) iv(L(w k)) w(ind) two vce(cluster id) 
estimate store m3

quietly predict iv*, iv

ivreg2 n  (L1.(n w k) yr1978 - yr1984 = iv*),     gmm2s  cluster(id) nocollin
estimate store m4

esttab m1 m2 m3 m4

----------------------------------------------------------------------------
                      (1)             (2)             (3)             (4)   
                        n               n               n               n   
----------------------------------------------------------------------------
L.n                 0.904***        0.910***        0.910***        0.896***
                  (13.29)         (12.87)         (12.87)         (23.18)   

L.w               -0.0483         -0.0711*        -0.0711*        -0.0764***
                  (-0.97)         (-2.04)         (-2.04)         (-3.66)   

L.k                0.0791          0.0759          0.0759          0.0885** 
                   (1.34)          (1.25)          (1.25)          (2.80)   

yr1978           -0.00860                        -0.00552        -0.00437   
                  (-0.55)                         (-0.35)         (-0.33)   

yr1979            -0.0187                         -0.0175         -0.0152   
                  (-1.00)                         (-0.95)         (-0.99)   

yr1980            -0.0608**                       -0.0606**       -0.0566***
                  (-3.24)                         (-3.25)         (-3.66)   

yr1981             -0.146***                       -0.144***       -0.140***
                  (-5.49)                         (-5.45)         (-6.44)   

yr1982             -0.145***                       -0.146***       -0.143***
                  (-6.24)                         (-6.65)         (-8.44)   

yr1983            -0.0832***                      -0.0848***      -0.0850***
                  (-3.56)                         (-3.55)         (-4.44)   

yr1984            -0.0748*                        -0.0802*        -0.0812***
                  (-2.18)                         (-2.06)         (-3.51)   

1978.year                        -0.00552                                   
                                  (-0.35)                                   

1979.year                         -0.0175                                   
                                  (-0.95)                                   

1980.year                         -0.0606**                                 
                                  (-3.25)                                   

1981.year                          -0.144***                                
                                  (-5.45)                                   

1982.year                          -0.146***                                
                                  (-6.65)                                   

1983.year                         -0.0848***                                
                                  (-3.55)                                   

1984.year                         -0.0802*                                  
                                  (-2.06)                                   

_cons               0.313           0.377           0.377           0.411***
                   (1.30)          (1.88)          (1.88)          (3.66)   
----------------------------------------------------------------------------
N                     891             891             891             891   
----------------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
