
  • XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

    Dear Statalisters,

    I have made a new estimation command available for installation from my website:
    Code:
    . net install xtdpdgmm, from(http://www.kripfganz.de/stata/)
    xtdpdgmm estimates a linear (dynamic) panel data model with the generalized method of moments (GMM). The main value added of the new command is that it allows the traditional linear moment conditions to be combined with the nonlinear moment conditions suggested by Ahn and Schmidt (1995) under the assumption of serially uncorrelated idiosyncratic errors. These additional nonlinear moment conditions can yield potentially sizeable efficiency gains, and they also improve the finite-sample performance. Given that absence of serial correlation is usually a prerequisite also for other GMM estimators in the presence of a lagged dependent variable, the gains from the nonlinear moment conditions essentially come for free.

    The extra moment conditions can help to overcome a weak instruments problem of the Arellano and Bond (1991) difference-GMM estimator when the autoregressive coefficient approaches unity. Furthermore, the Ahn and Schmidt (1995) estimator is also robust to deviations from mean stationarity, a situation that would invalidate the Blundell and Bond (1998) system-GMM approach.

    Without these nonlinear moment conditions, xtdpdgmm replicates the results obtained with the familiar commands xtabond, xtdpd, xtdpdsys, and xtabond2, as well as my other recent command xtseqreg. Collapsing of GMM-type instruments and different initial weighting matrices are supported. The key option of xtdpdgmm that adds the nonlinear moment conditions is called noserial. For example:
    Code:
    . webuse abdata
    
    . xtdpdgmm L(0/1).n w k, noserial gmmiv(L.n, collapse model(difference)) iv(w k, difference model(difference)) twostep vce(robust)
    
    Generalized method of moments estimation
    
    Step 1
    initial:       f(p) =  6.9508498
    alternative:   f(p) =   1.917675
    rescale:       f(p) =  .07590133
    Iteration 0:   f(p) =  .07590133  
    Iteration 1:   f(p) =    .003352  
    Iteration 2:   f(p) =  .00274414  
    Iteration 3:   f(p) =  .00274388  
    Iteration 4:   f(p) =  .00274388  
    
    Step 2
    Iteration 0:   f(p) =  .26774896  
    Iteration 1:   f(p) =  .20397319  
    Iteration 2:   f(p) =   .2011295  
    Iteration 3:   f(p) =  .20109259  
    Iteration 4:   f(p) =  .20109124  
    Iteration 5:   f(p) =   .2010912  
    
    Group variable: id                           Number of obs         =       891
    Time variable: year                          Number of groups      =       140
    
    Moment conditions:     linear =      10      Obs per group:    min =         6
                        nonlinear =       6                        avg =  6.364286
                            total =      16                        max =         8
    
                                         (Std. Err. adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |              WC-Robust
               n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               n |
             L1. |    .657292   .1381388     4.76   0.000     .3865449    .9280391
                 |
               w |  -.7248798   .0996565    -7.27   0.000    -.9202029   -.5295568
               k |   .2399022   .0737048     3.25   0.001     .0954435    .3843609
           _cons |   2.719216   .4015915     6.77   0.000     1.932111    3.506321
    ------------------------------------------------------------------------------
    The Gauss-Newton technique is used to minimize the GMM criterion function. With vce(robust), the Windmeijer (2005) finite-sample standard error correction is computed for estimators with and without nonlinear moment conditions.

    For details about the syntax, the available options, and the supported postestimation commands, please see the help files:
    Code:
    . help xtdpdgmm
    . help xtdpdgmm postestimation
    Available postestimation commands include the Arellano-Bond test for absence of serial correlation in the first-differenced errors, estat serial, and the familiar Hansen J-test of the overidentifying restrictions, estat overid. The results of the Arellano-Bond test differ slightly from xtdpd and xtabond2 for two-step robust estimators because I account for the finite-sample Windmeijer (2005) correction when computing the test statistic, while the existing commands do not. estat overid can also be used to perform difference-in-Hansen tests, but it requires that the two models be estimated separately. In that regard, the results differ from the difference-in-Hansen test statistics reported by xtabond2; see footnote 24 in Roodman (2009) for an explanation. An alternative to difference-in-Hansen tests is a generalized Hausman test, implemented in estat hausman for use after xtdpdgmm.
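    As a quick illustration (continuing the abdata example above; output omitted), the two tests can be run directly after estimation:
    Code:
    . estat serial
    . estat overid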

    Finally, the results with and without nonlinear moment conditions can in principle also be obtained with Stata's official gmm command. However, it is anything but straightforward to do so. While the official gmm command offers lots of extra flexibility, it does not provide a tailored solution for this particular estimation problem. While xtdpdgmm easily handles unbalanced panel data, gmm tends to have some problems in that case. In addition, gmm can be very slow, in particular with large data sets. I did not do a sophisticated benchmark comparison, but for a single estimation on a data set with 40,000 observations, it took me 43 minutes (!) to obtain the results with gmm, while xtdpdgmm returned the identical results after just 4 seconds!

    I hope you enjoy the new command. As always, comments and suggestions are highly welcome, and an appropriate reference would be very much appreciated if my command proves to be helpful for your own research.

    References:
    • Ahn, S. C., and P. Schmidt (1995). Efficient estimation of models for dynamic panel data. Journal of Econometrics 68: 5-27.
    • Arellano, M., and S. R. Bond (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies 58: 277-297.
    • Blundell, R., and S. R. Bond (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87: 115-143.
    • Roodman, D. (2009). How to do xtabond2: An introduction to difference and system GMM in Stata. Stata Journal 9: 86-136.
    • Windmeijer, F. (2005). A finite sample correction for the variance of linear efficient two-step GMM estimators. Journal of Econometrics 126: 25-51.
    Last edited by Sebastian Kripfganz; 01 Jun 2017, 06:15.

  • Arkangel Cordero
    replied
    Dear Professor @Sebastian Kripfganz,

    Thank you for your always valuable insights!



  • Sebastian Kripfganz
    replied
    1) Model 3 was implemented in terms of the level equation with serially uncorrelated idiosyncratic errors. This is another benefit of orthogonalizing the instruments instead of transforming the model itself: You can use all the conventional procedures, including conventional weighting matrices.

    2) I have never really thought about an interpretation of the specific form of the orthogonalized instruments. Their construction is merely mechanical; see slide 33 of my 2019 London Stata Conference presentation. I guess interpretation (b) makes more sense.



  • Arkangel Cordero
    replied
    Dear Professor @Sebastian Kripfganz,

    Thank you for your insights. I have two follow-up questions.

    1) Very cool workaround, tricking the gmm command into accounting for the first-order serial correlation in the first-differenced equation. My question is: why was this not necessary in Model 3 in my original post in #684 above?

    2) Also referring to Model 3 in #684 above: when orthogonalizing the instruments relative to the unit fixed-effects, is the reason we force the first observation for each panel to be missing a) to avoid the dummy-variable trap, or b) because our estimator is really a first-difference model and we are taking into account that we lose the first observation?

    Thank you again!



  • Sebastian Kripfganz
    replied
    Thank you for this well-designed replication example.

    I will start with question 3. The reason why your final results differ from the previous ones is that the unadjusted initial weighting matrix does not account for the first-order serial correlation in the first-differenced equation. You would need to use winitial(xt D) for this purpose. However, the command only allows this option in combination with xtinstruments(). You can trick the gmm command into delivering the desired results by supplying an xt-instrument full of zeros:
    Code:
    gen zeros = 0
    gmm (D.n - {xb: LD.n D.wage D.emp D.k D.yr1979 D.yr1980 D.yr1981 D.yr1982}), ///
        instruments(iyr* wage emp k yr1979 - yr1982, noconstant) ///
        xtinstruments(zeros, lags(0/0)) winitial(xt D) vce(cluster id) twostep
    Regarding your other questions:
    1.a. You could think about it this way, yes.
    1.b/c. What you have done in your manual construction of the instruments is correct.
    2. The rationale behind this approach is that there is no need to estimate a "system" of equations, when you think about the system GMM estimator. Of course, the Arellano-Bond estimator only has one equation in first differences, but the approach taken by xtdpdgmm is that all transformations are special cases of the system approach, which can be recast as a conventional estimator for the equation in levels with appropriately orthogonalized instruments. This simplifies the command's architecture substantially and makes it straightforward to implement any other type of transformation (e.g., forward-orthogonal deviations). The sample size is larger because it refers to the level model, not the first-differenced one. It is true, though, that effectively one observation is still lost due to the orthogonalization. Consequently, it is a fair question whether this should be reflected in the reported number of observations.



  • Arkangel Cordero
    replied
    Dear Professor @Sebastian Kripfganz

    I hope you are well.

    I have, what I hope is, a quick set of questions. I am playing with the following four models below (Models I through IV).

    Code:
    webuse abdata, clear
    
    /* Balancing the panels for simplicity */
    keep if year>=1977&year<=1982
    by id: keep if _N==6
    
    /* Model I: xtdpdgmm NATIVE syntax for GMM-Style instruments */
    xtdpdgmm L(0/1).n wage emp k yr1979 - yr1982, model(difference) gmm(L.(n), lag(1 2) model(difference)) iv(wage emp k yr1979 - yr1982, model(difference))  twostep nocons vce(cluster id)
    estimate store m1_xtdpdgmm
    
    
    
    
    /* Model II: xtdpdgmm with GMM-Style instruments calculated "by hand" */
    * Generate GMM-Style instruments "by hand"
    foreach var of varlist n {
        di "`var'"
    forvalues lag = 2(1)3 {
     display `lag'
     
     capture drop il`lag'`var'
     
     gen il`lag'`var' = L`lag'.`var'
     
     replace il`lag'`var' = 0 if il`lag'`var' ==.
     
    
     foreach year of varlist yr1977- yr1982 {
                
                capture drop i`year'l`lag'`var'
                gen i`year'l`lag'`var' = `year' * il`lag'`var'
                replace i`year'l`lag'`var' = 0 if i`year'l`lag'`var'  == .
                *replace i`year'l`lag'`var' = . if year == 1977
    }
    }
    }
    
    findname, all(@==0)
    drop `r(varlist)'
    
    xtdpdgmm L(0/1).n wage emp k yr1979 - yr1982, model(difference) iv(iyr* wage emp k yr1979 - yr1982, model(difference))  twostep nocons vce(cluster id)
    estimate store m2_xtdpdgmm
    
    
    
    
    /* Model III: Stata's "gmm" command with the GMM-Style instruments calculated by hand ORTHOGONALIZED relative to the fixed-effects */
    * Orthogonalize instruments calculated "by hand" relative to the fixed-effects
    foreach var of varlist iy* wage emp k yr1979 - yr1982 {    
        capture drop  orth_`var'    
        gen orth_`var' = `var'
        bysort id (year): replace orth_`var' = 0 if _n == 2
        bysort id (year):  replace orth_`var'  = orth_`var' - F1.orth_`var' if _n != _N    
        bysort id (year): replace orth_`var'  = . if _n == 1    
    }
    
    gmm (eq1: n  - {n:  L.n wage emp k yr1979 yr1980 yr1981 yr1982}), ///
        instruments(orth_*, noconstant) ///
        winitial(unadjusted) vce(cluster id) twostep
    estimate store m1_gmm
    
    
    /* Model IV:  Stata's "gmm" command with the GMM-Style instruments calculated "by hand" but NOT orthogonalized with respect to the fixed-effects */
    gmm (D.n - {xb: LD.n D.wage D.emp D.k D.yr1979 D.yr1980 D.yr1981 D.yr1982}), ///
        instruments(iyr* wage emp k yr1979 - yr1982, noconstant) ///
        winitial(unadjusted) vce(cluster id) twostep
    estimate store m2_gmm
    
    esttab  m1_xtdpdgmm m2_xtdpdgmm m1_gmm m2_gmm ,  b(7) se(7) order(L.n LD.n wage D.wage emp D.emp k D.k)
    With results:

    HTML Code:
    ----------------------------------------------------------------------------
                          (1)             (2)             (3)             (4)  
                            n               n                                  
    ----------------------------------------------------------------------------
    main                                                                        
    L.n            -0.1244813      -0.1244813      -0.1244813                  
                  (0.3169631)     (0.3169631)     (0.2429797)                  
    
    LD.n                                                           -0.1492370  
                                                                  (0.2419743)  
    
    wage           -0.0294276      -0.0294276      -0.0294276                  
                  (0.0170682)     (0.0170682)     (0.0160661)                  
    
    D.wage                                                         -0.0299815  
                                                                  (0.0163513)  
    
    emp             0.0144419       0.0144419       0.0144419*                  
                  (0.0092611)     (0.0092611)     (0.0071777)                  
    
    D.emp                                                           0.0142584*  
                                                                  (0.0072503)  
    
    k               1.0777604**     1.0777604**     1.0777604***                
                  (0.3357940)     (0.3357940)     (0.2396408)                  
    
    D.k                                                             1.1037843***
                                                                  (0.2389169)  
    
    yr1979         -0.0256686*     -0.0256686*     -0.0256686*                  
                  (0.0121993)     (0.0121993)     (0.0123034)                  
    
    yr1980         -0.0271234      -0.0271234      -0.0271234                  
                  (0.0151974)     (0.0151974)     (0.0155296)                  
    
    yr1981         -0.0024327      -0.0024327      -0.0024327                  
                  (0.0345831)     (0.0345831)     (0.0299392)                  
    
    yr1982          0.0534354       0.0534354       0.0534354                  
                  (0.0579830)     (0.0579830)     (0.0523127)                  
    
    D.yr1979                                                       -0.0260374*  
                                                                  (0.0125335)  
    
    D.yr1980                                                       -0.0267815  
                                                                  (0.0158829)  
    
    D.yr1981                                                       -0.0002548  
                                                                  (0.0303609)  
    
    D.yr1982                                                        0.0568577  
                                                                  (0.0532091)  
    ----------------------------------------------------------------------------
    N                     690             690             690             552  
    ----------------------------------------------------------------------------  
    Standard errors in parentheses
    * p<0.05, ** p<0.01, *** p<0.001

    I have three sets of questions that I was hoping you could provide some guidance with. Please note that all pertain to the “classic” first-difference GMM models à la Arellano and Bond (1991).

    1) My first question is about the way that xtdpdgmm orthogonalizes the instruments with respect to the unit-level fixed-effects. Previously you mentioned that the key is that the sum of the instruments within panel be equal to “0”.

    a. My way of understanding the above statement is that if the within-unit (i.e., panel) sum of an instrument is equal to “0”, then its mean will also be equal to “0”. Under such circumstances, the instrument becomes deviations from its within-unit mean (which is “0”), and is therefore orthogonal to the unit fixed-effects. Is this interpretation accurate?
    b. When orthogonalizing each instrument with respect to the unit fixed-effects, it seems that xtdpdgmm simply subtracts from each value the value at the next time period within each unit (panel). Is that correct?
    c. If so, it appears that for each instrument:
    *The value of an instrument for the last time-period within each unit is left intact because we don’t have anything to subtract from it. Is that correct?
    *The value of an instrument for the first time-period within each unit is set to missing to avoid the “dummy-variable trap”. Is that accurate?
    *The value of an instrument for the second time-period within each unit is set to “0” before subtracting the value for the subsequent time period. This is done because we have set the first period to missing and, hence, to ensure that the within unit sum is “0”. Is that correct?
    2) Can you please provide some insight as to why xtdpdgmm chooses to orthogonalize the instruments rather than taking first-differences in the equation to be estimated? I noticed that the sample size is larger because of this. Is that part of the reason? I’m just curious.

    3) Finally, as you can see, models I through III above, all estimate the same coefficients. Do you have any insights as to why the “un-orthogonalized” instruments produce different coefficient estimates in the last model (Model IV) when they produce the correct estimates with xtdpdgmm in Model II?

    Thank you in advance for any insights.
    Last edited by Arkangel Cordero; 03 Apr 2025, 19:07.



  • Sebastian Kripfganz
    replied
    An important update to version 2.6.9 is now available for xtdpdgmm from my personal website, which fixes the bug just mentioned in the previous post, where some of the Difference-in-Hansen test statistics obtained after running xtdpdgmm with option overid had been incorrect. (The source of the bug was an incorrect selection of the relevant moments when combining all instruments for the level model or a transformed model.)
    Code:
    net install xtdpdgmm, from(http://www.kripfganz.de/stata/) replace



  • Sebastian Kripfganz
    replied
    The first column, labelled "Excluding", provides an overidentification test for a model without the instruments from the respective row. If this test rejects the null hypothesis, then this indicates that even without those instruments the model might be misspecified. In this case, adding the respective instruments would not help, even if the additional ones were all valid, because there are still some invalid instruments. Thus, testing those additional instruments would not be feasible. This is because the column labelled "Difference" effectively compares the Hansen test for the full model with all instruments to the initial model from the "Excluding" column. But if this initial model is misspecified, then the "Difference" test would compare the full model to a misspecified model. Consequently, if both models are similarly misspecified, the "Difference" test might not reject, but this could be misleading. Therefore, looking at the "Difference" test really only makes sense when the "Excluding" test is successfully passed.

    The last two rows in your table are basically a combination of rows 1-2 and rows 3-4, respectively. For example, in the last row the "Excluding" test jointly excludes all the instruments for the level model from rows 3 and 4, and the "Difference" test compares the full model to this model with those excluded instruments.

    While I understand why the degrees of freedom in the last two rows are identical - this is because there is an equal number of instruments for the level model and the transformed model - I am a bit puzzled about the numerically identical values of the test statistic. This looks a bit odd and appears to be a bug!



  • Matej Korinek
    replied
    Dear Professor Sebastian Kripfganz,

    I am just wondering about the following. I went through your 2019 London Stata Conference presentation very carefully. I am not sure how to exactly interpret the Incremental overidentification test. In particular, I have the following output:

    Sargan-Hansen (difference) test of the overidentifying restrictions
    H0: (additional) overidentifying restrictions are valid

    2-step weighting matrix from full model

    Code:
                        |           Excluding           |          Difference
    Moment conditions   |     chi2    df       p        |     chi2    df       p
    --------------------+-------------------------------+------------------------------
    1, model(fodev)     | 249.0901   235   0.2521       |   5.7804    13   0.9538
    2, model(fodev)     | 171.4524   131   0.0102       |  83.4181   117   0.9919
    3, model(level)     | 244.7790   235   0.3172       |  10.0916    13   0.6864
    4, model(level)     | 170.3794   131   0.0118       |  84.4912   117   0.9897
    model(fodev)        | 151.3709   118   0.0208       | 103.4997   130   0.9581
    model(level)        | 151.3709   118   0.0208       | 103.4997   130   0.9581

    Sargan-Hansen test of the overidentifying restrictions
    H0: overidentifying restrictions are valid

    2-step moment functions, 2-step weighting matrix chi2(248) = 254.8706
    Prob > chi2 = 0.3686

    2-step moment functions, 3-step weighting matrix chi2(248) = 265.6096
    Prob > chi2 = 0.2111

    For example, the first row tells me the Hansen J statistic from the whole moment matrix excluding 13 moment conditions (I have 13 collapsed lags of the dependent variable), and then the same statistic just for those 13 conditions. In particular, I do not understand the last two rows. What do they mean, and how should I interpret them? I know that the "Difference" column should tell me the Hansen J statistic of just the level-equation moments (or the fodev moments, depending on the row), but what model do those 118 moment conditions in the "Excluding" column represent? I am just trying to understand that particular reduced model, since it never passes in my models. That leads me to a second question. I always pass the "Difference" criterion comfortably, with p-values above 0.8. But frequently I do not pass the reduced model in the "Excluding" column. Is that a problem? How is that possible?

    Thank you for your time

    Matěj Kořínek



  • Sebastian Kripfganz
    replied
    There is nothing wrong with these instruments per se. It is just a bit unusual that you are using different lag orders to instrument the lagged dependent variables and the independent variable. You should make sure that this can be justified; otherwise it looks like cherry-picking a model specification that delivers the nicest results.

    For the controls, it is also unusual to not specify instruments for the transformed model; e.g., iv(l.(controls), m(fodev)).

    As a technical comment, note that gmm(l(1/2).DV, lag(0 0) collapse m(fodev)) is equivalent to iv(l(1/2).DV, m(fodev)).
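    To make the equivalence concrete (a sketch only, with DV standing in for the actual variable and the remaining options elided): with lag(0 0) and collapse, the GMM-type instrument matrix reduces to a single column per variable - the contemporaneous value - which is exactly the standard instrument that iv() creates. The two specifications below should therefore yield identical instrument sets:
    Code:
    . xtdpdgmm ... , gmm(l(1/2).DV, lag(0 0) collapse m(fodev)) ...
    . xtdpdgmm ... , iv(l(1/2).DV, m(fodev)) ...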



  • Nicu Sprincean
    replied
    Sebastian Kripfganz

    Hi, Sebastian,

    I have a question regarding a model specification where I include the second lag of the dependent variable to deal with serial correlation. The model takes the following form:
    Code:
    xtdpdgmm DV l(1/2).DV l.IV l.(controls),  gmm(l(1/2).DV, lag(0 0) collapse m(fodev)) gmm(l.IV, lag(0 3) collapse m(fodev)) iv(l.(controls),d m(level)) teffects two vce(robust) nocons
    I assume that l.DV and l.IV are both predetermined (all right-hand side variables enter the model with a one-year lag, due to economic reasons) and all controls to be strictly exogenous. I am not sure whether
    Code:
    gmm(l(1/2).DV, lag(0 0) collapse m(fodev))
    and
    Code:
    iv(l.(controls),d m(level))
    are correctly specified.

    Thank you in advance for your response!
    Last edited by Nicu Sprincean; 07 Feb 2025, 06:51.



  • Sebastian Kripfganz
    replied
    Any standard econometrics textbook should cover systems of simultaneous equations and regressor endogeneity.

    A variable is predetermined if it is a function of (i.e., determined by) previous periods' shocks to the equation of interest (but not current or future periods' shocks).
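    In moment-condition notation (a restatement of the definition above, using the thread's subscript style), predeterminedness of Xit amounts to:

    E[Xit εis] = 0 for all s ≥ t, while E[Xit εis] may be nonzero for s < t.

    This is what makes Xit itself, and all of its lags, valid instruments in the FOD-transformed model.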



  • Nursena Sagir
    replied
    Dear Sebastian,

    Is there any reference that would help me understand this better and explain why I can treat X_it as predetermined? I had difficulties explaining it in the method section of my paper. Or can you elaborate more on your reasoning?

    Best regards,
    Nursena



  • Sebastian Kripfganz
    replied
    As long as you do not change the second equation, my earlier statement about Xit being predetermined still stands.

    Once Xit becomes a direct function of Yit, as in your amended second equation, Xit becomes endogenous.



  • Nursena Sagir
    replied
    Dear Sebastian,

    I have a follow-up question.
    Originally posted by Sebastian Kripfganz:
    I think there are at least two things that are potentially confusing here.


    In your two-equations example, Xit is predetermined because it is a function of Yit-1. But Xit-1 is uncorrelated with the error term εit (and any future error term) in equation 1, which is all that is needed for it to be a valid instrument (in the FOD-transformed model).
    You have stated that for the two-equations example below:
    1. Yit = Yit-1 + Yit-2 + Xit-1 + Xit-2 + εit
    2. Xit = Xit-1 + Xit-2 + Yit-1 + Yit-2 + εit
    If I change the first equation to:

    1. Yit = Yit-1 + Yit-2 + Xit + Xit-1 + εit

    which includes contemporaneous X, can I still assume that Xit is predetermined? When I look at the incJ test, it does not reject the null hypothesis that the additional overidentifying restriction for predetermined X is valid (p-value =0.95).

    Would your response change if I have second equation as:

    2. Xit = Xit-1 + Xit-2 + Yit + Yit-1 + εit

    Thank you in advance!

    Best regards,
    Nursena

