XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

Sebastian Kripfganz

Join Date: May 2014

Posts: 2606
#436

20 Jun 2022, 04:57

You would not normally run two separate regressions for the effects above and below the threshold. Just combine everything in a single regression:

Code:

xtdpdgmm L(0/1).Y X1*X2_h X1*X2_l X3 X4, model(diff) collapse gmm(Y X3 X4, lag(2 4)) gmm(X1*X2_h X1*X2_l, lag(1 7)) gmm(Y X3 X4, lag(1 1) diff model(level)) gmm(X1*X2_h X1*X2_l, lag(0 0) diff model(level)) vce(r, dc) overid twostep

https://www.kripfganz.de/stata/
Comment
Sarah Magd

Join Date: Feb 2022

Posts: 62
#437

20 Jun 2022, 07:22

I tried the following command:
xtdpdgmm L(0/1).Y X1*X2_h X1*X2_l X3 X4, model(diff) collapse gmm(Y X3 X4, lag(2 4)) gmm(X1*X2_h X1*X2_l, lag(1 7)) gmm(Y X3 X4, lag(1 1) diff model(level)) gmm(X1*X2_h X1*X2_l, lag(0 0) diff model (level)) vce(r, dc) overid twostep
However, it gives this error:
no observations
r(2000);

In this case, should I replace the missing values in the newly generated threshold variables with zero? As follows:

gen X2_h = X2 if X2 > 0.32
replace X2_h = 0 if X2_h == .

gen X2_l = X2 if X2 <= 0.32
replace X2_l = 0 if X2_l == .
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2606
#438

20 Jun 2022, 07:30

Originally posted by Sarah Magd

should I replace the missing values in the newly generated threshold variables with zero?

Yes
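
A minimal sketch of this recoding on toy data (the four X2 values are hypothetical; 0.32 is the threshold from #437). One point that may be worth checking in your own data: the replace statements in #437 also set both variables to zero where X2 itself is missing, so those observations would stay in the estimation sample. The version below keeps them missing instead, on the assumption that this is what you want.

```stata
* Toy data (hypothetical values) to illustrate the threshold split at 0.32
clear
input double X2
0.10
0.32
0.50
.
end

* Regime-specific copies of X2: zero outside the regime,
* missing only where X2 itself is missing
generate double X2_h = cond(X2 > 0.32, X2, 0) if !missing(X2)
generate double X2_l = cond(X2 <= 0.32, X2, 0) if !missing(X2)

* Sanity checks: the two pieces add up to X2 wherever X2 is observed,
* and both stay missing where X2 is missing
assert X2_h + X2_l == X2 if !missing(X2)
assert missing(X2_h) & missing(X2_l) if missing(X2)

list
```

Storing the variables as double avoids float-precision surprises when an observation sits exactly at the threshold.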

https://www.kripfganz.de/stata/
Comment
Sarah Magd

Join Date: Feb 2022

Posts: 62
#439

21 Jun 2022, 11:52

I estimate the Cobb-Douglas production function in a static form as follows:
GDP per capita = Capital formation per capita + energy consumption per capita + inflation + trade openness + financial development
My sample is 13 years for 27 countries.
- I am using fixed-effects regression with robust standard errors and panel-corrected standard errors with fixed effects. Both regressions give the expected result for my variable of interest (i.e., financial development). However, since the energy consumption variable is endogenous (due to reverse causality), I should use a model that corrects the potential bias from this endogeneity. As I mentioned in #424, I can use the two-step GMM estimator to control for the endogeneity. Nevertheless, the coefficient on financial development (my main variable) in this regression is insignificant or counterintuitive.

- Given my sample size and the static specification, which estimator would be the most relevant to control for the endogeneity?
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2606
#440

21 Jun 2022, 12:51

A general answer is that lots of things can happen to your estimates when you change the underlying assumptions (i.e. one variable is treated as endogenous instead of exogenous). Instrumental variables estimators (including GMM) may help to alleviate the endogeneity problem, but they might create other problems. For example, standard errors might become quite large if instruments are relatively weak. Especially when you have a relatively small sample size, the differences between estimators might appear large because the coefficients are not estimated very precisely.

I would recommend changing the estimator as little as necessary when you make different assumptions, to get the best possible comparison. Say you start with a fixed-effects estimator:

Code:

xtreg Y X1 X2 X3, fe vce(robust)

Note that you can replicate this regression with xtdpdgmm as follows:

Code:

xtdpdgmm Y X1 X2 X3, model(mdev) iv(X1 X2 X3, norescale) small vce(robust)

Then you assume that X1 is endogenous and you want to instrument it in the typical GMM style:

Code:

xtdpdgmm Y X1 X2 X3, model(mdev) iv(X2 X3, norescale) gmm(X1, lag(2 8) collapse model(diff)) twostep small vce(robust, dc)

Notice that I have left the instruments for X2 X3 in the same format as for the traditional fixed-effects regression. This way, you can best compare the results.
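
As a sketch of how the first two commands can be compared side by side, the following uses the abdata example data set, with n, w, and k standing in for Y, X1, and X2 (an illustrative choice, not part of the original question). It assumes xtdpdgmm is installed and only reuses the syntax shown above.

```stata
* Example data used elsewhere in this thread; already declared as panel data
webuse abdata, clear

* Conventional fixed-effects regression with cluster-robust standard errors
xtreg n w k, fe vce(robust)
estimates store fe_xtreg

* Same estimator written as a GMM problem: untransformed regressors as
* instruments for the model in deviations from within-group means
xtdpdgmm n w k, model(mdev) iv(w k, norescale) small vce(robust)
estimates store fe_gmm

* Put the two sets of estimates next to each other
estimates table fe_xtreg fe_gmm, b(%9.6f) se(%9.6f)
```

The slope coefficients should coincide; once a regressor is moved from iv() to gmm(), any remaining difference can be attributed to the change in assumptions rather than to a change in the estimation framework.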

https://www.kripfganz.de/stata/
Comment
Sarah Magd

Join Date: Feb 2022

Posts: 62
#441

30 Jun 2022, 04:52

Dear Prof. Sebastian Kripfganz
- We have a static panel regression with relatively small T (i.e., T = 13 and cross-section units = 30), an endogenous variable (i.e., due to the reverse causality), and fixed effects.
The OLS fixed effects with robust standard errors is used first to obtain baseline results. As far as I understood, the two-step system GMM estimator can be used to control only for the endogeneity problem. Is there another statistical issue that is considered by the two-step system GMM estimator in the case of a static specification (i.e., more efficiency or consistency - omitted variable bias - serial autocorrelation - etc.)?
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2606
#442

30 Jun 2022, 05:13

An estimator is either consistent or not. The GMM estimator is consistent if all the moment conditions/instruments are valid (and there are sufficiently many instruments available to estimate all coefficients).

Efficiency is a relative concept. Among different GMM estimators, the asymptotically efficient estimator uses all non-redundant moment conditions/instruments and an optimal weighting matrix (as the two-step estimator does). If feasible, other estimators (such as a maximum likelihood estimator) might be more efficient in the sense that they achieve a smaller asymptotic variance.

Omitted variables are a source of endogeneity. If appropriate instruments are available (which are uncorrelated with the omitted variables), then GMM can deal with this problem.

Serial correlation may or may not be a problem. If all regressors are strictly exogenous, serial correlation can be accounted for by using an optimal weighting matrix and panel-robust standard errors. Sometimes, serial correlation can be an indication of omitted dynamics (which could be an omitted lagged dependent variable or omitted lags of the regressors). In that case, an omitted variables problem could arise.

https://www.kripfganz.de/stata/
Comment
Sarah Magd

Join Date: Feb 2022

Posts: 62
#443

30 Jun 2022, 06:05

Thanks a lot for the constructive and organized reply.

As far as I understood, for the case of static regression with fixed effects:
- The two-step system GMM estimator can control the endogeneity problem resulting from either reverse causality or omitted variable bias (assuming that appropriate instruments are available).
- The two-step system GMM estimator is relatively more efficient than the one-step system GMM estimator because it accounts for the extra variance coming from the unobserved fixed effects
- The validity of the two-step GMM estimator is tested by the Hansen test. If it is insignificant, we can conclude that the results obtained by this estimator are consistent and the GMM can deal with the problem of omitted variable bias.
- Given the existence of endogenous regressors, the serial correlation would still affect the first admissible lag for the instruments. Therefore, for the Arellano-Bond test for autocorrelation of the first-differenced residuals, if H0: no autocorrelation of order 2 is accepted, then this can be an indication that there are no omitted dynamics nor omitted lags of the regressors.

Am I right?
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2606
#444

30 Jun 2022, 06:20

In general, this is correct.
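
For reference, a sketch of how the two tests mentioned in #443 can be requested after estimation. It assumes the postestimation commands estat serial and estat overid described in the xtdpdgmm help file, and it uses the abdata example data with a dynamic specification purely for illustration; the instrument choices mirror the syntax of #436 and are not a recommendation for your model.

```stata
* Example data used elsewhere in this thread
webuse abdata, clear

* Two-step system GMM with collapsed instruments; w and k are treated
* as endogenous here only for the sake of the illustration
xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n w k, lag(2 4)) gmm(n w k, lag(1 1) diff model(level)) twostep vce(robust) overid

* Arellano-Bond test for autocorrelation of the first-differenced residuals
estat serial

* Sargan-Hansen test of the overidentifying restrictions
estat overid
```

Neither test can prove that the specification is correct; failing to reject only means that the data provide no evidence against the maintained assumptions.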

https://www.kripfganz.de/stata/
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2606
#445

03 Jul 2022, 07:01

The previous update of xtdpdgmm introduced doubly-corrected (DC) misspecification-robust standard errors for the one-step, two-step, and iterated GMM estimators with linear moment conditions. A new update is now available which also supports DC standard errors - vce(robust, dc) - with these estimators for models with nonlinear moment conditions - nl(). With thanks to Kit Baum, this latest version 2.4.2 is now also available on SSC.

Code:

ado update xtdpdgmm, update

As a minor addition, the new version also supports use of the identity matrix as an initial weighting matrix - wmatrix(identity) - although use of this option would be hardly ever recommended in practice.

https://www.kripfganz.de/stata/
Comment

Sebastian Kripfganz

Join Date: May 2014
Posts: 2606

#446

18 Jul 2022, 11:03

Yet another update is now available:

Code:

net install xtdpdgmm, from(http://www.kripfganz.de/stata/)

Version 2.5.0 of xtdpdgmm allows you to estimate the model with the nonlinear moment conditions recently proposed by Chudik and Pesaran (2022). The respective command option is nl(predetermined). The name of this option reflects the fact that these nonlinear moment conditions are only valid if all of the right-hand side variables are predetermined (or strictly exogenous). Similar to the Ahn and Schmidt (1995) nonlinear moment conditions, a crucial assumption is that the idiosyncratic error term is serially uncorrelated. However, the Ahn-Schmidt moment conditions do not require the regressors to be predetermined. On the other hand, the Chudik-Pesaran moment conditions relax some assumptions about the initial observations; see the Remarks section in the xtdpdgmm help file for more details.

In either case, the nonlinear moment conditions help with identification when the dependent variable is highly persistent. They become redundant when the additional Blundell and Bond (1998) instruments for the model in levels are added. In Monte-Carlo simulations, the Chudik-Pesaran estimator performs quite well.

This latest version also comes with the new option center, which centers the moments in the optimal weighting matrix around their mean. This is asymptotically irrelevant but might improve the finite-sample performance.
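
In generic notation, the difference can be sketched as follows. The symbols are shorthand introduced here, not taken from the help file: $m_i(\hat\beta)$ denotes the vector of moment functions for group $i$ evaluated at the initial estimate, and $\bar m = N^{-1}\sum_{i=1}^{N} m_i(\hat\beta)$ is its sample mean.

```latex
W_{\text{uncentered}}
  = \Big( \frac{1}{N} \sum_{i=1}^{N} m_i(\hat\beta)\, m_i(\hat\beta)' \Big)^{-1},
\qquad
W_{\text{centered}}
  = \Big( \frac{1}{N} \sum_{i=1}^{N}
      \big( m_i(\hat\beta) - \bar m \big)\big( m_i(\hat\beta) - \bar m \big)' \Big)^{-1}
```

If the moment conditions are valid, $\bar m$ converges to zero, so the two matrices coincide asymptotically; in a finite sample they generally differ.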

Here is an example of the Chudik-Pesaran estimator with centered weighting matrix:

Code:

. webuse abdata

. xtdpdgmm L(0/1).n w k, gmm(L.n w k, diff lag(1 4) collapse) model(diff) nl(predetermined) twostep center vce(robust)

Generalized method of moments estimation

Fitting full model:

Step 1:
initial:       f(b) =  6.9079499
alternative:   f(b) =  1.8818777
rescale:       f(b) =  .03794837
Iteration 0:   f(b) =  .03794837  
Iteration 1:   f(b) =  .00207619  
Iteration 2:   f(b) =  .00183829  
Iteration 3:   f(b) =  .00183771  
Iteration 4:   f(b) =  .00183766  

Step 2:
Iteration 0:   f(b) =   .9265277  
Iteration 1:   f(b) =  .72579345  
Iteration 2:   f(b) =  .58900414  
Iteration 3:   f(b) =  .53498361  
Iteration 4:   f(b) =  .53034607  
Iteration 5:   f(b) =  .51523719  
Iteration 6:   f(b) =  .50824095  
Iteration 7:   f(b) =  .50752705  
Iteration 8:   f(b) =  .50736446  
Iteration 9:   f(b) =  .50733335  
Iteration 10:  f(b) =  .50732691  
Iteration 11:  f(b) =  .50732563  
Iteration 12:  f(b) =  .50732537  
Iteration 13:  f(b) =  .50732532  

Group variable: id                           Number of obs         =       891
Time variable: year                          Number of groups      =       140

Moment conditions:     linear =      13      Obs per group:    min =         6
                    nonlinear =       6                        avg =  6.364286
                        total =      19                        max =         8

                                   (Std. err. adjusted for 140 clusters in id)
------------------------------------------------------------------------------
             |              WC-Robust
           n | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
           n |
         L1. |   .4433475   .0635349     6.98   0.000     .3188213    .5678737
             |
           w |  -.7539702   .0810252    -9.31   0.000    -.9127767   -.5951636
           k |   .3452531   .0555308     6.22   0.000     .2364147    .4540914
       _cons |   3.103905   .2695798    11.51   0.000     2.575538    3.632272
------------------------------------------------------------------------------
Instruments corresponding to the linear moment conditions:
 1, model(diff):
   L1.D.L.n L2.D.L.n L3.D.L.n L4.D.L.n L1.D.w L2.D.w L3.D.w L4.D.w L1.D.k
   L2.D.k L3.D.k L4.D.k
 2, model(level):
   _cons

Notice that the estimator uses differenced gmm() instruments for the first-differenced model.

Ahn, S. C., and P. Schmidt (1995). Efficient estimation of models for dynamic panel data. Journal of Econometrics 68, 5-27.
Blundell, R., and S. R. Bond (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87, 115-143.
Chudik, A., and M. H. Pesaran (2022). An augmented Anderson-Hsiao estimator for dynamic short-T panels. Econometric Reviews 41, 416-447.

https://www.kripfganz.de/stata/

Comment

Sebastian Kripfganz

Join Date: May 2014

Posts: 2606
#447

21 Jul 2022, 15:16

I am in update mood. Version 2.5.1 of xtdpdgmm comes with the following small improvements:
With the new suboption model(mean), instruments can now be specified for the model in within-group means. This is essentially the "between model" with averaged observations for each group. It might be useful for implementing a GMM version of the Hausman and Taylor (1981) estimator, as discussed by Arellano and Bover (1995).

A collinearity check has been added for the independent variables. In some cases, this circumvents non-convergence of the numerical optimization algorithm.

When the option nolevel is specified, groups with just a single observation (in levels) are now removed from the estimation sample. This affects the reported number of groups/observations and can have a small effect on standard errors and test statistics (but not coefficient estimates).

Arellano, M., and O. Bover (1995). Another look at the instrumental variable estimation of error-components models. Journal of Econometrics 68, 29-51.

Hausman, J. A., and W. E. Taylor (1981). Panel data and unobservable individual effects. Econometrica 49, 1377-1398.

https://www.kripfganz.de/stata/
Comment

Simon Rottler

Join Date: Jul 2022
Posts: 1

#448

25 Jul 2022, 02:08

Dear Sebastian Kripfganz

I updated xtdpdgmm to the newest version 2.5.1. However, I can no longer replicate models that I ran before the update, and I don't know whether this is a bug or intentional.
With my older version (must have been from May/June 2022), I was able to run

Code:

. xtdpdgmm log_co2emipercap L.log_co2emipercap log_gdppercap log_gdppercapsq nuclearshare hydroshare windshare solarshare othershare energyintensity if year>=2000&year<=2017, gmm(L.log_co2emipercap, model(difference) collapse) iv(log_gdppercap log_gdppercapsq nuclearshare hydroshare windshare solarshare othershare energyintensity, difference) nolevel nl(noser) small overid vce(robust)

After the update it says that nl() and nolevel cannot be combined:

Code:

options nl() and nolevel may not be combined
r(184);

Furthermore, I wanted to replicate your suggestion from above

Originally posted by Sebastian Kripfganz

Then you assume that X1 is endogenous and you want to instrument it in the typical GMM style:

Code:

xtdpdgmm Y X1 X2 X3, model(mdev) iv(X2 X3, norescale) gmm(X1, lag(2 8) collapse model(diff)) twostep small vce(robust, dc)

Notice that I have left the instruments for X2 X3 in the same format as for the traditional fixed-effects regression. This way, you can best compare the results.

I get the following error

Code:

. xtdpdgmm log_co2emipercap log_gdppercap log_gdppercapsq nuclearshare hydroshare windshare solarshare othershare energyintensity if year>=2000&year<=2017, model(mdev) gmm(log_gdppercap log_gdppercapsq, lag(2 8) collapse model(diff)) iv(nuclearshare hydroshare windshare solarshare othershare energyintensity, norescale) twostep vce(robust, dc) small
                 hash1():  3300  argument out of range
      asarray_contains():     -  function returned error
xtdpdgmm_init_ivvars_rescale():     -  function returned error
                 <istmt>:     -  function returned error

What did I miss?

Last edited by Simon Rottler; 25 Jul 2022, 02:22.

Comment

Sebastian Kripfganz

Join Date: May 2014

Posts: 2606
#449

25 Jul 2022, 02:32

Simon Rottler

The moment conditions created by option nl(noserial) are a function of the level errors. Hence, this option is not compatible with the nolevel option. You could consider it a bug in the command's previous version that xtdpdgmm did not prevent you from running your code nevertheless.

The error message you received in your second example is puzzling. It looks like a bug but I was not able to replicate it with other data sets. I have checked the program's code but do not see how this could have happened. If you are able and willing to send me your data set by e-mail, I might be able to find out what's wrong.

https://www.kripfganz.de/stata/
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2606
#450

28 Jul 2022, 04:40

This week's update of the xtdpdgmm package to version 2.6.0 brings a new command, xtdpdgmmfe, which serves as a wrapper for xtdpdgmm with simplified syntax. Instead of specifying all the instruments manually, this wrapper command does it for you based on a set of assumptions you input.
With option lags(), you can specify the autoregressive order of the model. By default, a dynamic model with 1 lag of the dependent variable is estimated.

With options exogenous(), predetermined(), and endogenous(), you need to classify the regressors accordingly.

Dummies for time effects can be added in the familiar way with option teffects.

With the familiar option collapse and the new option curtail(), you can easily reduce the number of instruments using collapsing or curtailing. The latter sets a maximum lag order for the instruments.

Option orthogonal allows you to request orthogonal deviations instead of first differences. (Note: For strictly exogenous variables, this will typically add instruments for the model in deviations from within-group means and for the model in forward-orthogonal deviations, while for predetermined and endogenous variables instruments are only available for the model in forward-orthogonal deviations. Also importantly, orthogonal automatically reduces the maximum lag order specified with option curtail() by 1 lag to ensure that the number of instruments stays the same with and without orthogonal deviations. The reason is that with orthogonal deviations, the minimum lag that is valid as an instrument is lower by 1 as well.)

With option serial(), you can allow for serially correlated idiosyncratic errors up to the specified order. This will affect the minimum lag order of instruments for predetermined and endogenous variables, and possibly the availability of nonlinear moment conditions. By default, serially uncorrelated idiosyncratic errors are assumed.

With option iid, you can add a homoskedasticity assumption in addition to serially uncorrelated idiosyncratic errors. This might enable additional linear or nonlinear moment conditions.

Option initdev is less intuitive. It assumes that the deviations of the initial observations from their long-run means are uncorrelated with the idiosyncratic errors. This relaxes the slightly stronger default assumption that initial observations and group-specific effects (not their deviations) must be each uncorrelated with the idiosyncratic errors. Under the default assumption, lagged levels can be used as instruments for the first-differenced/forward-orthogonally transformed model. Under the initdev assumption, only lagged first differences or backward-orthogonally transformed variables can be used as instruments. It also affects the type of nonlinear moment conditions that might be available.

With option stationary, additional first-differenced instruments become available for the level model. Nonlinear moment conditions become redundant.

If nonlinear moment conditions are undesired irrespective of the assumptions, option nonl can be specified.

In contrast to xtdpdgmm, the default estimator with xtdpdgmmfe is the iterated GMM estimator (igmm). Alternatively, the onestep, twostep, or continuously-updating GMM estimator (cugmm) can be requested.

By default, xtdpdgmmfe displays the respective xtdpdgmm command line used to estimate the model. This allows you to fine-tune your estimator using xtdpdgmm, which offers additional specialist options, and to see which options are implied by your chosen assumptions. If this feature is undesired, display of the command line can be prevented with option nocmdline.

Because the model is actually estimated by xtdpdgmm, the usual postestimation commands are available.

See the help file for details:

Code:

help xtdpdgmmfe

Here are some examples of conventional estimators, assuming that the regressors are predetermined. The examples also show how the xtdpdgmmfe syntax translates into the xtdpdgmm syntax:

Code:

. webuse abdata

1. Anderson and Hsiao (1981) "difference IV" estimators with lagged levels or lagged differences as instruments:

Code:

. xtdpdgmmfe n w k, predetermined(w k) collapse curtail(1) nonl teffects onestep
xtdpdgmm L(0/1).n w k , model(difference) gmmiv(L.n w k, lagrange(1 .)) collapse curtail(1) teffects nolevel onestep

. xtdpdgmmfe n w k, predetermined(w k) initdev collapse curtail(1) nonl teffects onestep
xtdpdgmm L(0/1).n w k , model(difference) gmmiv(L.n w k, lagrange(1 .) difference) collapse curtail(1) teffects nolevel onestep

2. Arellano and Bond (1991) one-step "difference GMM" estimator with curtailed instruments:

Code:

. xtdpdgmmfe n w k, predetermined(w k) curtail(3) nonl teffects onestep
xtdpdgmm L(0/1).n w k , model(difference) gmmiv(L.n w k, lagrange(1 .)) curtail(3) teffects nolevel onestep

3. Arellano and Bover (1995) one-step "forward-orthogonal GMM" estimator with curtailed instruments:

Code:

. xtdpdgmmfe n w k, predetermined(w k) curtail(3) orthogonal nonl teffects onestep
xtdpdgmm L(0/1).n w k , model(fodev) gmmiv(L.n w k, lagrange(0 .)) curtail(2) teffects nolevel onestep

4. Hayakawa, Qi, and Breitung (2019) "backward/forward-orthogonal IV" estimator:

Code:

. xtdpdgmmfe n w k, predetermined(w k) initdev collapse curtail(1) orthogonal nonl teffects onestep
xtdpdgmm L(0/1).n w k , model(fodev) gmmiv(L.n w k, lagrange(0 .) bodev) collapse curtail(0) teffects nolevel onestep

5. Blundell and Bond (1998) two-step "system GMM" estimator with curtailed/collapsed instruments and doubly-corrected robust standard errors:

Code:

. xtdpdgmmfe n w k, predetermined(w k) stationary collapse curtail(3) teffects twostep vce(robust, dc)
xtdpdgmm L(0/1).n w k , model(difference) gmmiv(L.n w k, lagrange(1 .)) gmmiv(L.n w k, lagrange(0 0) difference model(level)) collapse curtail(3) teffects twostep vce(robust, dc)

6. Ahn and Schmidt (1995) two-step GMM estimator (with curtailed instruments and doubly-corrected robust standard errors) using nonlinear moment conditions valid under serially uncorrelated idiosyncratic errors without or with homoskedasticity:

Code:

. xtdpdgmmfe n w k, predetermined(w k) curtail(3) teffects twostep vce(robust, dc)
xtdpdgmm L(0/1).n w k , model(difference) gmmiv(L.n w k, lagrange(1 .)) nl(noserial) curtail(3) teffects twostep vce(robust, dc)

. xtdpdgmmfe n w k, predetermined(w k) iid curtail(3) teffects twostep vce(robust, dc)
xtdpdgmm L(0/1).n w k , model(difference) gmmiv(L.n w k, lagrange(1 .)) nl(iid) curtail(3) teffects twostep vce(robust, dc)

7. Chudik and Pesaran (2022) iterated GMM estimator (with collapsed instruments, centered weighting matrix, and doubly-corrected robust standard errors) using nonlinear moment conditions valid under serially uncorrelated idiosyncratic errors (and no endogenous regressors):

Code:

. xtdpdgmmfe n w k, predetermined(w k) initdev collapse teffects igmm vce(robust, dc) center
xtdpdgmm L(0/1).n w k , model(difference) gmmiv(L.n w k, lagrange(1 .) difference) nl(predetermined) collapse teffects nolevel igmm vce(robust, dc) center

8. Finally, a replication of a fixed-effects estimator in a static model (necessarily with strictly exogenous regressors):

Code:

. xtdpdgmmfe n w k, lags(0) exogenous(w k) collapse curtail(1) orthogonal teffects onestep norescale
xtdpdgmm n w k , model(mdev) gmmiv(w k, lagrange(0 0)) collapse curtail(0) teffects nolevel onestep norescale

. xtreg n w k i.year, fe

To install the latest version of the package, type the following in Stata's command window:

Code:

net install xtdpdgmm, from(http://www.kripfganz.de/stata/) replace

Suggested citation if you find this package useful in your work:
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.

References:
Ahn, S. C., and P. Schmidt (1995). Efficient estimation of models for dynamic panel data. Journal of Econometrics 68, 5-27.

Anderson, T. W., and C. Hsiao (1981). Estimation of dynamic models with error components. Journal of the American Statistical Association 76, 598-606.

Arellano, M., and S. R. Bond (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies 58, 277-297.

Arellano, M., and O. Bover (1995). Another look at the instrumental variable estimation of error-components models. Journal of Econometrics 68, 29-51.

Blundell, R., and S. R. Bond (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87, 115-143.

Chudik, A., and M. H. Pesaran (2022). An augmented Anderson-Hsiao estimator for dynamic short-T panels. Econometric Reviews 41, 416-447.

Hayakawa, K., M. Qi, and J. Breitung (2019). Double filter instrumental variable estimation of panel data models with weakly exogenous variables. Econometric Reviews 38, 1055-1088.

https://www.kripfganz.de/stata/
Comment
