  • Sebastian Kripfganz replied:
    Simon Rottler,

    The moment conditions created by option nl(noserial) are a function of the level errors. Hence, this option is not compatible with the nolevel option. You could consider it a bug in the command's previous version that xtdpdgmm nevertheless did not prevent you from running your code.
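    For concreteness, here is a minimal sketch of the two admissible alternatives (not from the original exchange; it uses the abdata teaching data set with illustrative lag choices): either drop nolevel or drop nl(noserial).
    Code:
    webuse abdata, clear

    * keep nl(noserial) and drop nolevel (the level errors remain in the model)
    xtdpdgmm L(0/1).n w k, gmm(L.n w k, diff lag(1 4) collapse) model(diff) nl(noserial) twostep vce(robust)

    * or keep nolevel and drop nl(noserial)
    xtdpdgmm L(0/1).n w k, gmm(L.n w k, diff lag(1 4) collapse) model(diff) nolevel twostep vce(robust)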

    The error message you received in your second example is puzzling. It looks like a bug but I was not able to replicate it with other data sets. I have checked the program's code but do not see how this could have happened. If you are able and willing to send me your data set by e-mail, I might be able to find out what's wrong.


  • Simon Rottler replied:
    Dear Sebastian Kripfganz,

    I updated xtdpdgmm to the newest version 2.5.1. However, I can no longer replicate models that I ran before the update, and I do not know whether this is a bug or an intentional change.
    With my older version (it must have been from May/June 2022), I was able to run:
    Code:
    . xtdpdgmm log_co2emipercap L.log_co2emipercap log_gdppercap log_gdppercapsq nuclearshare hydroshare windshare solarshare othershare energyintensity if year>=2000&year<=2017, gmm(L.log_co2emipercap, model(difference) collapse) iv(log_gdppercap log_gdppercapsq nuclearshare hydroshare windshare solarshare othershare energyintensity, difference) nolevel nl(noser) small overid vce(robust)
    After the update it says that nl() and nolevel cannot be combined:
    Code:
    options nl() and nolevel may not be combined
    r(184);

    Furthermore, I wanted to replicate your suggestion from above:
    Originally posted by Sebastian Kripfganz:
    Then you assume that X1 is endogenous and you want to instrument it in the typical GMM style:
    Code:
    xtdpdgmm Y X1 X2 X3, model(mdev) iv(X2 X3, norescale) gmm(X1, lag(2 8) collapse model(diff)) twostep small vce(robust, dc)
    Notice that I have left the instruments for X2 X3 in the same format as for the traditional fixed-effects regression. This way, you can best compare the results.
    I get the following error:
    Code:
    . xtdpdgmm log_co2emipercap log_gdppercap log_gdppercapsq nuclearshare hydroshare windshare solarshare othershare energyintensity if year>=2000&year<=2017, model(mdev) gmm(log_gdppercap log_gdppercapsq, lag(2 8) collapse model(diff)) iv(nuclearshare hydroshare windshare solarshare othershare energyintensity, norescale) twostep vce(robust, dc) small
                     hash1():  3300  argument out of range
          asarray_contains():     -  function returned error
    xtdpdgmm_init_ivvars_rescale():     -  function returned error
                     <istmt>:     -  function returned error
    What did I miss?


  • Sebastian Kripfganz replied:
    I am in an update mood. Version 2.5.1 of xtdpdgmm comes with the following small improvements:
    1. With the new suboption model(mean), instruments can now be specified for the model in within-group means. This is essentially the "between model" with averaged observations for each group. It might be useful for implementing a GMM version of the Hausman and Taylor (1981) estimator, as discussed by Arellano and Bover (1995); see the sketch after the references below.
    2. A collinearity check has been added for the independent variables. In some cases, this circumvents non-convergence of the numerical optimization algorithm.
    3. When the option nolevel is specified, groups with just a single observation (in levels) are now removed from the estimation sample. This affects the reported number of groups/observations and can have a small effect on standard errors and test statistics (but not coefficient estimates).
    • Arellano, M., and O. Bover (1995). Another look at the instrumental variable estimation of error-components models. Journal of Econometrics 68, 29-51.
    • Hausman, J. A., and W. E. Taylor (1981). Panel data and unobservable individual effects. Econometrica 49, 1377-1398.
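    As a sketch of the Hausman-Taylor-type setup from item 1 (hypothetical variables: x1 is time-varying and uncorrelated with the unit effects, x2 is time-varying and correlated with them, z1 is time-invariant and uncorrelated with them, z2 is time-invariant and correlated with them; exact suboptions and instrument choices may need adjusting to your data):
    Code:
    * within-type instruments identify the coefficients of x1 and x2; the
    * group means of x1 and z1 instrument the model in means to identify
    * the coefficients of the time-invariant z1 and z2
    xtdpdgmm y x1 x2 z1 z2, model(mdev) iv(x1 x2) iv(x1 z1, model(mean)) small vce(robust)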


  • Sebastian Kripfganz replied:
    Yet another update is now available:
    Code:
    net install xtdpdgmm, from(http://www.kripfganz.de/stata/)
    Version 2.5.0 of xtdpdgmm makes it possible to estimate the model with the nonlinear moment conditions recently proposed by Chudik and Pesaran (2022). The respective command option is nl(predetermined). The name of this option reflects the fact that these nonlinear moment conditions are only valid if all of the right-hand side variables are predetermined (or strictly exogenous). Similar to the Ahn and Schmidt (1995) nonlinear moment conditions, a crucial assumption is that the idiosyncratic error term is serially uncorrelated. However, the Ahn-Schmidt moment conditions do not require the regressors to be predetermined. On the other hand, the Chudik-Pesaran moment conditions relax some assumptions about the initial observations; see the Remarks section in the xtdpdgmm help file for more details.

    In either case, the nonlinear moment conditions help with identification when the dependent variable is highly persistent. They become redundant when the additional Blundell and Bond (1998) instruments for the model in levels are added. In Monte-Carlo simulations, the Chudik-Pesaran estimator performs quite well.

    This latest version also comes with the new option center, which centers the moments in the optimal weighting matrix around their mean. This is asymptotically irrelevant but might improve the finite-sample performance.

    Here is an example of the Chudik-Pesaran estimator with centered weighting matrix:
    Code:
    . webuse abdata
    
    . xtdpdgmm L(0/1).n w k, gmm(L.n w k, diff lag(1 4) collapse) model(diff) nl(predetermined) twostep center vce(robust)
    
    Generalized method of moments estimation
    
    Fitting full model:
    
    Step 1:
    initial:       f(b) =  6.9079499
    alternative:   f(b) =  1.8818777
    rescale:       f(b) =  .03794837
    Iteration 0:   f(b) =  .03794837  
    Iteration 1:   f(b) =  .00207619  
    Iteration 2:   f(b) =  .00183829  
    Iteration 3:   f(b) =  .00183771  
    Iteration 4:   f(b) =  .00183766  
    
    Step 2:
    Iteration 0:   f(b) =   .9265277  
    Iteration 1:   f(b) =  .72579345  
    Iteration 2:   f(b) =  .58900414  
    Iteration 3:   f(b) =  .53498361  
    Iteration 4:   f(b) =  .53034607  
    Iteration 5:   f(b) =  .51523719  
    Iteration 6:   f(b) =  .50824095  
    Iteration 7:   f(b) =  .50752705  
    Iteration 8:   f(b) =  .50736446  
    Iteration 9:   f(b) =  .50733335  
    Iteration 10:  f(b) =  .50732691  
    Iteration 11:  f(b) =  .50732563  
    Iteration 12:  f(b) =  .50732537  
    Iteration 13:  f(b) =  .50732532  
    
    Group variable: id                           Number of obs         =       891
    Time variable: year                          Number of groups      =       140
    
    Moment conditions:     linear =      13      Obs per group:    min =         6
                        nonlinear =       6                        avg =  6.364286
                            total =      19                        max =         8
    
                                       (Std. err. adjusted for 140 clusters in id)
    ------------------------------------------------------------------------------
                 |              WC-Robust
               n | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
               n |
             L1. |   .4433475   .0635349     6.98   0.000     .3188213    .5678737
                 |
               w |  -.7539702   .0810252    -9.31   0.000    -.9127767   -.5951636
               k |   .3452531   .0555308     6.22   0.000     .2364147    .4540914
           _cons |   3.103905   .2695798    11.51   0.000     2.575538    3.632272
    ------------------------------------------------------------------------------
    Instruments corresponding to the linear moment conditions:
     1, model(diff):
       L1.D.L.n L2.D.L.n L3.D.L.n L4.D.L.n L1.D.w L2.D.w L3.D.w L4.D.w L1.D.k
       L2.D.k L3.D.k L4.D.k
     2, model(level):
       _cons
    Notice that the estimator uses differenced gmm() instruments for the first-differenced model.
    • Ahn, S. C., and P. Schmidt (1995). Efficient estimation of models for dynamic panel data. Journal of Econometrics 68, 5-27.
    • Blundell, R., and S. R. Bond (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87, 115-143.
    • Chudik, A., and M. H. Pesaran (2022). An augmented Anderson-Hsiao estimator for dynamic short-T panels. Econometric Reviews 41, 416-447.


  • Sebastian Kripfganz replied:
    The previous update of xtdpdgmm introduced doubly-corrected (DC) misspecification-robust standard errors for the one-step, two-step, and iterated GMM estimators with linear moment conditions. A new update is now available which also supports DC standard errors, vce(robust, dc), with these estimators for models with nonlinear moment conditions, nl(). With thanks to Kit Baum, this latest version 2.4.2 is now also available on SSC.
    Code:
    adoupdate xtdpdgmm, update
    As a minor addition, the new version also supports use of the identity matrix as an initial weighting matrix, wmatrix(identity), although this option would hardly ever be recommended in practice.
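    A hedged illustration of both additions (not from the original post; abdata teaching data set, illustrative lag choices):
    Code:
    webuse abdata, clear

    * DC standard errors combined with the Ahn-Schmidt nonlinear moment conditions
    xtdpdgmm L(0/1).n w k, gmm(L.n w k, diff lag(1 4) collapse) model(diff) nl(noserial) twostep vce(robust, dc)

    * identity matrix as initial weighting matrix (hardly ever recommended)
    xtdpdgmm L(0/1).n w k, gmm(L.n w k, diff lag(1 4) collapse) model(diff) wmatrix(identity) twostep vce(robust)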


  • Sebastian Kripfganz replied:
    In general, this is correct.
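    For reference, the Hansen test from the summary below can be obtained after estimation with estat overid; a minimal sketch (abdata teaching data set, illustrative specification):
    Code:
    webuse abdata, clear
    xtdpdgmm L(0/1).n w k, gmm(L.n w k, diff lag(1 4) collapse) model(diff) twostep vce(robust) overid
    * Hansen J test of the overidentifying restrictions
    estat overid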


  • Sarah Magd replied:
    Thanks a lot for the constructive and organized reply.

    As far as I understood, for the case of a static regression with fixed effects:
    - The two-step system GMM estimator can control for the endogeneity problem resulting from either reverse causality or omitted-variables bias (assuming that appropriate instruments are available).
    - The two-step system GMM estimator is relatively more efficient than the one-step system GMM estimator because it accounts for the extra variance coming from the unobserved fixed effects.
    - The validity of the two-step GMM estimator is tested by the Hansen test. If it is insignificant, we can conclude that the results obtained by this estimator are consistent and that GMM can deal with the problem of omitted-variable bias.
    - Given the existence of endogenous regressors, the serial correlation would still affect the first admissible lag for the instruments. Therefore, for the Arellano-Bond test for autocorrelation of the first-differenced residuals, if H0: no autocorrelation of order 2 is not rejected, then this can be an indication that there are no omitted dynamics or omitted lags of the regressors.

    Am I right?


  • Sebastian Kripfganz replied:
    An estimator is either consistent or not. The GMM estimator is consistent if all the moment conditions/instruments are valid (and there are sufficiently many instruments available to estimate all coefficients).

    Efficiency is a relative concept. Among different GMM estimators, the asymptotically efficient estimator uses all non-redundant moment conditions/instruments and an optimal weighting matrix (as the two-step estimator does). If feasible, other estimators (such as a maximum likelihood estimator) might be more efficient in the sense that they achieve a smaller asymptotic variance.

    Omitted variables are a source of endogeneity. If appropriate instruments are available (which are uncorrelated with the omitted variables), then GMM can deal with this problem.

    Serial correlation may or may not be a problem. If all regressors are strictly exogenous, serial correlation can be accounted for by using an optimal weighting matrix and panel-robust standard errors. Sometimes, serial correlation can be an indication of omitted dynamics (which could be an omitted lagged dependent variable or omitted lags of the regressors). In that case, an omitted variables problem could arise.
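    With xtdpdgmm, such omitted dynamics can be checked after estimation with the Arellano-Bond test for serial correlation in the first-differenced residuals; a minimal sketch (abdata teaching data set, illustrative specification and test orders):
    Code:
    webuse abdata, clear
    xtdpdgmm L(0/1).n w k, gmm(L.n w k, diff lag(1 4) collapse) model(diff) twostep vce(robust)
    * Arellano-Bond tests of orders 1 to 3 on the differenced residuals
    estat serial, ar(1/3)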


  • Sarah Magd replied:
    Dear Prof. Sebastian Kripfganz,
    - We have a static panel regression with relatively small T (T = 13 and 30 cross-sectional units), an endogenous variable (due to reverse causality), and fixed effects.
    OLS fixed-effects regression with robust standard errors is used first to obtain baseline results. As far as I understood, the two-step system GMM estimator can be used to control only for the endogeneity problem. Is there another statistical issue that is addressed by the two-step system GMM estimator in the case of a static specification (e.g., efficiency or consistency, omitted-variable bias, serial correlation, etc.)?


  • Sebastian Kripfganz replied:
    A general answer is that lots of things can happen to your estimates when you change the underlying assumptions (e.g., when one variable is treated as endogenous instead of exogenous). Instrumental variables estimators (including GMM) may help to alleviate the endogeneity problem, but they might create other problems. For example, standard errors might become quite large if the instruments are relatively weak. Especially when you have a relatively small sample size, the differences between estimators might appear large because the coefficients are not estimated very precisely.

    I would recommend changing the estimator as little as necessary when you make different assumptions, to get the best possible comparison. Say, you start with a fixed-effects estimator:
    Code:
    xtreg Y X1 X2 X3, fe vce(robust)
    Note that you can replicate this regression with xtdpdgmm as follows:
    Code:
    xtdpdgmm Y X1 X2 X3, model(mdev) iv(X1 X2 X3, norescale) small vce(robust)
    Then you assume that X1 is endogenous and you want to instrument it in the typical GMM style:
    Code:
    xtdpdgmm Y X1 X2 X3, model(mdev) iv(X2 X3, norescale) gmm(X1, lag(2 8) collapse model(diff)) twostep small vce(robust, dc)
    Notice that I have left the instruments for X2 X3 in the same format as for the traditional fixed-effects regression. This way, you can best compare the results.
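    One way to line the two sets of results up side by side (a sketch using Stata's standard estimates commands; fe and gmm are arbitrary names for the stored results):
    Code:
    xtdpdgmm Y X1 X2 X3, model(mdev) iv(X1 X2 X3, norescale) small vce(robust)
    estimates store fe

    xtdpdgmm Y X1 X2 X3, model(mdev) iv(X2 X3, norescale) gmm(X1, lag(2 8) collapse model(diff)) twostep small vce(robust, dc)
    estimates store gmm

    estimates table fe gmm, se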


  • Sarah Magd replied:
    I estimate the Cobb-Douglas production function in a static form as follows:
    GDP per capita = Capital formation per capita + energy consumption per capita + inflation + trade openness + financial development
    My sample is 13 years for 27 countries.
    - I am using fixed-effects regression with robust standard errors, as well as panel-corrected standard errors with fixed effects. The two regressions give the expected results for my variable of interest (i.e., financial development). However, since the energy consumption variable is endogenous (due to reverse causality), I should use a model that corrects the potential biases from this endogeneity. As I mentioned in #424, I can use the two-step GMM estimator to control for the endogeneity. Nevertheless, financial development (my main variable) in this regression is insignificant or counterintuitive.

    - Given my sample size and the static specification, which estimator would be the most relevant to control for the endogeneity?


  • Sebastian Kripfganz replied:
    Originally posted by Sarah Magd:
    should I replace the missing values in the newly generated threshold variables with zero?
    Yes.


  • Sarah Magd replied:
    I tried the following command:
    Code:
    xtdpdgmm L(0/1).Y X1*X2_h X1*X2_l X3 X4, model(diff) collapse gmm(Y X3 X4, lag(2 4)) gmm(X1*X2_h X1*X2_l, lag(1 7)) gmm(Y X3 X4, lag(1 1) diff model(level)) gmm(X1*X2_h X1*X2_l, lag(0 0) diff model(level)) vce(r, dc) overid twostep
    However, it gives this error:
    Code:
    no observations
    r(2000);


    In this case, should I replace the missing values in the newly generated threshold variables with zero, as follows?

    Code:
    gen X2_h = X2 if X2 > 0.32
    replace X2_h = 0 if X2_h == .

    gen X2_l = X2 if X2 <= 0.32
    replace X2_l = 0 if X2_l == .
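    Equivalently, each pair can be collapsed into a single cond() expression (same hypothetical threshold of 0.32). Note that, unlike the replace above, the if qualifier keeps observations with genuinely missing X2 as missing rather than setting them to zero:
    Code:
    * cond(a, b, c) returns b if a is true and c otherwise
    gen X2_h = cond(X2 > 0.32, X2, 0) if !missing(X2)
    gen X2_l = cond(X2 <= 0.32, X2, 0) if !missing(X2)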


  • Sebastian Kripfganz replied:
    You would not normally run two separate regressions for the effects above and below the threshold. Just combine everything in a single regression:
    Code:
    xtdpdgmm L(0/1).Y X1*X2_h X1*X2_l X3 X4, model(diff) collapse gmm(Y X3 X4, lag(2 4)) gmm(X1*X2_h X1*X2_l, lag(1 7)) gmm(Y X3 X4, lag(1 1) diff model(level)) gmm(X1*X2_h X1*X2_l, lag(0 0) diff model(level)) vce(r, dc) overid twostep


  • Sarah Magd replied:
    #################################################
    #Threshold dynamic panel model using xtdpdgmm
    #################################################

    Dear Prof. Kripfganz,

    I want to estimate a threshold dynamic panel model with xtdpdgmm. I have estimated the threshold value with another command; my problem is how to estimate the model itself with xtdpdgmm.
    Suppose that X1 is the variable of interest and is predetermined, and that X2 is the threshold variable with a threshold value of 0.32. X3 and X4 are (endogenous) control variables. Would the following specification be right?

    Code:
    gen X2_h = X2 if X2 > 0.32
    gen X2_l = X2 if X2 <= 0.32

    xtdpdgmm L(0/1).Y X1*X2_h X3 X4, model(diff) collapse gmm(Y X3 X4, lag(2 4)) gmm(X1*X2_h, lag(1 7)) gmm(Y X3 X4, lag(1 1) diff model(level)) gmm(X1*X2_h, lag(0 0) diff model(level)) vce(r, dc) overid twostep
    xtdpdgmm L(0/1).Y X1*X2_l X3 X4, model(diff) collapse gmm(Y X3 X4, lag(2 4)) gmm(X1*X2_l, lag(1 7)) gmm(Y X3 X4, lag(1 1) diff model(level)) gmm(X1*X2_l, lag(0 0) diff model(level)) vce(r, dc) overid twostep

    If the code is not correctly specified for a threshold dynamic panel model, it would be highly appreciated if you could guide us on the right specification.
