
  • The R² for IV/2SLS/GMM regressions is of limited use, if any. See, for example, the following Stata FAQ:
    https://www.stata.com/support/faqs/s...least-squares/
    As the FAQ puts it, the R² really has no statistical meaning in the context of 2SLS/IV.
    For the random-effects model, please see the Remarks and Examples section of the Stata Manual entry for xtreg.
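    To see why that R² carries no information, here is a small pure-Python simulation (my own sketch, not from the FAQ, with made-up numbers): with a strongly endogenous regressor, the residual sum of squares implied by the consistent IV coefficient can exceed the total sum of squares, so the conventional R² turns negative.

```python
# Hypothetical illustration: with an endogenous regressor, the conventional
# R-squared computed after an IV/2SLS fit can come out negative, which is
# why it has no statistical meaning in this context.
import random

random.seed(42)
n = 10_000
beta = 0.2                                    # true structural coefficient

z = [random.gauss(0, 1) for _ in range(n)]    # instrument
e = [random.gauss(0, 1) for _ in range(n)]    # structural error
x = [z[i] - 5 * e[i] for i in range(n)]       # endogenous regressor (correlated with e)
y = [beta * x[i] + e[i] for i in range(n)]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

# Simple IV estimator in the one-regressor case: beta_iv = cov(z, y) / cov(z, x)
beta_iv = cov(z, y) / cov(z, x)
intercept = mean(y) - beta_iv * mean(x)

my = mean(y)
ssr = sum((y[i] - intercept - beta_iv * x[i]) ** 2 for i in range(n))
sst = sum((yi - my) ** 2 for yi in y)
r2 = 1 - ssr / sst
print(f"beta_iv = {beta_iv:.3f}, R2 = {r2:.2f}")  # beta_iv near 0.2, R2 negative
```

    Nothing constrains the IV analogue of R² to lie between 0 and 1, which is the essence of the FAQ's point.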
    https://twitter.com/Kripfganz



    • Hi Sebastian,
      I'm having problems when conducting the Arellano-Bond test for autocorrelation.

      First, I go with:

      Code:
      xtdpdgmm lead_zjsat_6items  L.lead_jsat_6items i.lead_vol1##c.log_leaving i.wave , gmmiv(L.lead_jsat_6items , collapse) iv(i.wave) vce(robust) overid
      
      Group variable: id                           Number of obs         =     91850
      Time variable: wave                          Number of groups      =     15287
      
      Moment conditions:     linear =      28      Obs per group:    min =         1
                          nonlinear =       0                        avg =  6.008373
                              total =      28                        max =        14
      
                                          (Std. Err. adjusted for 15,287 clusters in id)
      ----------------------------------------------------------------------------------
                       |               Robust
      lead_zjsat_6it~s |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -----------------+----------------------------------------------------------------
      lead_jsat_6items |
                   L1. |   .3542155   .1051563     3.37   0.001     .1481129    .5603181
                       |
           1.lead_vol1 |  -7.993102   4.104824    -1.95   0.052    -16.03841    .0522057
           log_leaving |  -.7183377    .343703    -2.09   0.037    -1.391983   -.0446922
                       |
             lead_vol1#|
         c.log_leaving |
                    1  |   3.885735   .7894465     4.92   0.000     2.338448    5.433021
                       |
                  wave |
                    1  |          0  (empty)
                    2  |   .0678619    .049292     1.38   0.169    -.0287486    .1644724
                    3  |   .0074792   .0487522     0.15   0.878    -.0880733    .1030317
                    4  |          0  (omitted)
                    5  |   .0847805   .0469901     1.80   0.071    -.0073183    .1768793
                    6  |   .0679861   .0464155     1.46   0.143    -.0229866    .1589588
                    7  |   .0552715   .0479401     1.15   0.249    -.0386893    .1492324
                    8  |   .1307863   .1192254     1.10   0.273    -.1028912    .3644638
                    9  |   .0752563    .076418     0.98   0.325    -.0745202    .2250328
                   10  |   .0612896   .0672975     0.91   0.362    -.0706111    .1931904
                   11  |   .0365584   .0620174     0.59   0.556    -.0849935    .1581103
                   12  |    .102438   .0816722     1.25   0.210    -.0576366    .2625126
                   13  |   .1125361   .0718052     1.57   0.117    -.0281995    .2532717
                   14  |   .0727399   .0759155     0.96   0.338    -.0760518    .2215315
                   15  |   .1186052   .0606086     1.96   0.050    -.0001854    .2373958
                   16  |          0  (empty)
                       |
                 _cons |   -1.87803   1.056312    -1.78   0.075    -3.948363    .1923031
      ----------------------------------------------------------------------------------
      Instruments corresponding to the linear moment conditions:
       1, model(level):
         L.lead_jsat_6items L1.L.lead_jsat_6items L2.L.lead_jsat_6items
         L3.L.lead_jsat_6items L4.L.lead_jsat_6items L5.L.lead_jsat_6items
         L6.L.lead_jsat_6items L7.L.lead_jsat_6items L8.L.lead_jsat_6items
         L9.L.lead_jsat_6items L10.L.lead_jsat_6items L11.L.lead_jsat_6items
         L12.L.lead_jsat_6items L13.L.lead_jsat_6items
       2, model(level):
         3bn.wave 4.wave 5.wave 6.wave 7.wave 8.wave 9.wave 10.wave 11.wave 12.wave
         13.wave 14.wave 15.wave
       3, model(level):
         _cons
      But when I try the test I get:

      Code:
      estat serial, ar(1/3)
      
      Arellano-Bond test for autocorrelation of the first-differenced residuals
      D.0b:  operator invalid
      r(198);
      Am I doing something wrong?

      Thanks.
      Ed




      • Edgar Kausel
        There was a bug in estat serial that (I thought) I fixed with the latest update. Could you please tell me which version of xtdpdgmm you are using? You can find your version by typing the following in Stata's command window:
        Code:
        which xtdpdgmm
        If you do not have version 2.2.7, please update to the latest version which should hopefully solve your problem:
        Code:
        adoupdate xtdpdgmm, update


        • Hi Sebastian and Statalisters,

          I am using the xtdpdgmm command to run system GMM, but I am getting error r(2000). It says: "You have requested some statistical calculation and there are no observations on which to perform it. Perhaps you specified if or in and inadvertently filtered all the data."

          N is 45 and T is 10.

          The command is:

          Code:
          xtdpdgmm dv l.dv iv1 iv2 iv3 iv4, twostep vce(cluster id) teffects gmmiv(l.dv, lag(1 2) collapse model(fodev)) gmmiv(iv1, lag(1 2) collapse model(fodev)) gmmiv(iv2, lag(1 2) collapse model(fodev)) gmmiv(iv3, lag(1 2) collapse model(fodev)) gmmiv(iv4, lag(0 0) collapse model(level)) nofootnote

          Please help.



          • r(2000) is a "no observations" error. One reason might be that you did not properly xtset your data. For example, if your time periods are more than 1 time unit (e.g. year) apart, then you need to specify this with the delta() option of xtset. Another reason might be that you have many gaps (missing values) in your data set such that you do not have 3 consecutive time periods. Can you share with us the output from the following command?
            Code:
            xtdescribe


            • Sebastian Kripfganz

              That was it. I was using version 2.2.0. Thanks!

              Ed



              • Hi,

                In order to estimate dynamic panels accurately, I read the paper titled "Microeconometric dynamic panel data methods: Model specification and selection issues" by Jan F. Kiviet. Concerning this paper, I have the following doubts:

                1. The author repeatedly stresses in his paper that as long as the Arellano–Bond results are unsatisfactory, applying Blundell–Bond does not make sense. So how does one choose between the Arellano–Bond difference GMM and the Blundell–Bond system GMM estimators? Are there criteria for this? In this regard, the author also talks about the concepts of effect stationarity and effect non-stationarity. What do these concepts imply?

                2. The author states: "When the errors of the level equation are serially uncorrelated indeed, those of the first-differenced equation have negative first-order serial correlation of moving average form, with a first-order serial correlation coefficient −0.5 and zero second and higher-order serial correlation coefficients". How is this exact figure of -0.5 derived? Also, how is the author so sure about zero second and higher-order serial correlation coefficients? Is there a mathematical proof for the same?

                3. The author states: "lags of exogenous regressors will establish strong and valid instruments for any non-exogenous regressors, especially for regressors affected by immediate or lagged feedbacks from the dependent variable, in particular the lagged dependent regressor variables themselves." However, I thought a particular variable's lags/leads can serve as instruments only for that same variable. How can lags of exogenous variables serve as valid instruments for non-exogenous regressors?

                4. The author states: "Anyhow, if at least twice lagged regressors turn out to be invalid instruments this implies that the regression equation has not yet been specified adequately and requires additional explanatories". I could not understand the author's point here. Is he saying that if the lag(1 2) instruments turn out to be invalid (as indicated by the difference-in-Hansen test), we should include more lags of the variable as regressors in the model?

                5. The author states: "an exogenous regressor is predetermined, but a predetermined regressor is usually not exogenous". I could not understand how an exogenous regressor can be predetermined.

                6. The author writes: "This finding instigates to start our model specification search by including at least one lag of all regressors, because validity of internal instruments constructed from lagged not strictly exogenous regressors requires white-noise disturbances, and obtaining white noise disturbances is promoted by using sufficiently large orders of all lag polynomials." So should we include at least one lag of all independent variables as regressors?

                7. The author states: "one could move on to stage 4, or first verify whether any of the coefficients for the longest lag of a variable x(m) or of yi,t has a t-value below 0.5, say, or a p-value above 0.6 or 0.7, say. If so, impose the least significant one of them to be zero, re-estimate the model, and repeat the same procedure until the coefficients of all longest lags have absolute t-values (well) above 0.5, and the m1, m2, J and incJ tests still produce satisfactory results." I could not understand what the author is trying to convey here.

                8. The author states: "Useful additional evidence can be produced by also testing the joint significance of groups of single coefficient restrictions already imposed on the MSM and verifying whether the p-value is high indeed. Such joint significance tests can also be obtained by using the “test” option." Again, I could not understand the author's viewpoint here.

                9. I suppose we should always use the two-step estimator. Is this correct?

                Thanks and Regards



                  1. The Blundell/Bond system GMM estimator extends the Arellano/Bond difference GMM estimator by adding further moment conditions (i.e. instruments). If some of the instruments for the difference GMM estimator are invalid, they will still be invalid if you add further instruments. With xtdpdgmm you could use the overid option and then the estat overid, difference postestimation command after the system GMM estimation. The last line in the test output, the one that starts with model(level), can be used to make the desired assessment. If the test in the column headed "Excluding" does not reject the null hypothesis, then the difference GMM estimator is fine and you can use the column headed "Difference" to test the additional instruments used for the system GMM estimator. If the test in the column headed "Excluding" rejects the null hypothesis, then the difference GMM estimator is misspecified and the corresponding "Difference" test becomes useless.
                  2. Given homoskedasticity and no serial correlation of the idiosyncratic error term \(e_{it}\), this is a simple algebraic relationship: \(Corr(\Delta e_{it}, \Delta e_{i,t-1}) = Corr(e_{it}-e_{i,t-1}, e_{i,t-1}-e_{i,t-2}) = -Var(e_{it}) / Var(\Delta e_{it}) = -Var(e_{it}) / (2 Var(e_{it})) = -1/2\). Similarly, all higher-order correlations are zero because of the non-overlapping time periods in the numerator.
                  3. There is no mapping of specific instruments to specific regressors. All instruments instrument all regressors. It is reasonable to believe that lags of a specific regressor have particularly strong predictive power for that specific regressor but that does not exclude the possibility that they may also have predictive power for other regressors. In fact, if a regressor is a predictor of the dependent variable, then it is reasonable to believe that the lags of such a regressor are also good predictors for the lagged dependent variable.
                  4. If you assume that a variable is endogenous, you could use lag(2 .) as instruments if the model is correctly specified. If the difference-in-Hansen test rejects those instruments, then this is evidence that there is still some misspecification present. This could be omitted variables, such as omitted dynamics in the form of lags of the regressors, or omitted interaction terms.
                  5. In the terminology of (strictly) exogenous, predetermined, and endogenous regressors, all instruments (lags) that are valid for a predetermined variable are also valid for a strictly exogenous variable, but not the other way round.
                  6. You want to start your specification search with a model that is correctly specified such that the estimation is consistent (although possibly inefficient). Otherwise, your difference-in-Hansen test might compare two misspecified models with each other which would not be a meaningful comparison; see point 1 above. The more lags of the regressors you include in the regressor list, the less likely it is that there will still be serial correlation in the error term which might invalidate some of the instruments.
                  7. This is a suggestion for a model specification algorithm. Essentially, the idea is to start with a possibly overspecified model (that yields consistent estimation) and then to remove some of the lagged regressors if their coefficients are statistically insignificant and the model specification tests still do not reject the model after you have removed those regressors. Jan Kiviet promotes a conservative use of p-values, i.e. to use thresholds much higher than 0.05 to make sure that you are on the safe side.
                  8. Instead of just testing for the significance of a single coefficient, you could also use joint significance tests for multiple coefficients in your specification search.
                  9. I would say that there are at least 2 situations where a one-step estimator is justified: (i) if you are using the difference GMM estimator with the added homoskedasticity assumption such that the one-step weighting matrix is already optimal (which is a strong assumption; instead of imposing it, you might just run the two-step estimator and let the data speak for itself); (ii) if your estimation sample is relatively small, because the efficient estimation of the optimal weighting matrix requires a large number of groups. Both the one-step and the two-step estimator are consistent, but in general the two-step estimator is efficient while the one-step estimator may not be. However, keep in mind that efficiency is an asymptotic concept. When your sample is very small, the finite-sample properties might be very different and the estimation of the optimal weighting matrix might lack robustness.
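                  The −1/2 result in point 2 is easy to verify numerically. The following pure-Python sketch (my own illustration, not part of xtdpdgmm) simulates white-noise errors and computes the first- and second-order serial correlations of their first differences:

```python
# Check: for serially uncorrelated errors e_t, the first difference
# d_t = e_t - e_{t-1} has first-order serial correlation -1/2 and
# (approximately) zero second-order serial correlation.
import random

random.seed(1)
T = 200_000
e = [random.gauss(0, 1) for _ in range(T)]
d = [e[t] - e[t - 1] for t in range(1, T)]

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n
    va = sum((ai - ma) ** 2 for ai in a) / n
    vb = sum((bi - mb) ** 2 for bi in b) / n
    return cab / (va * vb) ** 0.5

rho1 = corr(d[1:], d[:-1])   # close to -0.5
rho2 = corr(d[2:], d[:-2])   # close to 0
print(f"rho1 = {rho1:.3f}, rho2 = {rho2:.3f}")
```

                  The −1/2 arises because \(Cov(\Delta e_t, \Delta e_{t-1}) = -Var(e_t)\) while \(Var(\Delta e_t) = 2 Var(e_t)\); for lags of two or more the two differences share no common \(e\) term, so the covariance is zero.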


                  • Dear Prof. Kripfganz,

                    Your responses are enlightening as always! I got to know some completely new things which I never thought of. Thank you so very much! I have some follow-up queries:

                    1. I have the following output for the difference-in-Hansen test for my model. Do you think I should stick to system GMM or switch to difference GMM?

                    Code:
                    2-step weighting matrix from full model
                    
                                      | Excluding                   | Difference                  
                    Moment conditions |       chi2     df         p |        chi2     df         p
                    ------------------+-----------------------------+-----------------------------
                      1, model(fodev) |    94.4909    106    0.7808 |      0.0107      1    0.9175
                      2, model(fodev) |    94.2920    106    0.7851 |      0.2096      1    0.6471
                      3, model(fodev) |    94.4466    106    0.7817 |      0.0550      1    0.8146
                      4, model(fodev) |    93.8516    104    0.7522 |      0.6500      3    0.8849
                      5, model(fodev) |     0.5946      2    0.7428 |     93.9070    105    0.7727
                      6, model(fodev) |    94.2131    105    0.7658 |      0.2885      2    0.8657
                      7, model(level) |    92.4271    106    0.8235 |      2.0745      1    0.1498
                      8, model(fodev) |    94.1725    106    0.7877 |      0.3291      1    0.5662
                      9, model(fodev) |    94.4090    106    0.7826 |      0.0926      1    0.7610
                     10, model(level) |    83.8859     93    0.7396 |     10.6156     14    0.7159
                    Moreover, are there any issues if we apply the system GMM estimator when the difference GMM estimator is sufficient for a model?

                    2. How should we check for heteroscedasticity and serial correlation for our model?

                    3. How do we check for joint significance tests for multiple coefficients in our model?

                    Thank you!



                      1. If anything, only moment condition number 7 might be slightly worrying. All the other p-values are definitely fine. A system GMM estimator would produce more efficient / more precise estimates than a difference GMM estimator, at the added risk that it might be more strongly biased if the extra instruments are weak or invalid.
                      2. To check for serial correlation, use the estat serial postestimation command. Off the top of my head, I am not aware of an easily applicable command for heteroskedasticity testing of the residuals. The only feasible option that comes to my mind is utilizing the nonlinear moment conditions nl(noserial) and nl(iid), where the latter makes the additional homoskedasticity assumption, and then using a generalized Hausman test with estat hausman to test the additional moment restrictions imposed by nl(iid) compared with nl(noserial). See slides 63 to 65 of my 2019 London Stata Conference presentation.
                      3. You can simply use the test command as you would do after any other estimation command.


                      • As enquired here: https://www.statalist.org/forums/for...2-and-xtdpdgmm, I wonder why I cannot use the estat serial command after xtdpdgmm. I get the error "not sorted, r(5)". I would really appreciate it if someone could guide me on this and on the other question in that post. Thank you.



                        • Kristian Szakali
                          Could you please check whether you have the latest version of xtdpdgmm, which should be 2.2.7. If you do not have the latest version, please update it and try your code again:
                          Code:
                          adoupdate xtdpdgmm, update
                          If you still get the same error message with the latest version, would it be possible for you to send me your data set per e-mail? Otherwise it is difficult to replicate the issue.


                          • Originally posted by Kristian Szakali View Post
                            As enquired here: https://www.statalist.org/forums/for...2-and-xtdpdgmm, I wonder why I cannot use the estat serial command after xtdpdgmm. I get the error "not sorted, r(5)". I would really appreciate it if someone could guide me on this and on the other question in that post. Thank you.
                            Please see my response #2 in the topic you have linked.


                            • There is another update of the xtdpdgmm package to version 2.3.0 available on my personal website.
                              Code:
                              net install xtdpdgmm, from(http://www.kripfganz.de/stata/) replace
                              This version fixes the problem reported in #221 above, and it adds a new feature for the estimation with nonlinear moment conditions:

                              Under the assumption of a serially uncorrelated idiosyncratic error term \(u_{it}\), the option nl(noserial) incorporates the following nonlinear moment conditions:
                              \[E[(\alpha_i+u_{iT}) \Delta u_{it}] = 0\]
                              for t=1,2,...,T-1.

                              So far, that is nothing new (see slide 58 of my 2019 London Stata Conference presentation). If we suspect first-order serial correlation of \(u_{it}\), we could still obtain valid nonlinear moment conditions by restricting them to the observations t=1,2,...,T-2. If there is second-order serial correlation, change the upper limit to T-3. This can be achieved with a new lag() suboption, e.g. when we suspect first-order serial correlation we could specify
                              Code:
                              nl(noserial, lag(2))
                              When you just specify nl(noserial) without the suboption, the default is lag(1), i.e. no serial correlation. I am grateful to Professor Seung Ahn for proposing this additional feature.
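                              As a quick sanity check (my own simulation sketch, not output from xtdpdgmm), the moment conditions above can be verified numerically: with serially uncorrelated \(u_{it}\), the sample average of \((\alpha_i+u_{iT}) \Delta u_{it}\) across groups is close to zero for every t up to T−1:

```python
# Simulation check of E[(alpha_i + u_iT)(u_it - u_i,t-1)] = 0 for
# t = 1, ..., T-1 when the idiosyncratic errors u_it are white noise.
import random

random.seed(7)
N, T = 100_000, 6
sums = [0.0] * (T - 1)           # one accumulator per moment condition

for _ in range(N):
    alpha = random.gauss(0, 1)                       # unobserved group effect
    u = [random.gauss(0, 1) for _ in range(T + 1)]   # u_i0, ..., u_iT
    for t in range(1, T):                            # t = 1, ..., T-1
        sums[t - 1] += (alpha + u[T]) * (u[t] - u[t - 1])

moments = [s / N for s in sums]
print(moments)  # all entries close to zero
```

                              With first-order serial correlation in \(u_{it}\), the moment for t = T−1 would no longer be zero because \(u_{i,T-1}\) and \(u_{iT}\) are correlated, which is exactly the condition the lag(2) suboption drops.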
                              Last edited by Sebastian Kripfganz; 26 Aug 2020, 06:55.


                              • Dear Sebastian,

                                I am using xtdpdgmm for my research. As far as I know, xtdpdgmm (and GMM estimation in general) does not account for cross-sectional dependence.

                                My data (like perhaps most panel data) suffer from cross-sectional dependence, and I tried to use extra variables that capture time-varying common factors across cross-sections to eliminate this problem. But later I realised that we are already using time dummies as regressors with the teffects option (or manually). Since (strong) cross-sectional dependence arises from time-varying common shocks, aren't we eliminating it by adding year dummies as regressors? Would any other extra variables capturing time-varying common shocks, beyond the time dummies, be redundant in this case?

                                Thanks in advance.

