XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

Chen ly replied

18 Aug 2021, 21:11
Thank you so much, Professor Kripfganz. I have sent you the e-mail; please check it. I look forward to hearing from you.
Sebastian Kripfganz replied

18 Aug 2021, 11:47
Chen ly
The error message about the conformability error sounds like a bug in the code of xtdpdgmm, although I cannot replicate it with other data sets. Whether you use collapse or not, you should not see this error message. May I ask if you are using the latest version of the command (2.3.8)? (Type which xtdpdgmm to see the version you are using.)
If you are using the latest version, would it be possible to share your data set with me by e-mail so that I can investigate the problem?
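A minimal sketch of the version check and update described above. Both commands appear elsewhere in this thread; the update line assumes you want the version from the author's website rather than the one on SSC.

```stata
* Display the path and version of the installed xtdpdgmm
which xtdpdgmm

* If the reported version is older than 2.3.8, update from the author's website
net install xtdpdgmm, from(http://www.kripfganz.de/stata) replace
```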

Chen ly replied

17 Aug 2021, 21:32

Hello Dr. Kripfganz,
I really appreciate all your work.

I am currently having trouble with my dynamic model. N = 299. T = 5.

I tried to use the postestimation command "estat hausman" to carry out the generalized Hausman test. Here is my code.

Code:

xtdpdgmm L(0/1).surplus3 L.debt CL.debt#CL.debt#CL.debt GSF BCF TOR g  invest poprate , model(fod)  ///  
gmm(surplus3, lag(1 2) ) gmm(L.debt CL.debt#CL.debt#CL.debt ,lag(1 2) collapse) ///
gmm(GSF,lag(1 2) )  gmm(BCF,lag(0 .) collapse) gmm(TOR,lag(0 .)) gmm(g,lag(1 .) collapse) ///
gmm(invest,lag(1 2) collapse)  gmm(poprate,lag(0 2) collapse)  ///
gmm(BCF TOR poprate,lag(0 0) model(md)) ///
gmm(surplus3, lag(1 1) diff model(level)) ///
gmm(L.debt CL.debt#CL.debt#CL.debt GSF BCF TOR g invest poprate, lag(0 0) diff model(level)) ///
teffects two vce(r)  nl(iid) 

estimates store iid

xtdpdgmm L(0/1).surplus3 L.debt CL.debt#CL.debt#CL.debt GSF BCF TOR g  invest poprate , model(fod)  ///  
gmm(surplus3, lag(1 2) ) gmm(L.debt CL.debt#CL.debt#CL.debt ,lag(1 2) collapse) ///
gmm(GSF,lag(1 2) )  gmm(BCF,lag(0 .) collapse) gmm(TOR,lag(0 .)) gmm(g,lag(1 .) collapse) ///
gmm(invest,lag(1 2) collapse)  gmm(poprate,lag(0 2) collapse)  ///
gmm(BCF TOR poprate,lag(0 0) model(md)) ///
gmm(surplus3, lag(1 1) diff model(level)) ///
gmm(L.debt CL.debt#CL.debt#CL.debt GSF BCF TOR g invest poprate, lag(0 0) diff model(level)) ///
teffects two vce(r)  nl(noserial) 

estat hausman iid

However, the following error appears: "conformability error: A matrix, vector, or scalar has the wrong number of rows and/or columns for what is required. Adding a 2 x 3 matrix to a 1 x 4 would result in this error."
I then modified my code as follows:

Code:

xtdpdgmm L(0/1).surplus3 L.debt CL.debt#CL.debt#CL.debt GSF BCF TOR g  invest poprate , model(fod)  collapse ///  
gmm(surplus3, lag(1 2) ) gmm(L.debt CL.debt#CL.debt#CL.debt ,lag(1 2) ) ///
gmm(GSF,lag(1 2) )  gmm(BCF,lag(0 .) ) gmm(TOR,lag(0 .)) gmm(g,lag(1 .) ) ///
gmm(invest,lag(1 2) )  gmm(poprate,lag(0 2) )  ///
gmm(BCF TOR poprate,lag(0 0) model(md)) ///
gmm(surplus3, lag(1 1) diff model(level)) ///
gmm(L.debt CL.debt#CL.debt#CL.debt GSF BCF TOR g invest poprate, lag(0 0) diff model(level)) ///
teffects two vce(r)  nl(iid) 

estimates store iid

xtdpdgmm L(0/1).surplus3 L.debt CL.debt#CL.debt#CL.debt GSF BCF TOR g  invest poprate , model(fod)  collapse ///  
gmm(surplus3, lag(1 2) ) gmm(L.debt CL.debt#CL.debt#CL.debt ,lag(1 2) ) ///
gmm(GSF,lag(1 2) )  gmm(BCF,lag(0 .) ) gmm(TOR,lag(0 .)) gmm(g,lag(1 .) ) ///
gmm(invest,lag(1 2) )  gmm(poprate,lag(0 2) )  ///
gmm(BCF TOR poprate,lag(0 0) model(md)) ///
gmm(surplus3, lag(1 1) diff model(level)) ///
gmm(L.debt CL.debt#CL.debt#CL.debt GSF BCF TOR g invest poprate, lag(0 0) diff model(level)) ///
teffects two vce(r)  nl(noserial)

The error disappears when all instruments are collapsed. So I have two questions.
First, why does this happen? Does nl(noserial) have to be combined with collapse?
Second, is there any obvious error in my code?
Again, I really appreciate all your work. Thank you in advance.


Sebastian Kripfganz replied

17 Aug 2021, 06:55
Importantly, gmm(ln_output, lag(1 1) model(level)) does not automatically create differences of the instruments for the level model. You also need to add the difference suboption: gmm(ln_output, lag(1 1) diff model(level))

It is surprising that your model still seems to pass all of the overidentification tests because it clearly should not. Using levels of the lagged dependent variable as instruments for the level model violates the underlying assumptions (unless there are actually no unobserved unit-specific effects present).

In principle, you could use a "level GMM" estimator that only uses first differences as instruments for the level model. But if those instruments are valid, then typically those for the first-differenced model should be valid as well and normally they would help with the identification. However, it could be that the instruments for the first-differenced model are quite weak, in which case it might indeed make sense to drop them.

If the model is underidentified, then all other statistics based on that model need to be interpreted with caution. They may not be reliable.
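A hedged sketch of how the diff suboption could be added to a specification like the one in the question. The variable names and the global macro $var are taken from the poster's code; applying the same suboption to every level-model term is an assumption for illustration, and whether these instruments are valid or strong remains an empirical question.

```stata
* Sketch only: $var is the poster's own global macro with the remaining regressors
xtdpdgmm L(0/1).ln_output $var, collapse model(diff)        ///
    gmm(ln_output, lag(2 .)) gmm(ln_export, lag(2 .))        ///
    gmm(lgm2int, lag(2 .)) gmm(ln_v1115, lag(2 .))           ///
    gmm(c.ln_export#c.lgm2int, lag(2 .))                     ///
    gmm(ln_output, lag(1 1) diff model(level))               ///
    gmm(ln_export, lag(1 1) diff model(level))               ///
    gmm(lgm2int, lag(1 1) diff model(level))                 ///
    gmm(ln_v1115, lag(1 1) diff model(level))                ///
    gmm(c.ln_export#c.lgm2int, lag(1 1) diff model(level))   ///
    teffects twostep overid vce(robust) small

* Specification tests used earlier in the thread
estat overid, difference
estat serial, ar(1/2)
```

With diff, the instruments for the level model are first differences rather than levels, which is what the system GMM estimator requires.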

Tiyo Ardiyono replied

17 Aug 2021, 06:26

Dear Sebastian,

I tried to find the best specification for my model by following Kiviet (2019) and Kripfganz (2019), using XTDPDGMM. I started with the endogenous model below but, as we can see, it does not pass the under-identification test. However, the Hansen test, difference-in-Hansen test, and AR(2) requirements are all satisfied.

Code:

xtdpdgmm L(0/1).ln_output $var, coll model(diff) gmm(ln_output, lag(2 .)) gmm(ln_export, lag(2 .)) gmm(lgm2int, lag(2 .)) gmm(c.ln_export#c.lgm2int, lag(2 .)) gmm(ln_v1115, lag(2 .)) teffect two overid vce(robust) small

Under-identification test:

Code:

underid, overid underid kp sw noreport

Code:

collinearity check...
collinearities detected in [Y X Z] (right to left): __alliv_18 __alliv_17 __alliv_16
collinearities detected in [X Z Y] (right to left): 2012.year 2011.year 2010bn.year
warning: collinearities detected, reparameterization may be advisable

Overidentification test: Kleibergen-Paap robust LIML-based (LM version)
  Test statistic robust to heteroskedasticity and clustering on psid
j=    3.74  Chi-sq( 10) p-value=0.9584

Underidentification test: Kleibergen-Paap robust LIML-based (LM version)
  Test statistic robust to heteroskedasticity and clustering on psid
j=    9.92  Chi-sq( 11) p-value=0.5378

2-step GMM J underidentification stats by regressor:
j=   12.40  Chi-sq( 11) p-value=0.3345 L.ln_output
j=   16.89  Chi-sq( 11) p-value=0.1112 ln_export
j=   11.07  Chi-sq( 11) p-value=0.4371 lgm2int
j=   11.38  Chi-sq( 11) p-value=0.4123 c.ln_export#c.lgm2int
j=   22.75  Chi-sq( 11) p-value=0.0191 ln_v1115
j=   24.55  Chi-sq( 11) p-value=0.0106 2010bn.year
j=   24.55  Chi-sq( 11) p-value=0.0106 2011.year
j=   24.55  Chi-sq( 11) p-value=0.0106 2012.year

I then tried several steps: (i) treating the variables as predetermined or exogenous, (ii) using differences as instruments for the level model (Blundell & Bond), and (iii) adding nonlinear moment conditions (Ahn & Schmidt). None of these steps led to a rejection in the under-identification test (the KP p-values are all above 0.10). The only specification that satisfies the Kiviet (2019) guidance (Hansen p-value > 0.20, difference-in-Hansen p-value > 0.20, AR(2) p-value > 0.20, in this case > 0.10) and also passes the under-identification test is the model below.

Code:

xtdpdgmm L(0/1).ln_output $var, coll model(diff)  gmm(ln_output, lag(2 .)) gmm(ln_output, lag(1 1) model(level)) gmm(ln_v1115, lag(1 1) model(level)) teffect two overid vce(robust) small gmm(ln_export, lag(1 1) model(level)) gmm(lgm2int, lag(1 1) model(level)) gmm(c.ln_export#c.lgm2int, lag(1 1) model(level))

Code:

estat overid, difference
Sargan-Hansen (difference) test of the overidentifying restrictions
H0: (additional) overidentifying restrictions are valid

2-step weighting matrix from full model

                  | Excluding                   | Difference                  
Moment conditions |       chi2     df         p |        chi2     df         p
------------------+-----------------------------+-----------------------------
   1, model(diff) |     0.0000      0         . |      1.8547      3    0.6031
  2, model(level) |     1.5235      2    0.4669 |      0.3312      1    0.5649
  3, model(level) |     1.6436      2    0.4396 |      0.2111      1    0.6459
  4, model(level) |     1.6813      2    0.4314 |      0.1734      1    0.6771
  5, model(level) |     1.6548      2    0.4372 |      0.2000      1    0.6548
  6, model(level) |     1.7867      2    0.4093 |      0.0680      1    0.7943
  7, model(level) |     0.0000      0         . |      1.8547      3    0.6031
     model(level) |          .     -5         . |           .      .         .

Code:

. estat serial, ar(1/2)

Arellano-Bond test for autocorrelation of the first-differenced residuals
H0: no autocorrelation of order 1:     z =         .   Prob > |z|  =         .
H0: no autocorrelation of order 2:     z =    1.4313   Prob > |z|  =    0.1523

Code:

. underid, underid kp sw noreport

collinearity check...
collinearities detected in [Y X Z] (right to left): __alliv_11 __alliv_10 __alliv_9 __alliv_4
collinearities detected in [X Z Y] (right to left): 2012.year 2011.year 2010bn.year L.ln_output
warning: collinearities detected, reparameterization may be advisable

Underidentification test: Kleibergen-Paap robust LIML-based (LM version)
  Test statistic robust to heteroskedasticity and clustering on psid
j=   95.61  Chi-sq(  4) p-value=0.0000

2-step GMM J underidentification stats by regressor:
j=  111.89  Chi-sq(  4) p-value=0.0000 L.ln_output
j=   90.59  Chi-sq(  4) p-value=0.0000 ln_export
j=   74.49  Chi-sq(  4) p-value=0.0000 lgm2int
j=   72.65  Chi-sq(  4) p-value=0.0000 c.ln_export#c.lgm2int
j=   97.95  Chi-sq(  4) p-value=0.0000 ln_v1115
j=  942.49  Chi-sq(  4) p-value=0.0000 2010bn.year
j=  942.49  Chi-sq(  4) p-value=0.0000 2011.year
j=  942.49  Chi-sq(  4) p-value=0.0000 2012.year

My questions are:
1. Is this approach valid? Kiviet (2019) and Kripfganz (2019) provide examples that start from Arellano-Bond and then add the differences as instruments for the level model. In my approach, all covariates, not only the lagged dependent variable, have differences as instruments for the level model.
2. If it is valid, can I call this system GMM (the Blundell & Bond model)?
3. The MMSC prefer the first model to the second one, but the first one does not pass the under-identification test. Does it make sense that the MMSC prefer an improperly specified model?

Thank you very much.

Cheers,
Tiyo

Last edited by Tiyo Ardiyono; 17 Aug 2021, 06:44. Reason: Typo


Sebastian Kripfganz replied

13 Aug 2021, 12:06
I am afraid there was another bug in xtdpdgmm that is now fixed with the latest update to version 2.3.8:

Code:

net install xtdpdgmm, from(http://www.kripfganz.de/stata) replace

This bug could lead to an unexpected error message or incorrect results from postestimation commands after estimating a model with nonlinear moment conditions.

Thanks to Tiyo Ardiyono for reporting this problem.
Sebastian Kripfganz replied

31 Jul 2021, 15:17
With the usual thanks to Kit Baum, the latest version 2.3.7 of xtdpdgmm with all the bug fixes mentioned here over the last year is now also available on SSC.

Code:

adoupdate xtdpdgmm, update
Kayode Olaide replied

28 Jul 2021, 07:27
Thank you so much for your response. I don't seem to know how to start a new post on this platform. I actually posted my query here because I could not figure that out.
Sebastian Kripfganz replied

28 Jul 2021, 07:04
Originally posted by Kayode Olaide

I'm using the CCE estimation technique for a research project. My dataset consists of eight cross-sectional units (N) with 24 time periods (T) each, and I'm using Stata for my estimation. The cross-sectional dependence test shows that the panels are cross-sectionally dependent. Also, the variables have mixed orders of integration (stationarity), i.e., I(0) and I(1). Neither CCEMG nor CCEPMG seems quite appropriate for my dataset. I need help finding a suitable estimation technique and would really appreciate suggestions. Thank you.

This does not seem to be the right topic for your query. GMM estimators for dynamic panel data models are typically designed for large-N, small-T panels. Your data does not appear to be suitable. Please start a new topic with an informative title, so that others can help as well. In any case, with such a small N, you cannot really account for common correlated effects unless you have some appropriate proxy variables for them (i.e. "global" variables that are constant across units).
Kayode Olaide replied

28 Jul 2021, 06:56
I'm using the CCE estimation technique for a research project. My dataset consists of eight cross-sectional units (N) with 24 time periods (T) each, and I'm using Stata for my estimation. The cross-sectional dependence test shows that the panels are cross-sectionally dependent. Also, the variables have mixed orders of integration (stationarity), i.e., I(0) and I(1). Neither CCEMG nor CCEPMG seems quite appropriate for my dataset. I need help finding a suitable estimation technique and would really appreciate suggestions. Thank you.
Sebastian Kripfganz replied

27 Jul 2021, 10:02
The rejections of the AR(3) and the Hansen test indicate that the model is still potentially misspecified, e.g. further lags could be needed or some other relevant variables might be omitted. Often, the tests do not provide any specific guidance about the source of the problem. You could use difference-in-Hansen tests to see if there is a problem with any particular variable. (See for example the section on "Model Selection" in my 2019 London Stata Conference presentation.) I would still recommend not to use all available lags of yield_mtha and yield_dev. Restricting the lag length might improve the reliability of the specification tests.
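A minimal sketch of what restricting the lag length and requesting difference-in-Hansen tests could look like for the model in the question. The lag range 3 to 6 is an arbitrary illustration, not a recommendation, and splitting the two variables into separate gmm() terms is an assumption made here so that each gets its own row in the test table.

```stata
* Sketch only: lag(3 6) is an illustrative cap on the instrument lags
xtdpdgmm yield_mtha L(1/2).yield_mtha rs_gdd_s2 rs_hdd_s2 rs_precip_s2, ///
    model(diff) collapse                                                ///
    gmm(yield_mtha, lag(3 6)) gmm(yield_dev, lag(3 6))                   ///
    iv(rs_gdd_s2 rs_hdd_s2 rs_precip_s2, model(mdev))                    ///
    twostep vce(robust) overid

* Difference-in-Hansen tests, one row per group of moment conditions
estat overid, difference

* Serial correlation tests as used earlier in the thread
estat serial, ar(1/3)
```

The overid option at the estimation stage stores the statistics that estat overid, difference reports afterwards.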

Regarding the magnitude of the bias for the coefficient \(\rho\) of the lagged dependent variable (when there is only one lag), it can be approximated as \(-(1+\rho) / (T-1)\). Thus, e.g. for \(\rho = 0.6\) and \(T = 28\), we would get a bias of approximately -0.06. This may still be too large to be tolerated. There is no specific rule of thumb.
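As a quick check of the arithmetic, the approximation can be evaluated directly in Stata. The values of \(\rho\) and \(T\) are the illustrative ones from the paragraph above; the local macro names are arbitrary.

```stata
* Approximate bias of the fixed-effects estimate of rho: -(1 + rho)/(T - 1)
local rho = 0.6
local T   = 28
display "approximate bias: " %6.4f -(1 + `rho')/(`T' - 1)
```

This evaluates -1.6/27, i.e. roughly -0.06 as stated. Changing the two locals shows how slowly the bias shrinks as T grows.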

Jason Xiao replied

27 Jul 2021, 09:22

Dear Sebastian,

Thank you for your response. When I add another lag of the dependent variable to the model, the AR(2) test is satisfied. However, the AR(3) test and both the Sargan and Hansen tests are not satisfied.

Code:

xtdpdgmm yield_mtha L.yield_mtha L2.yield_mtha rs_gdd_s2 rs_hdd_s2 rs_precip_s2, ///
    model(diff) gmm(yield_mtha yield_dev, lag(3 .)) iv(rs_gdd_s2 rs_hdd_s2 rs_precip_s2, model(mdev)) two coll vce(r)

Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =  .61599272
Step 2         f(b) =   .7763858

Group variable: code_muni                    Number of obs         =     12798
Time variable: year                          Number of groups      =       474

Moment conditions:     linear =      56      Obs per group:    min =        27
                    nonlinear =       0                        avg =        27
                        total =      56                        max =        27

                            (Std. Err. adjusted for 474 clusters in code_muni)
------------------------------------------------------------------------------
             |              WC-Robust
  yield_mtha |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  yield_mtha |
         L1. |   .3454116    .047806     7.23   0.000     .2517135    .4391097
         L2. |   .2802068   .0349527     8.02   0.000     .2117008    .3487128
             |
   rs_gdd_s2 |   .0898764   .0793811     1.13   0.258    -.0657077    .2454606
   rs_hdd_s2 |  -2.144849   1.132075    -1.89   0.058    -4.363676     .073978
rs_precip_s2 |   .1919315   .0210945     9.10   0.000     .1505871     .233276
       _cons |   .0627011    .178586     0.35   0.726    -.2873209    .4127232
------------------------------------------------------------------------------
Instruments corresponding to the linear moment conditions:
 1, model(diff):
   L3.yield_mtha L4.yield_mtha L5.yield_mtha L6.yield_mtha L7.yield_mtha
   L8.yield_mtha L9.yield_mtha L10.yield_mtha L11.yield_mtha L12.yield_mtha
   L13.yield_mtha L14.yield_mtha L15.yield_mtha L16.yield_mtha L17.yield_mtha
   L18.yield_mtha L19.yield_mtha L20.yield_mtha L21.yield_mtha L22.yield_mtha
   L23.yield_mtha L24.yield_mtha L25.yield_mtha L26.yield_mtha L27.yield_mtha
   L28.yield_mtha L3.yield_dev L4.yield_dev L5.yield_dev L6.yield_dev
   L7.yield_dev L8.yield_dev L9.yield_dev L10.yield_dev L11.yield_dev
   L12.yield_dev L13.yield_dev L14.yield_dev L15.yield_dev L16.yield_dev
   L17.yield_dev L18.yield_dev L19.yield_dev L20.yield_dev L21.yield_dev
   L22.yield_dev L23.yield_dev L24.yield_dev L25.yield_dev L26.yield_dev
   L27.yield_dev L28.yield_dev
 2, model(mdev):
   rs_gdd_s2 rs_hdd_s2 rs_precip_s2
 3, model(level):
   _cons

. 
. estat serial, ar(1/3) 

Arellano-Bond test for autocorrelation of the first-differenced residuals
H0: no autocorrelation of order 1:     z =   -8.8835   Prob > |z|  =    0.0000
H0: no autocorrelation of order 2:     z =    1.4192   Prob > |z|  =    0.1559
H0: no autocorrelation of order 3:     z =   -3.3863   Prob > |z|  =    0.0007

. estat overid

Sargan-Hansen test of the overidentifying restrictions
H0: overidentifying restrictions are valid

2-step moment functions, 2-step weighting matrix       chi2(50)    =  368.0069
                                                       Prob > chi2 =    0.0000

2-step moment functions, 3-step weighting matrix       chi2(50)    =  375.7389
                                                       Prob > chi2 =    0.0000

Since you mentioned that I have a relatively large T and the dynamic panel bias could be small, is there a rule of thumb for determining whether T is large enough to ignore such bias? Moreover, would you worry about the inconsistency of the estimator if we estimate this dynamic panel model using FE? I am attaching the FE estimates here for your reference.

Code:

. areg yield_mtha L.yield_mtha L2.yield_mtha rs_gdd_s2 rs_hdd_s2 rs_precip_s2, absorb (code_muni) vce(cluster code_muni)

Linear regression, absorbing indicators         Number of obs     =     12,798
                                                F(   5,    473)   =     353.32
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4831
                                                Adj R-squared     =     0.4631
                                                Root MSE          =     0.4588

                            (Std. Err. adjusted for 474 clusters in code_muni)
------------------------------------------------------------------------------
             |               Robust
  yield_mtha |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  yield_mtha |
         L1. |   .3292273   .0200585    16.41   0.000     .2898126    .3686421
         L2. |   .2575985   .0157955    16.31   0.000     .2265606    .2886365
             |
   rs_gdd_s2 |   .0939887   .0570919     1.65   0.100    -.0181964    .2061737
   rs_hdd_s2 |  -2.253687   .7521274    -3.00   0.003    -3.731612   -.7757632
rs_precip_s2 |   .1822983   .0145616    12.52   0.000     .1536849    .2109116
       _cons |    .128067   .1301362     0.98   0.326    -.1276496    .3837836
-------------+----------------------------------------------------------------
   code_muni |   absorbed                                     (474 categories)

Thank you for suggesting the MLE and bias-corrected estimator approaches. I will look into these papers as well.


Sebastian Kripfganz replied

22 Jul 2021, 06:29
First of all, your number of instruments (2728) is way too large relative to the number of groups. While you are using the asymptotically optimal set of instruments, in finite samples we need to ensure that the number of instruments is reasonably small to avoid biases and unreliable test results. Common strategies to reduce the number of instruments include curtailing (i.e. setting a maximum lag order for the instruments) and collapsing (i.e. turning GMM-style moment conditions that are separate for every time period into standard moment conditions that are summations over all time periods). Given that you have a relatively large number of time periods, curtailing definitely makes sense because far lags are unlikely to be strong instruments. Unless you have a very large number of groups (in the thousands) or a very small number of time periods, collapsing usually doesn't do any harm either.
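To see where a count like 2728 can come from, here is a back-of-the-envelope calculation in Stata. It assumes 29 yearly observations per municipality, which is an inference from the output in the question (28 observations per group after one is lost to the lag), not something stated in the post.

```stata
* Assumption: T = 29 years per municipality (inferred, not stated in the post)
local T     = 29
* Periods available for the first-differenced equation with L.yield_mtha
local Tdiff = `T' - 2
* gmm(yield_mtha, lag(2 .)): 1 + 2 + ... + Tdiff period-specific instruments
local n_y   = `Tdiff' * (`Tdiff' + 1) / 2
* gmm(..., lag(. .)): all T leads and lags of 3 variables in each period
local n_x   = 3 * `Tdiff' * `T'
* Add 1 for the constant in the level model
local total = `n_y' + `n_x' + 1
display "yield_mtha: `n_y'   weather variables: `n_x'   total: `total'"
```

Under that assumption the pieces are 378 and 2349, and the total of 2728 matches the output header. Collapsing alone would shrink the 378 instruments for yield_mtha to 27 (lags 2 to 28), and curtailing reduces them further.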

You implicitly assumed that all your independent variables (besides the lagged dependent variable) are strictly exogenous, i.e. uncorrelated with all future and past idiosyncratic errors. While leads are valid instruments in that case, this is hardly done in practice. You could further reduce the number of instruments by starting with lag 0 (the first argument of the lag() option). Moreover, it would be sufficient to simply instrument the strictly exogenous variables by themselves for the model with a mean-deviations transformation (the same as for the conventional fixed-effects estimator), i.e. iv(rs_gdd_s2 rs_hdd_s2 rs_precip_s2, model(mdev)).
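Putting the suggestions so far together, a specification along the following lines would be one possibility. This is a sketch under assumptions: the maximum lag of 5 is an arbitrary choice for illustration, and the two-step estimator with robust standard errors is added here because the later posts in this thread use it.

```stata
* Sketch only: curtailed (lag(2 5)) and collapsed instruments for the lagged
* dependent variable; strictly exogenous regressors instrument themselves
xtdpdgmm yield_mtha L.yield_mtha rs_gdd_s2 rs_hdd_s2 rs_precip_s2, ///
    model(diff) collapse gmm(yield_mtha, lag(2 5))                 ///
    iv(rs_gdd_s2 rs_hdd_s2 rs_precip_s2, model(mdev))              ///
    twostep vce(robust)

* Re-run the specification tests with the smaller instrument set
estat serial, ar(1/3)
estat overid
```

With these choices there are 4 collapsed instruments for yield_mtha, 3 for the weather variables, and the constant.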

Given your relatively large number of time periods and the strict exogeneity assumption for the independent variables, you may not even need a GMM estimator at all, as the dynamic panel data bias of the conventional fixed-effects estimator might be sufficiently small. If you still worry about the bias, a maximum likelihood estimator or a bias-corrected estimator might be more efficient alternatives to the GMM estimator (and possibly with better finite-sample properties as well). See for instance (with links to Stata packages):
Kripfganz, S. (2016). Quasi-maximum likelihood estimation of linear dynamic short-T panel-data models. Stata Journal 16 (4), 1013-1038.

Breitung, J., S. Kripfganz, and K. Hayakawa (2021). Bias-corrected method of moments estimators for dynamic panel data models. Econometrics and Statistics, forthcoming.
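For comparison, the conventional fixed-effects estimator mentioned above could be obtained as follows. The panel and time identifiers code_muni and year are taken from the output in the question; clustering the standard errors at the municipality level is an assumption for illustration.

```stata
* Declare the panel structure (identifiers as shown in the xtdpdgmm output)
xtset code_muni year

* Conventional fixed-effects (within) estimator of the dynamic model
xtreg yield_mtha L.yield_mtha rs_gdd_s2 rs_hdd_s2 rs_precip_s2, ///
    fe vce(cluster code_muni)
```

Comparing the coefficient on L.yield_mtha with the GMM estimate gives a rough sense of how much the dynamic panel bias matters in practice.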

Regarding the Arellano-Bond test: Assuming those test results remain qualitatively the same once you have appropriately dealt with the too-many-instruments problem, this would indicate that the model is not dynamically complete. The remaining serial correlation in the error term would render the instruments for the lagged dependent variable invalid. A remedy would be to add further lags of the dependent variable (and possibly the independent variables) as regressors to proxy for this serial correlation.

Jason Xiao replied

21 Jul 2021, 16:46

Hi, I have a question about failing the higher-order serial correlation test after difference GMM. I am not sure whether this is caused by my xtdpdgmm command or something else. It's my first time doing dynamic panel estimation, so please let me know if I am missing anything.

Code:

xtdpdgmm yield_mtha L.yield_mtha rs_gdd_s2 rs_hdd_s2 rs_precip_s2, ///
model(diff) gmm(yield_mtha, lag(2 .)) gmm(rs_gdd_s2 rs_hdd_s2 rs_precip_s2, lag(. .))  


note: standard errors may not be valid

Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =  3.3141562

Group variable: code_muni                    Number of obs         =     13272
Time variable: year                          Number of groups      =       474

Moment conditions:     linear =    2728      Obs per group:    min =        28
                    nonlinear =       0                        avg =        28
                        total =    2728                        max =        28

------------------------------------------------------------------------------
  yield_mtha |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  yield_mtha |
         L1. |   .4519066   .0089698    50.38   0.000     .4343261     .469487
             |
   rs_gdd_s2 |   .1722979   .0518055     3.33   0.001      .070761    .2738348
   rs_hdd_s2 |  -1.925162    .496409    -3.88   0.000    -2.898106   -.9522184
rs_precip_s2 |    .158965   .0150132    10.59   0.000     .1295397    .1883903
       _cons |   .1510277   .1209611     1.25   0.212    -.0860518    .3881071
------------------------------------------------------------------------------
Instruments corresponding to the linear moment conditions:
 1, model(diff):
   1992:L2.yield_mtha 1993:L2.yield_mtha 1994:L2.yield_mtha 1995:L2.yield_mtha
   1996:L2.yield_mtha 1997:L2.yield_mtha 1998:L2.yield_mtha 1999:L2.yield_mtha
   2000:L2.yield_mtha 2001:L2.yield_mtha 2002:L2.yield_mtha 2003:L2.yield_mtha
   2004:L2.yield_mtha 2005:L2.yield_mtha 2006:L2.yield_mtha 2007:L2.yield_mtha
   2008:L2.yield_mtha 2009:L2.yield_mtha 2010:L2.yield_mtha 2011:L2.yield_mtha
.......
.......
.......

I am estimating a yield response model for a perennial crop. On the LHS, yield_mtha is the yield for year t. On the RHS, rs_gdd_s2, rs_hdd_s2, and rs_precip_s2 are weather realizations during the period of interest for year t. Since it's a perennial crop, I am also interested in accounting for the "alternate bearing" effect, which means that a big crop year is usually followed by a small crop year. (In the full model, I also include a one-year lagged yield deviation. The purpose of this simplified model is to understand the command.)

Below is the Arellano-Bond test for absence of serial correlation in the first-differenced errors. I reject all of them, which is really confusing.

Code:

estat serial, ar(1/3) 

Arellano-Bond test for autocorrelation of the first-differenced residuals
H0: no autocorrelation of order 1:     z =  -50.3692   Prob > |z|  =    0.0000
H0: no autocorrelation of order 2:     z =   23.5852   Prob > |z|  =    0.0000
H0: no autocorrelation of order 3:     z =  -13.4781   Prob > |z|  =    0.0000

My questions are: (1) What does it indicate when I fail to satisfy the AB test for higher-order autocorrelation? (2) What should I do when I encounter this situation?

Thank you so much!


Sebastian Kripfganz replied

07 Jul 2021, 13:12
Originally posted by Sebastian Kripfganz

There is a new update to version 2.3.4 on my website:

Code:

net install xtdpdgmm, from(http://www.kripfganz.de/stata/) replace

This version fixes a bug that produced an incorrect list of instruments in the output footer and incorrectly labelled the instruments generated by the postestimation command predict, iv. This bug only bit if a static model was estimated with GMM-type instruments. If the model included a lag of the dependent or independent variables, then the problem did not occur. This bug did not affect any of the computations. It was just a matter of displaying the correct list of instruments.

This is a bit embarrassing. It turns out that I did not entirely fix the bug with the instrument labels. In fact, for some specifications I even made it worse. A new update to version 2.3.5 is now available that hopefully this time really fixes this issue. As before, the bug only affected labels and the displaying of the instrument list. The estimation results themselves were not affected.