  • It worked. Thank you very much. I was making a technical mistake. It is fine now.

    Comment


    • Dear Sebastian,

      I have a question for you related to the "Sargan-Hansen test of the overidentifying restrictions". If we get different results for the 2-step and the 3-step weighting matrix, can we rely on one of them? In my case, the Sargan-Hansen result for the 2-step weighting matrix is 15.97 (p=0.314), whilst for the 3-step weighting matrix it is 26.95 (p=0.019). Is this acceptable?

      Thank you in advance.

      Comment


      • This discrepancy could indicate that the weighting matrix is not precisely estimated. This can happen if you have many instruments relative to the number of groups and/or if you have weak instruments. If you have not done this yet, try to reduce the number of instruments by curtailing the lags used as instruments and/or by collapsing the instruments.

        Alternatively, you could also try the iterated GMM estimator, option igmm instead of twostep, but it may not always converge in a reasonable number of steps if there are some underlying problems with the weighting matrix.

        If you only have a small number of groups, then there is sometimes not much that can be done. Estimating the weighting matrix precisely is then hardly possible and overidentification tests have only limited reliability in such a case.
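        A minimal sketch of such an instrument reduction (the variable names y, x1, x2 and the lag bounds are illustrative placeholders):

        ```stata
        * Sketch only: lag() curtails the lags used as instruments, and the
        * collapse suboption reduces the instrument count to one column per lag.
        xtdpdgmm L(0/1).y x1 x2, gmm(y, lag(1 4) collapse model(diff)) ///
            gmm(x1 x2, lag(1 3) collapse model(diff)) twostep vce(robust)
        ```

        Replacing twostep with igmm would request the iterated GMM estimator mentioned above.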
        https://twitter.com/Kripfganz

        Comment


        • Thank you very much for clarification.

          Comment


          • Dear Prof. Sebastian,

            I have the following queries regarding xtdpdgmm.

            1. The post-estimation command estat overid, diff returns the following error when I run it after the main command, i.e. xtdpdgmm. Although I have used this post-estimation command many times before, I do not understand the reason for this error.

            Code:
            . estat overid, diff
            requested action not valid after most recent estimation command
            r(321);

            2. How do we interpret the following numbers we get at the start of the output?

            Code:
            Fitting full model:
            Step 1         f(b) =  .00305113
            Step 2         f(b) =   .9227573


            3. If I have 3 explanatory variables in my model, say X1, X2 and X3 and I believe that X1 is predetermined, X2 is endogenous and X3 is exogenous, do I need to specify instruments for X3 in my command by opening the gmmiv brackets and specifying certain starting and ending lag lengths? If yes, what should these lag lengths be?

            4. For the predetermined and endogenous variables, X1 and X2, do I need to open two gmmiv brackets, one each with model(level) and model(diff) in my command?

            Thanks!

            Comment


              1. Did you specify the overid option in the xtdpdgmm command line? This is required for running the incremental overidentification tests.
              2. These are the values of the quadratic GMM objective function. In a just-identified model, these values would be zero. In an overidentified model, we cannot satisfy the empirical moment conditions exactly but we minimize their weighted squared deviations. The values differ between step 1 and 2 because of the different weighting matrices. The numbers themselves are not informative.
              3. You can specify X3 either in an iv() or a gmm() option. The former is just a collapsed version of the latter. For strictly exogenous variables, all lags and leads are valid instruments. Thus, in principle, you could specify lag(. .). It is however common practice not to use leads, i.e. lag(0 .). To avoid a too-many-instruments problem, especially when the time dimension is not very small, you can further restrict the maximum lag length, e.g. lag(0 4). This guidance applies to the model(diff) instruments. For model(level), you would typically just specify lag(0 0) for exogenous variables.
              4. This depends on what you want to achieve. If you want to implement a system GMM estimator, you need to specify separate gmm() options for model(diff) and model(level). Given that you would start with different lags for predetermined and endogenous variables, you would typically also specify separate options for the two variables. For example:
                Code:
                gmm(X1, lag(1 .) model(diff)) gmm(X2, lag(2 .) model(diff)) gmm(X1, diff lag(0 0) model(level)) gmm(X2, diff lag(1 1) model(level))
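                Putting points 3 and 4 together, a full command could be sketched as follows (Y and the specific lag bounds are illustrative placeholders; the strictly exogenous X3 enters through iv()):

                ```stata
                * Sketch only: separate gmm() options for the predetermined X1 and the
                * endogenous X2 in the differenced and the level model; the strictly
                * exogenous X3 is specified via iv(), a collapsed version of gmm().
                xtdpdgmm L(0/1).Y X1 X2 X3,                  ///
                    gmm(X1, lag(1 4) model(diff))            ///
                    gmm(X2, lag(2 4) model(diff))            ///
                    gmm(X1, diff lag(0 0) model(level))      ///
                    gmm(X2, diff lag(1 1) model(level))      ///
                    iv(X3, model(diff)) iv(X3, model(level)) ///
                    twostep vce(robust) overid
                ```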
              https://twitter.com/Kripfganz

              Comment


              • Hi Sir,

                I am working on predicting the firm-level optimum level of investment. I estimate the model developed by Richardson (2006), which expresses investment as a function of one-year-lagged investment, explanatory variables, and control variables. I tried the following command in Stata.

                Code:
                xtdpdgmm L(0/1).loginvest l1.tobinsq l1.streturn l1.cash l1.logta l1.age l1.lev, noserial gmm(L1.loginvest, collapse model(difference)) iv(l1.tobinsq l1.streturn l1.cash l1.logta l1.age, difference model(difference)) twostep vce(cluster co_id)
                Post-estimation:
                Code:
                estat serial
                estat overid
                Output:
                Code:
                Generalized method of moments estimation
                
                Fitting full model:
                
                Step 1:
                initial:       f(b) =   60.63222
                alternative:   f(b) =  50.099668
                rescale:       f(b) =  4.5306529
                Iteration 0:   f(b) =  4.5306529  
                Iteration 1:   f(b) =  .88650032  
                Iteration 2:   f(b) =  .00834241  
                Iteration 3:   f(b) =  .00348571  
                Iteration 4:   f(b) =  .00347453  
                Iteration 5:   f(b) =  .00347453  
                
                Step 2:
                Iteration 0:   f(b) =  .00273063  
                Iteration 1:   f(b) =    .002638  
                Iteration 2:   f(b) =    .002638  
                
                Group variable: co_id                        Number of obs         =      1717
                Time variable: year                          Number of groups      =       855
                
                Moment conditions:     linear =       9      Obs per group:    min =         1
                                    nonlinear =       1                        avg =  2.008187
                                        total =      10                        max =         3
                
                                                (Std. Err. adjusted for 855 clusters in co_id)
                ------------------------------------------------------------------------------
                             |              WC-Robust
                   loginvest |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                   loginvest |
                         L1. |   .3550268   .1388608     2.56   0.011     .0828647     .627189
                             |
                     tobinsq |
                         L1. |  -.0089324   .0571256    -0.16   0.876    -.1208965    .1030317
                             |
                    streturn |
                         L1. |   .0912456   .1086676     0.84   0.401    -.1217391    .3042302
                             |
                        cash |
                         L1. |  -5.99e-07   9.15e-06    -0.07   0.948    -.0000185    .0000173
                             |
                       logta |
                         L1. |   .4958169   .6708675     0.74   0.460    -.8190593    1.810693
                             |
                         age |
                         L1. |  -.2694499   .0745122    -3.62   0.000    -.4154912   -.1234086
                             |
                         lev |
                         L1. |  -20.92443   9.509304    -2.20   0.028    -39.56233   -2.286541
                             |
                       _cons |   11.72433   4.518156     2.59   0.009      2.86891    20.57976
                ------------------------------------------------------------------------------
                Instruments corresponding to the linear moment conditions:
                 1, model(diff):
                   L1.L.loginvest L2.L.loginvest L3.L.loginvest
                 2, model(diff):
                   D.L.tobinsq D.L.streturn D.L.cash D.L.logta D.L.age
                 3, model(level):
                   _cons
                
                .
                . estat serial
                
                Arellano-Bond test for autocorrelation of the first-differenced residuals
                H0: no autocorrelation of order 1:     z =   -3.2873   Prob > |z|  =    0.0010
                H0: no autocorrelation of order 2:     z =         .   Prob > |z|  =         .
                
                . estat overid
                
                Sargan-Hansen test of the overidentifying restrictions
                H0: overidentifying restrictions are valid
                
                2-step moment functions, 2-step weighting matrix       chi2(2)     =    2.2555
                                                                       Prob > chi2 =    0.3238
                
                2-step moment functions, 3-step weighting matrix       chi2(2)     =    2.2374
                                                                       Prob > chi2 =    0.3267
                I ran this code on a panel with five years of data (strongly balanced). The post-estimation Arellano-Bond test of order 2 is not available. Could you please confirm whether the code used is correct?

                Comment


                • I am afraid your panel is not balanced. The regression output states that you have between 1 and 3 observations per company - on average about 2. With a maximum of 3 observations per group it is not possible to calculate the AR(2) test statistic. You need a minimum of 4 time periods.

                  Note that the xtset command might tell you that your panel is "strongly balanced". However, this only means that there is a row in your data set for every company-year combination. It does not check for missing values. Missing values in some of your variables render the estimation sample unbalanced.
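                  One way to inspect the effective estimation sample, sketched with standard Stata commands run right after xtdpdgmm (insample and Tobs are hypothetical variable names):

                  ```stata
                  * Sketch only: count, for each company, the observations that actually
                  * entered the estimation sample, using the e(sample) function.
                  generate byte insample = e(sample)
                  bysort co_id: egen Tobs = total(insample)
                  tabulate Tobs if insample
                  ```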
                  https://twitter.com/Kripfganz

                  Comment


                  • Thank you for your valuable comments, Sir.

                    Comment


                    • Hi Sebastian,

                      I have a question related to your note below:

                      Originally posted by Sebastian Kripfganz View Post
                      1. With xtdpdgmm you could use the overid option and then the estat overid, difference postestimation command after the system GMM estimation. The last line in the test output that starts with model(level) can be used to make the desired assessment. If the test in the column headed "Excluded" does not reject the null hypothesis, then the difference GMM estimator is fine and you can use the column headed "Difference" to test the additional instruments used for the system GMM estimator. If the test in column headed "Excluded" rejects the null hypothesis, then the difference GMM estimator is misspecified and the corresponding "Difference" test becomes useless.

                      I add additional level instruments for income (following your advice on p.117 in your London Stata Conference presentation). I use following command:

                      Code:
                      xtdpdgmm L(0/1).(depression_score) income income_lag self_efficacy, model(fodev) collapse gmm(depression_score, l(1 3)) gmm(income, l(0 2)) gmm(income_lag, l(0 2)) gmm(self_efficacy, l(0 2) m(mdev)) gmm(income income_lag, lag(0 0) diff model(level)) teffects two vce(r) overid nocons
                      Then I look at the post estimation statistics.

                      Code:
                       estat overid, diff
                      
                      Sargan-Hansen (difference) test of the overidentifying restrictions
                      H0: (additional) overidentifying restrictions are valid
                      
                      2-step weighting matrix from full model
                      
                                        | Excluding                   | Difference                  
                      Moment conditions |       chi2     df         p |        chi2     df         p
                      ------------------+-----------------------------+-----------------------------
                        1, model(fodev) |     8.3313      7    0.3043 |      1.8420      3    0.6058
                        2, model(fodev) |     7.9503      7    0.3370 |      2.2230      3    0.5274
                        3, model(fodev) |     8.2779      7    0.3087 |      1.8954      3    0.5944
                         4, model(mdev) |     4.4378      7    0.7282 |      5.7355      3    0.1252
                        5, model(level) |     8.1528      8    0.4187 |      2.0205      2    0.3641
                        6, model(level) |          .     -6         . |           .      .         .
                           model(fodev) |     0.6462      1    0.4215 |      9.5270      9    0.3901
                           model(level) |          .     -8         . |           .      .         .
                      The last line in the test output that starts with model(level) is missing. How should I interpret this?

                      In addition, it is not very clear to me when we should consider adding nonlinear moment conditions. Should we use a Hausman test to decide?

                      Best regards,
                      Nursena

                      Comment


                      • Row 5 in the output table provides the test results for the instruments gmm(income income_lag, lag(0 0) diff model(level)). Row 6 provides the results for the time dummy instruments generated by the teffects option. The last row provides the results for jointly testing the instruments from rows 5 and 6. The missing test results (dots) indicate that there are insufficient degrees of freedom to carry out the respective test: removing all the instruments for the time dummies would leave fewer instruments than regressors, so the coefficients would no longer be identified. Normally, we are primarily interested in the results from row 5.

                        Nonlinear moment conditions can be very useful to circumvent identification problems and to obtain more efficient estimates. However, when adding Blundell-Bond type instruments for the level model, those nonlinear moment conditions might become redundant. Technically, this redundancy occurs when we do not curtail and/or collapse the instruments. Thus, the nonlinear moment conditions may retain some relevance under such instrument reduction strategies. In practice, it is not clear whether it is beneficial to include nonlinear moment conditions jointly with collapsed Blundell-Bond instruments.

                        The Hausman test could be of help to decide between nonlinear moment conditions assuming absence of serial correlation and those that additionally assume homoskedasticity. It is not very helpful to decide whether or not to include any nonlinear moment conditions at all. If there is no evidence of serial correlation, it generally does not harm to include the nl(noserial) option (aside from the potential redundancy mentioned above).
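                        As a sketch, the nonlinear moment conditions are added with the nl(noserial) option (variable names and lag bounds are illustrative placeholders):

                        ```stata
                        * Sketch only: nl(noserial) adds the nonlinear moment conditions
                        * implied by the absence of serial correlation in the errors.
                        xtdpdgmm L(0/1).y x, gmm(y, lag(1 4) collapse model(diff)) ///
                            gmm(x, lag(0 3) collapse model(diff)) nl(noserial)     ///
                            twostep vce(robust)
                        ```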
                        https://twitter.com/Kripfganz

                        Comment


                        • Thanks for the reply.

                          Originally posted by Sebastian Kripfganz View Post
                          The missing test results (dots) tell us that there are insufficient degrees of freedom available to carry out the respective test. Removing all the instruments for the time dummys in your case means that the number of instruments would be smaller than the number of regressors, and therefore the coefficients would no longer be identified. Normally, we are primarily interested in the results from row 5.
                          1. How can I solve this missing test results problem?

                          2. In row 5, not rejecting the additional instruments used for the system GMM estimator means that I should use system GMM estimator rather than model(fodev) specification, right? Or is it more like adding system instruments to existing FOD model?

                          3. My last question: from a theoretical point of view, I believe I should define self_efficacy as an endogenous variable. However, judging by the m1, m2, Hansen, and underidentification tests, the model improves when it is defined as an exogenous variable. How should I decide?

                          Best regards,
                          Nursena

                          Comment


                          • 1. You do not need to solve it. Just ignore row 6 and the last row. The important row is row 5.

                            2. You are adding the model(level) instruments to the model(fodev) instruments. You are not replacing them.

                            3. My personal view is that the specification tests can aid your specification search, especially when you are unsure about the classification of variables. If you have strong theoretical reasons to assume that your variable is endogenous, I would stick to that. If you are willing to revise your prior assumption based on the specification tests, the estimates generally become more efficient when you assume that the variable is exogenous, as you can use more and stronger instruments in the latter case.
                            https://twitter.com/Kripfganz

                            Comment


                            • Thank you for your detailed and quick reply.

                              Comment


                              • Originally posted by Sebastian Kripfganz View Post
                                4. This depends on what you want to achieve. If you want to implement a system GMM estimator, you need to specify separate gmm() options for model(diff) and model(level). Given that you would start with different lags for predetermined and endogenous variables, you would typically also specify separate options for the two variables. For example:
                                  Code:
                                  gmm(X1, lag(1 .) model(diff)) gmm(X2, lag(2 .) model(diff)) gmm(X1, diff lag(0 0) model(level)) gmm(X2, diff lag(1 1) model(level))
                                Dear Prof. Sebastian,

                                Thanks a lot once again for your crystal clear answers. I have the following query regarding your response in Point #4.

                                In the command mentioned by you (reproduced below), what is the significance of writing 'diff'? What would be the implication if we do not write it?
                                gmm(X1, diff lag(0 0) model(level)) gmm(X2, diff lag(1 1) model(level))

                                Comment
