XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

Sebastian Kripfganz

Join Date: May 2014

Posts: 2609
#271

09 Apr 2021, 11:25

Originally posted by Chhavi Jatana

I have a five-year study period, so I have formed only four dummy variables (the first year is taken as the base year) based on the n-1 rule. Do I still need to drop one of the year dummies?

Yes, because the lagged dependent variable is effectively reducing your estimation sample by one additional study period.

https://www.kripfganz.de/stata/
1 like
Comment
Chhavi Jatana

Join Date: Apr 2021

Posts: 4
#272

09 Apr 2021, 14:48

Alright, thank you so much for the clarification.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2609
#273

12 May 2021, 04:37

There is a new update to version 2.3.4 on my website:

Code:

net install xtdpdgmm, from(http://www.kripfganz.de/stata/) replace

This version fixes a bug that produced an incorrect list of instruments in the output footer and incorrectly labelled the instruments generated by the postestimation command predict, iv. This bug only occurred if a static model was estimated with GMM-type instruments. If the model included a lag of the dependent or independent variables, then the problem did not occur. The bug did not affect any of the computations; it was just a matter of displaying the correct list of instruments.
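To check which version is currently installed before or after updating, Stata's built-in which command can be used. This is a generic sketch rather than part of the update itself; it simply prints the location and the version line of the installed ado-file.

```stata
* Show the path and the version line of the installed xtdpdgmm ado-file
which xtdpdgmm
```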

https://www.kripfganz.de/stata/
Comment
Eliana Melo

Join Date: Jan 2021

Posts: 9
#274

12 May 2021, 08:39

Hi, I want to estimate a system GMM model with xtdpdgmm in Stata 14.2. My dependent variable is the percentage of non-technical losses in electricity distribution (pntbt) for 33 utilities over the period 2003-2016. It is an unbalanced panel. I suspect endogeneity of one explanatory variable: the duration of electricity distribution outages (dec_apurado). The other variables are: hmcvi (homicide rate), subnormal (proportion of the urban population living in slums) and inadpf (personal credit default rate).

I have the following routine with xtabond2:

Code:

xtabond2 pntbt L.pntbt hmcvi subnormal inadpf dec_apu, gmm (L.pntbt, lag(2 2) eq(d) collapse) gmm (L.pntbt, lag(2 2) eq(l) collapse) gmm (dec_apu, lag(2 2) eq(d) collapse) gmm (dec_apu, lag(1 1) eq(l) collapse) iv(hmcvi subnormal inadpf, eq(d)) iv(hmcvi subnormal inadpf, eq(l)) twostep robust

Then, I am trying to replicate the results with xtdpdgmm command:

Code:

xtdpdgmm pntbt L.pntbt hmcvi subnormal inadpf dec_apu, gmmiv(L.pntbt, lag(1 1) m(d) collapse) gmmiv(L.pntbt, lag(1 1) m(l) collapse) gmmiv(dec_apu, lag(1 1) m(d) collapse) gmmiv(dec_apu, lag(0 0) m(l) collapse) iv(hmcvi subnormal inadpf, m(d)) iv(hmcvi subnormal inadpf, m(l) diff) twostep vce(r)

The two commands give different results. I read the documentation of xtdpdgmm, but I think I am missing something. Any comment would be valuable.

Last edited by Eliana Melo; 12 May 2021, 09:14.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2609
#275

12 May 2021, 09:24

Unless you are using xtabond2 with the orthogonal option, which does not seem to be the case here, you need to specify the exact same lag orders for the instruments in the two command lines. Moreover, xtabond2 by default applies a first-difference transformation to the instruments in iv(hmcvi subnormal inadpf, eq(d)), while xtdpdgmm does not. You either need to add the passthru suboption in the xtabond2 command line, or the diff option with xtdpdgmm, i.e.

Code:

xtabond2 ..., ... iv(hmcvi subnormal inadpf, eq(d) passthru)
xtdpdgmm ..., ... iv(hmcvi subnormal inadpf, m(d))

or

Code:

xtabond2 ..., ... iv(hmcvi subnormal inadpf, eq(d))
xtdpdgmm ..., ... iv(hmcvi subnormal inadpf, m(d) diff)

For the level model, no transformation is applied by default with either command. If you want first-differenced instruments for the level model, you need to modify the xtabond2 command as follows:

Code:

xtabond2 ..., ... iv(D.hmcvi D.subnormal D.inadpf, eq(l))
xtdpdgmm ..., ... iv(hmcvi subnormal inadpf, m(l) diff)

If there are still remaining differences, try adding the mz suboption to the xtabond2 iv() options.
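As a rough illustration of these equivalences, here is a sketch on a different dataset. It assumes the abdata example data available through webuse and an installed copy of xtabond2; the variables n, w, k and the lag ranges are illustrative choices and are not taken from the posts above. The commands have not been run here, so treat the pairing as a starting point for your own comparison.

```stata
* Illustrative only: abdata and its variables (n, w, k) are not from this thread
webuse abdata, clear
xtset id year

* (1) Untransformed instruments for the first-differenced model
xtabond2 n L.n w k, gmm(L.n, lag(1 2) eq(d) collapse) iv(w k, eq(d) passthru) twostep robust
xtdpdgmm n L.n w k, gmmiv(L.n, lag(1 2) m(d) collapse) iv(w k, m(d)) twostep vce(r)

* (2) First-differenced instruments for the first-differenced model
xtabond2 n L.n w k, gmm(L.n, lag(1 2) eq(d) collapse) iv(w k, eq(d)) twostep robust
xtdpdgmm n L.n w k, gmmiv(L.n, lag(1 2) m(d) collapse) iv(w k, m(d) diff) twostep vce(r)

* (3) Additionally, first-differenced instruments for the level model
xtabond2 n L.n w k, gmm(L.n, lag(1 2) eq(d) collapse) iv(w k, eq(d)) iv(D.w D.k, eq(l)) twostep robust
xtdpdgmm n L.n w k, gmmiv(L.n, lag(1 2) m(d) collapse) iv(w k, m(d) diff) iv(w k, m(l) diff) twostep vce(r)

* If the two commands still disagree, try the mz suboption in the xtabond2 iv() options
```

Within each pair, the lag orders in gmm() and gmmiv() are kept identical, which is the first requirement mentioned above.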

https://www.kripfganz.de/stata/
Comment

Eliana Melo

Join Date: Jan 2021
Posts: 9

#276

12 May 2021, 12:43

Thank you so much for your help. I didn't consider that xtabond2 by default applies a first-difference transformation to the instruments in iv().

I have modified the command according to the second option you mentioned, and I still get different results from the two commands. I also added the mz suboption to the xtabond2 command. Regarding the level model, do you advise creating first-differenced instruments for the level model? I ran it without first-differencing for the level model.

Xtabond2:

Code:

xtabond2 pntbt L.pntbt hmcvi subnormal inadpf dec_apu, gmm (L.pntbt, lag(2 2) eq(d) collapse) gmm (L.pntbt, lag(2 2) eq(l) collapse) gmm (dec_apu, lag(2 2) eq(d) collapse) gmm (dec_apu, lag(1 1) eq(l) collapse) iv(hmcvi subnormal inadpf, eq(d) mz) iv(hmcvi subnormal inadpf, eq(l) mz) twostep robust

Results:

Code:

 xtabond2 pntbt L.pntbt hmcvi subnormal inadpf dec_apu, gmm (L.pntbt, lag(2 2) eq(d) collapse) gmm (L.pntbt, lag(2 2) eq(l) collapse) g
> mm (dec_apu, lag(2 2) eq(d) collapse) gmm (dec_apu, lag(1 1) eq(l) collapse) iv(hmcvi subnormal inadpf, eq(d) mz) iv(hmcvi subnormal i
> nadpf, eq(l) mz) twostep robust
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =       423
Time variable : ano                             Number of groups   =        33
Number of instruments = 11                      Obs per group: min =         9
Wald chi2(5)  =   1161.71                                      avg =     12.82
Prob > chi2   =     0.000                                      max =        13
------------------------------------------------------------------------------
             |              Corrected
       pntbt |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pntbt |
         L1. |   .7842838   .0743589    10.55   0.000     .6385431    .9300246
             |
       hmcvi |   .0003996   .0003474     1.15   0.250    -.0002814    .0010805
   subnormal |   .2959394   .1958883     1.51   0.131    -.0879946    .6798734
      inadpf |   .8593435   .2568274     3.35   0.001     .3559712    1.362716
     dec_apu |   .0012405   .0009055     1.37   0.171    -.0005343    .0030152
       _cons |  -.0543653   .0160513    -3.39   0.001    -.0858253   -.0229053
------------------------------------------------------------------------------
Instruments for first differences equation
  Standard
    D.(hmcvi subnormal inadpf), missing recoded as zero
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L2.dec_apu collapsed
    L2.L.pntbt collapsed
Instruments for levels equation
  Standard
    hmcvi subnormal inadpf, missing recoded as zero
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL.dec_apu collapsed
    DL2.L.pntbt collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -2.44  Pr > z =  0.015
Arellano-Bond test for AR(2) in first differences: z =   1.56  Pr > z =  0.118
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(5)    =   5.62  Prob > chi2 =  0.345
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(5)    =   5.95  Prob > chi2 =  0.311
  (Robust, but weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(3)    =   2.39  Prob > chi2 =  0.496
    Difference (null H = exogenous): chi2(2)    =   3.56  Prob > chi2 =  0.168
  gmm(L.pntbt, collapse eq(diff) lag(2 2))
    Hansen test excluding group:     chi2(4)    =   3.05  Prob > chi2 =  0.550
    Difference (null H = exogenous): chi2(1)    =   2.90  Prob > chi2 =  0.089
  gmm(L.pntbt, collapse eq(level) lag(2 2))
    Hansen test excluding group:     chi2(4)    =   5.03  Prob > chi2 =  0.284
    Difference (null H = exogenous): chi2(1)    =   0.92  Prob > chi2 =  0.339
  gmm(dec_apu, collapse eq(diff) lag(2 2))
    Hansen test excluding group:     chi2(4)    =   2.64  Prob > chi2 =  0.619
    Difference (null H = exogenous): chi2(1)    =   3.30  Prob > chi2 =  0.069
  gmm(dec_apu, collapse eq(level) lag(1 1))
    Hansen test excluding group:     chi2(4)    =   4.48  Prob > chi2 =  0.345
    Difference (null H = exogenous): chi2(1)    =   1.47  Prob > chi2 =  0.226
  iv(hmcvi subnormal inadpf, mz eq(diff))
    Hansen test excluding group:     chi2(2)    =   3.01  Prob > chi2 =  0.222
    Difference (null H = exogenous): chi2(3)    =   2.94  Prob > chi2 =  0.400
  iv(hmcvi subnormal inadpf, mz eq(level))
    Hansen test excluding group:     chi2(2)    =   0.15  Prob > chi2 =  0.928
    Difference (null H = exogenous): chi2(3)    =   5.80  Prob > chi2 =  0.122

xtdpdgmm:

Code:

xtdpdgmm pntbt L.pntbt hmcvi subnormal inadpf dec_apu, gmmiv(L.pntbt, lag(1 1) m(d) collapse) gmmiv(L.pntbt, lag(1 1) m(l) collapse) gmmiv(dec_apu, lag(1 1) m(d) collapse) gmmiv(dec_apu, lag(0 0) m(l) collapse) iv(hmcvi subnormal inadpf, m(d) diff) iv(hmcvi subnormal inadpf, m(l)) twostep vce(r)

Results:

Code:

xtdpdgmm pntbt L.pntbt hmcvi subnormal inadpf dec_apu, gmmiv(L.pntbt, lag(1 1) m(d) collapse) gmmiv(L.pntbt, lag(1 1) m(l) collapse) g
> mmiv(dec_apu, lag(1 1) m(d) collapse) gmmiv(dec_apu, lag(0 0) m(l) collapse) iv(hmcvi subnormal inadpf, m(d) diff) iv(hmcvi subnormal
> inadpf, m(l)) twostep vce(r)

Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =  .00043768
Step 2         f(b) =  .26082684

Group variable: id                           Number of obs         =       423
Time variable: ano                           Number of groups      =        33

Moment conditions:     linear =      11      Obs per group:    min =         9
                    nonlinear =       0                        avg =  12.81818
                        total =      11                        max =        13

                                    (Std. Err. adjusted for 33 clusters in id)
------------------------------------------------------------------------------
             |              WC-Robust
       pntbt |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pntbt |
         L1. |   .9087191   .0147842    61.47   0.000     .8797425    .9376956
             |
       hmcvi |   .0001282   .0001377     0.93   0.352    -.0001418    .0003982
   subnormal |   .0723061   .0550229     1.31   0.189    -.0355368     .180149
      inadpf |   .5317917    .232069     2.29   0.022     .0769449    .9866385
     dec_apu |   .0006042   .0002017     3.00   0.003     .0002089    .0009995
       _cons |  -.0287216   .0094184    -3.05   0.002    -.0471813   -.0102618
------------------------------------------------------------------------------
Instruments corresponding to the linear moment conditions:
 1, model(diff):
   L1.L.pntbt
 2, model(level):
   L1.L.pntbt
 3, model(diff):
   L1.dec_apu
 4, model(level):
   dec_apu
 5, model(diff):
   D.hmcvi D.subnormal D.inadpf
 6, model(level):
   hmcvi subnormal inadpf
 7, model(level):
   _cons

I am a bit worried because with the xtdpdgmm command the coefficient of the lagged dependent variable is almost 1. Is there a misspecification or omitted dynamics?

Thanks!

Last edited by Eliana Melo; 12 May 2021, 13:35.

Comment

Sebastian Kripfganz

Join Date: May 2014

Posts: 2609
#277

12 May 2021, 12:53

You are still using different lags in the two specifications. If you want to use the first lag of the lagged dependent variable in both specifications, then you need to modify the xtabond2 command line:

Code:

xtabond2 ..., gmm(L.pntbt, lag(1 1) eq(d) collapse) ...
xtdpdgmm ..., gmmiv(L.pntbt, lag(1 1) m(d) collapse) ...

and similarly in all other gmm() options!

A large coefficient of the lagged dependent variable is not necessarily a problem. You could check for neglected dynamics by testing for serial correlation of the error term with

Code:

estat serial

after the xtdpdgmm command.

https://www.kripfganz.de/stata/
Comment

Eliana Melo

Join Date: Jan 2021
Posts: 9

#278

13 May 2021, 02:44

Prof. Kripfganz,

Thank you so much for your help. I achieved the same results with both commands:

Xtabond2:

Code:

xtabond2 pntbt L.pntbt hmcvi subnormal inadpf dec_apu, gmm (L.pntbt, lag(2 2) eq(d) collapse) gmm (L.pntbt, lag(2 2) eq(l) collapse) gmm (dec_apu, lag(1 1) eq(d) collapse) gmm (dec_apu, lag(1 1) eq(l) collapse) iv(hmcvi subnormal inadpf, eq(d) mz) iv(hmcvi subnormal inadpf, eq(l) mz) twostep robust

xtdpdgmm:

Code:

xtdpdgmm pntbt L.pntbt hmcvi subnormal inadpf dec_apu, gmmiv(L.pntbt, lag(2 2) m(d) collapse) gmmiv(L.pntbt, lag(2 2) m(l) diff collapse) gmmiv(dec_apu, lag(1 1) m(d) collapse) gmmiv(dec_apu, lag(1 1) m(l) diff collapse) iv(hmcvi subnormal inadpf, m(d) diff) iv(hmcvi subnormal inadpf, m(l)) twostep vce(r)

I also tested for serial correlation of the error term:

Code:

estat serial

Arellano-Bond test for autocorrelation of the first-differenced residuals
H0: no autocorrelation of order 1:     z =   -2.2844   Prob > |z|  =    0.0223
H0: no autocorrelation of order 2:     z =    1.5033   Prob > |z|  =    0.1328

Hansen test:

Code:

 estat overid

Sargan-Hansen test of the overidentifying restrictions
H0: overidentifying restrictions are valid

2-step moment functions, 2-step weighting matrix       chi2(5)     =    5.2986
                                                       Prob > chi2 =    0.3805

2-step moment functions, 3-step weighting matrix       chi2(5)     =    5.5599
                                                       Prob > chi2 =    0.3514

So the first test indicates the absence of second-order serial correlation in the disturbances, and the Hansen test indicates that there is no problem with the overidentifying restrictions. Is that right?
Which overidentification test is the most appropriate to report in the results, the one with the 2-step weighting matrix or the one with the 3-step weighting matrix?

Once again, thank you very much!

Last edited by Eliana Melo; 13 May 2021, 02:47.

Comment

haiyan lin

Join Date: Aug 2020

Posts: 34
#279

13 May 2021, 03:56

Dear Sebastian,

I hope you don't mind that I raise a question here. I am confused about how to choose lags in the GMM regression.

If I have two regressions using sys-GMM with different lags, say lag (0 7) and lag (4 6) [the maximum lag is 7] for the predetermined variables, and both specification tests [AR(2), Hansen] imply that the instruments are valid, and the instruments are also strong, which result should I take in the end? In addition, if the estimate is significant in one specification but not in the other, which result should I take?

Many thanks.
Haiyan

Last edited by haiyan lin; 13 May 2021, 04:04.
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2609
#280

13 May 2021, 04:16

Originally posted by Eliana Melo

So the first test indicates the absence of second-order serial correlation in the disturbances, and the Hansen test indicates that there is no problem with the overidentifying restrictions. Is that right?
Which overidentification test is the most appropriate to report in the results, the one with the 2-step weighting matrix or the one with the 3-step weighting matrix?

Yes, your test results do not reject the model specification.
It is usually sufficient to consider the overidentification test with the 2-step weighting matrix. The two tests are asymptotically equivalent. If they differ substantially, then this would be an indication that the weighting matrix is poorly estimated. Here, they are very similar, which is a good sign.

Originally posted by haiyan lin

If I have two regressions using sys-GMM with different lags, say lag (0 7) and lag (4 6) [the maximum lag is 7] for the predetermined variables, and both specification tests [AR(2), Hansen] imply that the instruments are valid, and the instruments are also strong, which result should I take in the end? In addition, if the estimate is significant in one specification but not in the other, which result should I take?

If all lags are valid and strong instruments, then you could simply use all of them. Usually, larger lags tend to become weaker instruments. Thus, as long as they are valid, you should always keep the small lags.
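One way to look at this in practice is to estimate both specifications, keep the small lags in each, and compare the overidentification tests and standard errors side by side. The sketch below assumes the abdata example data from webuse, treats w and k as predetermined purely for illustration, and uses made-up lag ranges and stored-estimate names (short_lags, all_lags); none of this comes from the posts above, and it has not been run here.

```stata
* Illustrative only: dataset, variables, and lag ranges are assumptions
webuse abdata, clear
xtset id year

* Specification A: small lags only
xtdpdgmm n L.n w k, model(diff) collapse gmmiv(n, lag(2 3)) gmmiv(w k, lag(1 2)) twostep vce(robust)
estat overid
estimates store short_lags

* Specification B: the same small lags plus longer ones
xtdpdgmm n L.n w k, model(diff) collapse gmmiv(n, lag(2 6)) gmmiv(w k, lag(1 5)) twostep vce(robust)
estat overid
estimates store all_lags

* Coefficients and standard errors side by side
estimates table short_lags all_lags, b(%9.4f) se(%9.4f)
```

If the longer lags are valid but add little, the standard errors in the second column should not be much smaller than in the first.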
Statistical significance of the coefficient estimates itself should not be a selection criterion here, unless the significance in one specification is a result of substantially smaller standard errors due to a more efficient estimation with better instruments.

https://www.kripfganz.de/stata/
1 like
Comment
haiyan lin

Join Date: Aug 2020

Posts: 34
#281

13 May 2021, 06:11

Great thanks for your advice!
Comment
Prateek Bedi

Join Date: Sep 2018

Posts: 199
#282

16 May 2021, 07:52

I have an unbalanced dataset of 1696 companies over the period 2001-2016. I am running the following model:

Code:

xtdpdgmm Cash L.Cash Size Leverage Liquidity Profitability, teffects twostep vce(cluster CompanyID) gmmiv(L.Cash, lag(0 1) model(fodev)) gmmiv(Leverage Liquidity Profitability, lag(1 6) collapse model(fodev)) iv(Size, model(level)) nofootnote

1. When I run the above-mentioned model, the year dummies start from 2003 and go up till 2016. The dummies for initial two years i.e. 2001 and 2002 do not appear in the model. I would like to understand the specific reasons for omission of the dummies for 2001 and 2002.
2. I also observe that I cannot use upper limit of lag range greater than 14 for endogenous variables (else, I receive an error). What are the specific reasons due to which the lag range is restricted to T-2?

Thanks!
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2609
#283

17 May 2021, 05:10

The presence of the lagged dependent variable reduces the sample size effectively by 1 year. Another time dummy is dropped to avoid perfect collinearity of all the time dummies with the regression constant.

Again, T is effectively reduced by 1 due to the lagged dependent variable. Transforming the model into first differences or forward-orthogonal deviations reduces the effective sample size by another period, hence T-2.
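The point about the time dummies can be checked by comparing the years in the raw data with the years that enter the estimation sample. The following is a sketch under assumptions: it uses the abdata example data from webuse with illustrative variables and lag ranges, not the model from the question, and it has not been run here.

```stata
* Illustrative only: dataset, variables, and lag ranges are assumptions
webuse abdata, clear
xtset id year

* Years present in the raw data
tabulate year

xtdpdgmm n L.n w k, model(diff) collapse gmmiv(n, lag(2 4)) gmmiv(w k, lag(1 3)) teffects twostep vce(robust)

* Years that actually enter the estimation sample
tabulate year if e(sample)
```

The first year should be missing from the second table because of the lagged dependent variable, and one further year dummy serves as the base category in the coefficient table. In the question above, this corresponds to 2001 and 2002, respectively.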

https://www.kripfganz.de/stata/
1 like
Comment
Prateek Bedi

Join Date: Sep 2018

Posts: 199
#284

19 May 2021, 08:22

Originally posted by Sebastian Kripfganz

The presence of the lagged dependent variable reduces the sample size effectively by 1 year. Another time dummy is dropped to avoid perfect collinearity of all the time dummies with the regression constant.

Again, T is effectively reduced by 1 due to the lagged dependent variable. Transforming the model into first differences or forward-orthogonal deviations reduces the effective sample size by another period, hence T-2.

Dear Prof. Sebastian:

Thanks one more time for your precise and insightful response. Following are my next queries

1. With regard to the upper limit of lag range for instruments, I just noticed that when we use first difference transformation (using model(diff)) option, we can still use a maximum value of T-1 in xtdpdgmm. This is perhaps because of the fact that presence of lagged dependent variables and first-difference transformation lead to the omission of the same year i.e. first year of the dataset. However, when we use forward orthogonal transformation (using model(fodev)), we are required to restrict our upper limit of lag range to T-2 on account of presence of lagged dependent variable (which leads to omission of the first year of dataset) and loss of the last year of the dataset. Please confirm if this understanding is correct.
2. Also, I noticed that when I do not include lagged dependent variables in my model and use the level transformation throughout the model, I am still not able to increase the upper limit of the lag range beyond T-1. However, I should be able to use the entire set of periods as instruments in such a model, shouldn't I? Below is the sample model I am talking about (my dataset is :

Code:

xtdpdgmm CashHoldings1 Size1 Leverage1 Liquidity1 Profitability4, twostep vce(cluster CompanyID) gmmiv(Leverage1 Liquidity1 Profitability4, lag(1 16) collapse model(l)) iv(Size1, model(level)) nofootnote

When I run the abovementioned command, I get the following error:

Code:

lagrange () invalid -- invalid numlist has elements outside of allowed range

Thanks
Comment
Sebastian Kripfganz

Join Date: May 2014

Posts: 2609
#285

20 May 2021, 03:59

I double-checked the code again. The program actually determines the maximum admissible lag order for the lag() option as T-1, where T is the maximum time length across all groups. This is irrespective of whether you include a lagged dependent variable or estimate the model in first differences or forward-orthogonal deviations. The effective maximum lag order could be smaller than T-1, which you can observe from the list of instruments below the output table (without the nofootnote option).
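To see what T-1 is for a given dataset, the number of periods per group can be counted directly. The sketch below again assumes the abdata example data from webuse; the variable Ti and the locals T and maxlag are hypothetical helper names, and counting observations equals the time length only if there are no gaps within groups. It has not been run here.

```stata
* Illustrative only: dataset, variables, and helper names are assumptions
webuse abdata, clear
xtset id year

* Observed periods per group (equals the time length only without gaps)
bysort id (year): generate int Ti = _N
summarize Ti, meanonly
local T = r(max)
local maxlag = `T' - 1
display "T = `T', maximum admissible lag order T-1 = `maxlag'"

* Request lags up to T-1; the footer shows which lags are effectively used
xtdpdgmm n L.n w k, model(diff) collapse gmmiv(n, lag(2 `maxlag')) gmmiv(w k, lag(1 `maxlag')) twostep vce(robust)
```

For a panel that spans 2001 to 2016, T is at most 16 and T-1 is at most 15, which is consistent with lag(1 16) being rejected in the example above.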

https://www.kripfganz.de/stata/
1 like
Comment
