XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#256

27 Feb 2021, 07:22

1. The condition T>=2 refers to a model where the initial observation is observed for period 0, i.e. effectively you need at least 3 time periods when the first-differenced lagged dependent variable is instrumented with the second lag of the dependent variable in levels.

2. Almost everything you can do with xtabond2, you can also do with xtdpdgmm. Instrumenting for endogenous variables with the latter command works in a very similar way. Please see the help file or my 2019 London Stata Conference presentation:
Kripfganz, S. (2019). Generalized method of moments estimation of linear dynamic panel data models. Proceedings of the 2019 London Stata Conference.

3. With a binary dependent variable, you can still estimate a linear regression model. This is then labelled a linear probability model. Again, no difference between xtabond2 and xtdpdgmm here.
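As a rough illustration of point 2, the sketch below sets up the same difference-GMM specification with both commands. It assumes the Arellano-Bond example dataset (webuse abdata, with variables n, w, k, id, year) and treats w and k as endogenous; the dataset and the lag ranges are illustrative choices, not taken from the question above.

```stata
* Illustrative only: the dataset and lag ranges are assumptions
* xtabond2 must be installed separately (ssc install xtabond2)
webuse abdata, clear
xtset id year

* xtabond2: difference GMM, collapsed instruments, two-step robust
xtabond2 L(0/1).n w k, gmm(n, lag(2 4) collapse) gmm(w k, lag(2 4) collapse) noleveleq twostep robust

* xtdpdgmm: the same lags as instruments for the first-differenced model
xtdpdgmm L(0/1).n w k, model(diff) gmm(n, lag(2 4) collapse) gmm(w k, lag(2 4) collapse) noconstant twostep vce(robust)
```

The gmm() options carry over almost unchanged; the main differences are that xtdpdgmm needs model(diff) instead of noleveleq and vce(robust) instead of robust.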

https://www.kripfganz.de/stata/
Dao DinhNguyen

Join Date: Feb 2021

Posts: 13
#257

27 Feb 2021, 07:36

Thank you so much for your reply.
Tiyo Ardiyono

Join Date: Mar 2021

Posts: 8
#258

15 Mar 2021, 21:57

Dear Sebastian,

On a new computer with a fresh Stata installation (16.1), I get an error when running the following code:

Code:

estat mmsc model1 model2

There is an error:

ngroups1 not found

I usually use Stata on my own laptop, where the code works just fine.
What do you think the problem is here?

Thank you.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#259

16 Mar 2021, 04:37

Thanks for flagging this bug. Unfortunately, I made a silly mistake in my last update.

The updated version 2.3.3 that fixes this problem is now available on my website:

Code:

adoupdate xtdpdgmm, update
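To confirm that the update took effect, the installed version can be checked with Stata's which command. A minimal sketch; it assumes the ado-file carries a version header, which which then displays:

```stata
* Show the path and version header of the installed ado-file
which xtdpdgmm

* If the version shown is older than 2.3.3, update the package
adoupdate xtdpdgmm, update

* Drop the old copy of the program from memory so that the updated one is loaded
discard
```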

https://www.kripfganz.de/stata/

Tiyo Ardiyono

Join Date: Mar 2021
Posts: 8

#260

17 Mar 2021, 20:19

Dear Sebastian,

Thank you for the update.

Now my problem is that XTABOND2 and XTDPDGMM give different results. I compared the following commands:

Code:

global var1 ="hf hfhf lgdp lgdp2 L(0/1).(hflgdp)"

xtabond2 L(0/1).ls_fdi $var1 yr2004-yr2015, gmm(ls_fdi, lag(2 3) coll) gmm(hf, lag(2 5) coll) gmm(hfhf, lag(2 5) coll) gmm(hflgdp, lag(2 5) coll) gmm(lgdp, lag(2 5) coll) gmm(lgdp2, lag(2 5) coll) iv(yr2004-yr2015) artest(10) noleveleq twostep svmat robust

xtdpdgmm L(0/1).ls_fdi $var1 yr2004-yr2015, gmm(ls_fdi, lag(2 3) coll) gmm(hf, lag(2 5) coll) gmm(hfhf, lag(2 5) coll) gmm(hflgdp, lag(2 5) coll) gmm(lgdp, lag(2 5) coll) gmm(lgdp2, lag(2 5) coll) iv(yr2004-yr2015) model(diff) twostep overid vce(robust)

The estimates for the endogenous variables are the same (both coefficients and standard errors), but there are differences in (i) the estimates for the year dummies and (ii) the AR(1), AR(2), and higher-order tests. I'd say I can ignore the difference in (i), but the difference in (ii) is substantial.
xtabond2 produces

Code:

------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -4.09  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =  -0.16  Pr > z =  0.871
Arellano-Bond test for AR(3) in first differences: z =   0.49  Pr > z =  0.625
Arellano-Bond test for AR(4) in first differences: z =   0.17  Pr > z =  0.868
Arellano-Bond test for AR(5) in first differences: z =   0.09  Pr > z =  0.927
Arellano-Bond test for AR(6) in first differences: z =   0.50  Pr > z =  0.616
Arellano-Bond test for AR(7) in first differences: z =  -0.17  Pr > z =  0.862
Arellano-Bond test for AR(8) in first differences: z =   0.09  Pr > z =  0.925
Arellano-Bond test for AR(9) in first differences: z =  -0.31  Pr > z =  0.755
Arellano-Bond test for AR(10) in first differences:z =   0.28  Pr > z =  0.776
------------------------------------------------------------------------------

XTDPDGMM produces

Code:

Arellano-Bond test for autocorrelation of the first-differenced residuals
H0: no autocorrelation of order 1:     z =   -0.0179   Prob > z  =    0.9857
H0: no autocorrelation of order 2:     z =   -0.0023   Prob > z  =    0.9982
H0: no autocorrelation of order 3:     z =    0.0046   Prob > z  =    0.9964
H0: no autocorrelation of order 4:     z =    0.0064   Prob > z  =    0.9949
H0: no autocorrelation of order 5:     z =    0.0011   Prob > z  =    0.9991
H0: no autocorrelation of order 6:     z =    0.0090   Prob > z  =    0.9928
H0: no autocorrelation of order 7:     z =         .   Prob > z  =         .
H0: no autocorrelation of order 8:     z =         .   Prob > z  =         .
H0: no autocorrelation of order 9:     z =         .   Prob > z  =         .
H0: no autocorrelation of order 10:    z =    0.0029   Prob > z  =    0.9977

What is the cause of the difference, and how can I get the same result with both xtabond2 and xtdpdgmm? The Hansen test results from both commands are the same, though.

Last edited by Tiyo Ardiyono; 17 Mar 2021, 21:04.


Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#261

18 Mar 2021, 04:54

(i) The differences in the coefficients of the year dummies are possibly due to the different instrumentation. Note that the iv() option with xtabond2 automatically first-differences the year dummy instruments, while the same option with xtdpdgmm would require the suboption diff to do the first-differencing of the instruments. The coefficients might also differ if xtabond2 drops a different year dummy due to collinearity than xtdpdgmm does.

(ii) The differences in the year dummy coefficients could possibly contribute to the differences in the test results. More importantly, it is actually a feature of xtdpdgmm that it produces different AR test results after the two-step estimator with the Windmeijer correction. Let me quote myself from the opening post of this thread:

Originally posted by Sebastian Kripfganz

The results of the Arellano-Bond test differ slightly from xtdpd and xtabond2 for two-step robust estimators because I account for the finite-sample Windmeijer (2005) correction when computing the test statistic, while the existing commands do not.

You should get the same test results with the one-step estimator, or with the two-step estimator but without the robust/vce(robust) options.
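Both points can be sketched with the command from the question. This is an illustration rather than a verified replication: it assumes the data and the global var1 from the question are in memory, groups the variables with identical lag ranges into one gmm() option for brevity, and is meant to be run from a do-file (because of the local macro and the /// line continuation).

```stata
* Instruments from the question, with the diff suboption added to iv() so that
* the year-dummy instruments are first-differenced as in xtabond2
local inst gmm(ls_fdi, lag(2 3) collapse) ///
    gmm(hf hfhf hflgdp lgdp lgdp2, lag(2 5) collapse) ///
    iv(yr2004-yr2015, diff)

* Two-step estimator with Windmeijer-corrected standard errors:
* the AR tests account for the correction and can differ from xtabond2
xtdpdgmm L(0/1).ls_fdi $var1 yr2004-yr2015, model(diff) `inst' twostep vce(robust)
estat serial, ar(1/10)

* Two-step estimator without vce(robust): no Windmeijer correction in the AR tests
xtdpdgmm L(0/1).ls_fdi $var1 yr2004-yr2015, model(diff) `inst' twostep
estat serial, ar(1/10)

* One-step estimator (the default when twostep is omitted)
xtdpdgmm L(0/1).ls_fdi $var1 yr2004-yr2015, model(diff) `inst' vce(robust)
estat serial, ar(1/10)
```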

https://www.kripfganz.de/stata/
John Sgr

Join Date: Sep 2020

Posts: 28
#262

19 Mar 2021, 04:33

Dear Sebastian,

What is the best way to judge the p-value of the Hansen J-test of the overidentifying restrictions? I found a p-value of 0.15. How should I interpret this?

Best regards,
John
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#263

19 Mar 2021, 04:56

If all overidentifying restrictions are indeed valid, in repeated random samples you would expect to see a more extreme value of the Hansen test in 15% of the cases, provided that the Hansen test is correctly sized.

The latter qualification can be an issue with dynamic panel data models, in particular if you have a relatively small sample size and many instruments, or if some of the instruments are weak. There is no consensus on what constitutes a good range of p-values that provides us with sufficient confidence in the correct model specification. See for example the following article:
Kiviet, J. F. (2020). Microeconometric dynamic panel data methods: Model specification and selection issues. Econometrics and Statistics 13, 16-45.
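The interpretation in the first paragraph can be made concrete with Stata's chi-squared functions: the reported p-value is the upper-tail probability of the test statistic under a chi-squared distribution whose degrees of freedom equal the number of overidentifying restrictions. In the sketch below, the degrees of freedom are a hypothetical value chosen for illustration, not taken from the question.

```stata
* Hypothetical number of overidentifying restrictions (an assumption)
scalar df = 10

* Value of the J statistic that corresponds to a p-value of 0.15 at these degrees of freedom
scalar J = invchi2tail(df, 0.15)
display "J statistic: " J

* Upper-tail probability of that statistic, i.e. the p-value the test would report
display "p-value:     " chi2tail(df, J)
```

Whether that tail probability can be taken at face value depends on the test being correctly sized, which is the qualification discussed above.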

https://www.kripfganz.de/stata/
John Sgr

Join Date: Sep 2020

Posts: 28
#264

19 Mar 2021, 05:36

I see. This paper says that the perceived power of the test matters when judging a p-value of 0.15 for the Hansen test. If I revised my model to get a higher p-value, which range would be at least a safe zone?
Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#265

19 Mar 2021, 05:54

I cannot say much beyond what Jan Kiviet writes in his paper, and there is no safe zone that I feel confident laying out here. Eventually, it remains a matter of judgment depending on the particular data and application.

https://www.kripfganz.de/stata/

Chhavi Jatana

Join Date: Apr 2021
Posts: 4

#266

09 Apr 2021, 09:35

Dear sir,

I want to apply the two-step system GMM estimator to investigate the impact of ownership concentration on the CEO pay-performance relationship, using a balanced panel of 201 firms over 5 years. I have applied the command given below. The dependent variable is TC; the independent and control variables are ROEP T3 LFSIZE LFAGE LEV RISK CEOD BSZE IND_P; ROET3 is the interaction variable; ID* are 5 industry dummy variables and YD* are 4 year dummy variables.

The results are not satisfactory: the p-values of the Hansen and Sargan tests are very high, AR(1) and AR(2) are both insignificant, and none of the coefficients are significant. I made some changes to the command, such as adding collapse to reduce the number of instruments and changing the classification of variables from endogenous to exogenous, but none worked.

Please suggest what can be done to meet all the assumptions while retaining the significance of the coefficients.

Code:

xtdpdgmm TC L.TC ROEP ROET3 T3 LFSIZE LFAGE LEV RISK CEOD BSZE IND_P ID* YD*,twostep vce(cluster cid) gmmiv (L.TC, lag(0 0) collapse model (fodev)) gmmiv (ROEP, lag(0 1) collapse model (fodev)) gmmiv (ROET3, lag(0 1) collapse model (fodev)) gmmiv (T3, lag(0 1) collapse model (fodev)) gmmiv (LFSIZE, lag(0 1) collapse model (fodev)) gmmiv (LFAGE, lag(0 1) collapse model (fodev)) gmmiv (LEV, lag(0 1) collapse model (fodev)) gmmiv (RISK, lag(0 1) collapse model (fodev)) gmmiv (BSZE, lag(0 1) collapse model (fodev)) gmmiv (IND_P, lag(0 1) collapse model (fodev)) gmmiv (CEOD, lag(0 1) collapse model (fodev)) gmmiv (ID*, lag(0 0) collapse model (level)) gmmiv (YD*, lag(0 0) collapse model (level)) nofootnote
Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =   380.3873
Step 2         f(b) =  .02331842

Group variable: cid                          Number of obs         =       804
Time variable: YEAR                          Number of groups      =       201

Moment conditions:     linear =      30      Obs per group:    min =         4
                    nonlinear =       0                        avg =         4
                        total =      30                        max =         4

                                  (Std. Err. adjusted for 201 clusters in cid)
------------------------------------------------------------------------------
             |              WC-Robust
          TC |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          TC |
         L1. |   .0704489   .2947275     0.24   0.811    -.5072064    .6481041
             |
        ROEP |  -2.161121   7.234564    -0.30   0.765    -16.34061    12.01836
       ROET3 |   .0486878   .1103199     0.44   0.659    -.1675352    .2649108
          T3 |  -1.268114   3.028676    -0.42   0.675    -7.204211    4.667982
      LFSIZE |   -17.6046   79.81283    -0.22   0.825    -174.0349    138.8257
       LFAGE |   121.1339   126.2982     0.96   0.338     -126.406    368.6738
         LEV |  -10.47428   151.6317    -0.07   0.945    -307.6669    286.7183
        RISK |  -25.74973   86.25241    -0.30   0.765    -194.8013    143.3019
        CEOD |  -70.64974   109.8793    -0.64   0.520    -286.0091    144.7096
        BSZE |  -1.578545   3.629034    -0.43   0.664    -8.691321    5.534232
       IND_P |  -1.513525    1.22047    -1.24   0.215    -3.905602    .8785514
         ID1 |   18.11663   97.57612     0.19   0.853     -173.129    209.3623
         ID2 |   7.156462   65.52682     0.11   0.913    -121.2737    135.5867
         ID3 |   16.69424   106.4915     0.16   0.875    -192.0252    225.4136
         ID4 |   -28.0001   83.33812    -0.34   0.737    -191.3398    135.3396
         ID5 |   42.28515   99.73981     0.42   0.672    -153.2013    237.7716
         YD1 |  -14.67081   27.28439    -0.54   0.591    -68.14724    38.80561
         YD2 |  -10.77153   18.99932    -0.57   0.571    -48.00952    26.46647
         YD3 |  -7.408057    9.57917    -0.77   0.439    -26.18288    11.36677
         YD4 |          0  (omitted)
       _cons |   11.46227   760.6348     0.02   0.988    -1479.355    1502.279
------------------------------------------------------------------------------

. estat overid

Sargan-Hansen test of the overidentifying restrictions
H0: overidentifying restrictions are valid

2-step moment functions, 2-step weighting matrix       chi2(10)    =    4.6870
                                                       Prob > chi2 =    0.9111

2-step moment functions, 3-step weighting matrix       chi2(10)    =    6.3643
                                                       Prob > chi2 =    0.7838

. estat serial

Arellano-Bond test for autocorrelation of the first-differenced residuals
H0: no autocorrelation of order 1:     z =   -0.3265   Prob > |z|  =    0.7440
H0: no autocorrelation of order 2:     z =   -0.3837   Prob > |z|  =    0.7012

Thanks in advance!


Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#267

09 Apr 2021, 09:56

I am afraid I do not have a good answer. I notice that your standard errors are all very large, which might be a consequence of weak instruments. You could try adding nonlinear moment conditions, e.g. with the option nl(noserial), although I am not sure whether that will improve the situation.

https://www.kripfganz.de/stata/

Chhavi Jatana

Join Date: Apr 2021
Posts: 4

#268

09 Apr 2021, 10:34

Dear Sir,

The nl(noserial) option is not working:

Code:

xtdpdgmm TC L.TC ROEP ROET3 T3 LFSIZE LFAGE LEV RISK CEOD BSZE IND_P ID* YD*, twostep vce(cluster cid) nl(noserial) gmmiv (L.TC, lag(0 0) collapse model (fodev)) gmmiv (ROEP, lag(0 1) collapse model (fodev)) gmmiv (ROET3, lag(0 1) collapse model (fodev)) gmmiv (T3, lag(0 1) collapse model (fodev)) gmmiv (LFSIZE, lag(0 1) collapse model (fodev)) gmmiv (LFAGE, lag(0 1) collapse model (fodev)) gmmiv (LEV, lag(0 1) collapse model (fodev)) gmmiv (RISK, lag(0 1) collapse model (fodev)) gmmiv (BSZE, lag(0 1) collapse model (fodev)) gmmiv (IND_P, lag(0 1) collapse model (fodev)) gmmiv (CEOD, lag(0 1) collapse model (fodev)) gmmiv (ID*, lag(0 0) collapse model (level)) gmmiv (YD*, lag(0 0) collapse model (level)) nofootnote

Generalized method of moments estimation

Fitting full model:

Step 1:
initial:       f(b) =   18948478
alternative:   f(b) =   18911537
rescale:       f(b) =  7130192.3
Iteration 0:   f(b) =  7130192.3  (not concave)
Iteration 1:   f(b) =  949992.37  (not concave)
Iteration 2:   f(b) =  278578.86  (not concave)
Iteration 3:   f(b) =  151619.85  (not concave)
Iteration 4:   f(b) =  120342.38  (not concave)
Iteration 5:   f(b) =  97568.645  (not concave)
Iteration 6:   f(b) =  81329.831  (not concave)
Iteration 7:   f(b) =  70476.142  (not concave)
Iteration 8:   f(b) =  59107.138  (not concave)
Iteration 9:   f(b) =  52308.153  (not concave)
Iteration 10:  f(b) =  43053.096  (not concave)
Iteration 11:  f(b) =  33223.071  (not concave)
Iteration 12:  f(b) =  29835.377  (not concave)
Iteration 13:  f(b) =  16909.705  (not concave)
Iteration 14:  f(b) =  15002.974  (not concave)
Iteration 15:  f(b) =  14232.683  (not concave)
Iteration 16:  f(b) =  7718.8442  (not concave)
Iteration 17:  f(b) =  7532.1743  (not concave)
Iteration 18:  f(b) =  7366.5578  (not concave)
Iteration 19:  f(b) =  7217.5852  (not concave)
Iteration 20:  f(b) =  7078.5422  (not concave)
Iteration 21:  f(b) =  6947.9369  (not concave)
Iteration 22:  f(b) =  6826.4571  (not concave)
Iteration 23:  f(b) =  6711.9993  (not concave)
Iteration 24:  f(b) =  6604.6715  (not concave)
Iteration 25:  f(b) =  6503.2638  (not concave)
Iteration 26:  f(b) =  6408.0337  (not concave)
Iteration 27:  f(b) =  6317.8641  (not concave)
Iteration 28:  f(b) =  6232.9867  (not concave)
Iteration 29:  f(b) =  6152.4831  (not concave)
Iteration 30:  f(b) =  6076.5509  (not concave)
Iteration 31:  f(b) =  6004.4055  (not concave)
Iteration 32:  f(b) =  5936.2248  (not concave)
Iteration 33:  f(b) =  5871.3394  (not concave)
Iteration 34:  f(b) =  5809.9042  (not concave)
Iteration 35:  f(b) =  5751.3458  (not concave)
Iteration 36:  f(b) =  5695.8011  (not concave)
Iteration 37:  f(b) =  5642.7766  (not concave)
Iteration 38:  f(b) =   5592.393  (not concave)
Iteration 39:  f(b) =  5544.2241  (not concave)
Iteration 40:  f(b) =  5498.3769  (not concave)
Iteration 41:  f(b) =  5454.4817  (not concave)
Iteration 42:  f(b) =  5412.6337  (not concave)
Iteration 43:  f(b) =  5372.5111  (not concave)
Iteration 44:  f(b) =  5334.1987  (not concave)
Iteration 45:  f(b) =  5297.4154  (not concave)
Iteration 46:  f(b) =  5262.2372  (not concave)
Iteration 47:  f(b) =  5228.4177  (not concave)
Iteration 48:  f(b) =  5196.0252  (not concave)
Iteration 49:  f(b) =  5164.8429  (not concave)
Iteration 50:  f(b) =  5134.9323  (not concave)
Iteration 51:  f(b) =  5106.1022  (not concave)
Iteration 52:  f(b) =  5078.4083  (not concave)
Iteration 53:  f(b) =  5051.6812  (not concave)
Iteration 54:  f(b) =  5025.9715  (not concave)
Iteration 55:  f(b) =  5001.1288  (not concave)
Iteration 56:  f(b) =  4977.1991  (not concave)
Iteration 57:  f(b) =  4954.0487  (not concave)
Iteration 58:  f(b) =  4931.7194  (not concave)
Iteration 59:  f(b) =  4910.0918  (not concave)
Iteration 60:  f(b) =  4889.2043  (not concave)
Iteration 61:  f(b) =  4868.9499  (not concave)
Iteration 62:  f(b) =  4849.3638  (not concave)
Iteration 63:  f(b) =  4830.3502  (not concave)
Iteration 64:  f(b) =  4811.9413  (not concave)
Iteration 65:  f(b) =  4794.0509  (not concave)
Iteration 66:  f(b) =  4776.7087  (not concave)
Iteration 67:  f(b) =   4759.837  (not concave)
Iteration 68:  f(b) =  4743.4633  (not concave)
Iteration 69:  f(b) =  4727.5172  (not concave)
Iteration 70:  f(b) =  4712.0243  (not concave)
Iteration 71:  f(b) =  4696.9209  (not concave)
Iteration 72:  f(b) =  4682.2305  (not concave)
--Break--

These are the results from the random-effects model, where most of the variables are significant. Only after applying system GMM do I get insignificant results. Can you please suggest a solution on the basis of these results?

Code:

xtreg TC ROEP ROET3 T3 LFSIZE LFAGE LEV RISK CEOD BSZE IND_P ID1 ID2 ID3 ID4 ID5 YD1 YD2 YD3 YD4, re vce (cluster cid)

Random-effects GLS regression                   Number of obs     =      1,005
Group variable: cid                             Number of groups  =        201

R-sq:                                           Obs per group:
     within  = 0.2511                                         min =          5
     between = 0.4991                                         avg =        5.0
     overall = 0.4145                                         max =          5

                                                Wald chi2(19)     =     110.21
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                                  (Std. Err. adjusted for 201 clusters in cid)
------------------------------------------------------------------------------
             |               Robust
          TC |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        ROEP |   11.81289   3.695961     3.20   0.001     4.568942    19.05684
       ROET3 |  -.1636423   .0533337    -3.07   0.002    -.2681745   -.0591101
          T3 |   .9268638   .5153631     1.80   0.072    -.0832294    1.936957
      LFSIZE |   21.30412   3.564797     5.98   0.000     14.31725    28.29099
       LFAGE |  -11.82151   7.810607    -1.51   0.130    -27.13002    3.486997
         LEV |  -20.22973   20.30682    -1.00   0.319    -60.03036     19.5709
        RISK |    1.05389   13.74748     0.08   0.939    -25.89068    27.99846
        CEOD |   23.02698   11.60022     1.99   0.047     .2909595      45.763
        BSZE |   3.099498   1.382514     2.24   0.025     .3898211    5.809175
       IND_P |   .9007423   .5228959     1.72   0.085    -.1241148    1.925599
         ID1 |  -47.37584   32.43792    -1.46   0.144     -110.953    16.20132
         ID2 |   3.698717   13.37388     0.28   0.782    -22.51361    29.91104
         ID3 |  -12.53249   14.97791    -0.84   0.403    -41.88865    16.82367
         ID4 |   2.917077   18.43619     0.16   0.874    -33.21719    39.05134
         ID5 |   30.07731   19.60296     1.53   0.125    -8.343787     68.4984
         YD1 |   1.671931   4.416635     0.38   0.705    -6.984515    10.32838
         YD2 |     1.6644   6.062894     0.27   0.784    -10.21865    13.54745
         YD3 |   4.077982    5.29307     0.77   0.441    -6.296245    14.45221
         YD4 |   15.10113   6.991724     2.16   0.031     1.397606    28.80466
       _cons |   -273.697   74.47474    -3.68   0.000    -419.6648   -127.7292
-------------+----------------------------------------------------------------
     sigma_u |  53.650479
     sigma_e |  56.558824
         rho |  .47362907   (fraction of variance due to u_i)
------------------------------------------------------------------------------


Sebastian Kripfganz

Join Date: May 2014

Posts: 2590
#269

09 Apr 2021, 10:47

The reason for the non-convergence with the nl(noserial) option is the perfect collinearity among your time dummies. You need to manually drop one of those time dummies.

A random-effects (or fixed-effects) regression makes much stronger assumptions that effectively lead to much stronger instruments. In particular, all variables are assumed to be strictly exogenous.

You could start with a dynamic model that assumes all variables (other than the lagged dependent variable) to be strictly exogenous and then relax this assumption for one variable after the other to see whether a particular variable is causing the trouble. That is, start with the following specification:

Code:

xtdpdgmm TC L.TC ROEP ROET3 T3 LFSIZE LFAGE LEV RISK CEOD BSZE IND_P ID* YD2 YD3 YD4, twostep vce(cluster cid) collapse gmmiv(L.TC, lag(0 0) model(fodev)) gmmiv(ROEP ROET3 T3 LFSIZE LFAGE LEV RISK CEOD BSZE IND_P, lag(0 1) model(fodev)) gmmiv(ROEP ROET3 T3 LFSIZE LFAGE LEV RISK CEOD BSZE IND_P, lag(0 0) model(mdev)) iv(ID* YD2 YD3 YD4, model(level)) nofooter

The gmmiv() option with model(mdev) contains the extra instruments that are valid only under strict exogeneity.
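A possible second step in this sequence is sketched below. Picking LEV as the first variable to relax is purely an assumption for illustration; the only change relative to the specification above is that LEV is removed from the model(mdev) instrument list, so that it is treated as predetermined rather than strictly exogenous. The /// line continuations require running this from a do-file.

```stata
* Step 2 (illustrative): relax strict exogeneity for LEV only
xtdpdgmm TC L.TC ROEP ROET3 T3 LFSIZE LFAGE LEV RISK CEOD BSZE IND_P ID* YD2 YD3 YD4, ///
    twostep vce(cluster cid) collapse ///
    gmmiv(L.TC, lag(0 0) model(fodev)) ///
    gmmiv(ROEP ROET3 T3 LFSIZE LFAGE LEV RISK CEOD BSZE IND_P, lag(0 1) model(fodev)) ///
    gmmiv(ROEP ROET3 T3 LFSIZE LFAGE RISK CEOD BSZE IND_P, lag(0 0) model(mdev)) ///
    iv(ID* YD2 YD3 YD4, model(level))

* Compare the specification tests with those from the fully strictly exogenous model
estat overid
estat serial
```

Repeating this for one variable at a time shows whether the standard errors and test results deteriorate sharply once a particular variable loses its model(mdev) instruments.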

https://www.kripfganz.de/stata/
Chhavi Jatana

Join Date: Apr 2021

Posts: 4
#270

09 Apr 2021, 11:05

First of all, thank you for the prompt response.

Sir, my study period covers five years, so I have created only four year dummies (the first year is taken as the base year), following the n-1 rule. Do I still need to drop one of the year dummies?

I am a beginner and am facing a lot of problems while applying system GMM for my thesis, so I might ask some silly questions. Apologies in advance. I will follow your advice on the dynamic model. I hope it helps.