
  • XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

    Dear Statalisters,

    I have made a new estimation command available for installation from my website:
    Code:
    . net install xtdpdgmm, from(http://www.kripfganz.de/stata/)
    xtdpdgmm estimates a linear (dynamic) panel data model with the generalized method of moments (GMM). The main value added of the new command is that it allows the traditional linear moment conditions to be combined with the nonlinear moment conditions suggested by Ahn and Schmidt (1995) under the assumption of serially uncorrelated idiosyncratic errors. These additional nonlinear moment conditions can yield potentially sizeable efficiency gains and also improve the finite-sample performance. Given that the absence of serial correlation is usually a prerequisite for other GMM estimators in the presence of a lagged dependent variable as well, the gains from the nonlinear moment conditions essentially come for free.
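    For intuition, under serially uncorrelated idiosyncratic errors the Ahn and Schmidt (1995) conditions take the form E[Δu_it · u_iT] = 0 for t = 2, ..., T-1. The following numpy sketch of their sample analogues is purely illustrative (the function name and balanced-panel setup are mine; this is not xtdpdgmm's implementation):

```python
import numpy as np

def ahn_schmidt_moments(resid):
    """Sample analogues of the Ahn-Schmidt (1995) nonlinear moment
    conditions E[du_it * u_iT] = 0 for t = 2, ..., T-1, given a
    balanced panel of residuals resid with shape (N, T)."""
    du = np.diff(resid, axis=1)        # first differences, shape (N, T-1)
    # interact du_i2, ..., du_i,T-1 with the last-period residual u_iT
    g = du[:, :-1] * resid[:, -1:]     # shape (N, T-2)
    return g.mean(axis=0)              # T-2 sample moments

# With serially uncorrelated errors, the sample moments should be near zero:
rng = np.random.default_rng(0)
u = rng.standard_normal((5000, 6))     # N = 5000 groups, T = 6 periods
print(ahn_schmidt_moments(u))          # four moments, all close to 0
```

    Serial correlation in u would push these sample moments away from zero, which is exactly why the conditions require serially uncorrelated errors.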

    The extra moment conditions can help to overcome a weak instruments problem of the Arellano and Bond (1991) difference-GMM estimator when the autoregressive coefficient approaches unity. Furthermore, the Ahn and Schmidt (1995) estimator is also robust to deviations from mean stationarity, a situation that would invalidate the Blundell and Bond (1998) system-GMM approach.

    Without these nonlinear moment conditions, xtdpdgmm replicates the results obtained with the familiar commands xtabond, xtdpd, xtdpdsys, and xtabond2, as well as my other recent command xtseqreg. Collapsing of GMM-type instruments and different initial weighting matrices are supported. The key option of xtdpdgmm that adds the nonlinear moment conditions is called noserial. For example:
    Code:
    . webuse abdata
    
    . xtdpdgmm L(0/1).n w k, noserial gmmiv(L.n, collapse model(difference)) iv(w k, difference model(difference)) twostep vce(robust)
    
    Generalized method of moments estimation
    
    Step 1
    initial:       f(p) =  6.9508498
    alternative:   f(p) =   1.917675
    rescale:       f(p) =  .07590133
    Iteration 0:   f(p) =  .07590133  
    Iteration 1:   f(p) =    .003352  
    Iteration 2:   f(p) =  .00274414  
    Iteration 3:   f(p) =  .00274388  
    Iteration 4:   f(p) =  .00274388  
    
    Step 2
    Iteration 0:   f(p) =  .26774896  
    Iteration 1:   f(p) =  .20397319  
    Iteration 2:   f(p) =   .2011295  
    Iteration 3:   f(p) =  .20109259  
    Iteration 4:   f(p) =  .20109124  
    Iteration 5:   f(p) =   .2010912  
    
    Group variable: id                           Number of obs         =       891
    Time variable: year                          Number of groups      =       140
    
    Moment conditions:     linear =      10      Obs per group:    min =         6
                        nonlinear =       6                        avg =  6.364286
                            total =      16                        max =         8
    
                                         (Std. Err. adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |              WC-Robust
               n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               n |
             L1. |    .657292   .1381388     4.76   0.000     .3865449    .9280391
                 |
               w |  -.7248798   .0996565    -7.27   0.000    -.9202029   -.5295568
               k |   .2399022   .0737048     3.25   0.001     .0954435    .3843609
           _cons |   2.719216   .4015915     6.77   0.000     1.932111    3.506321
    ------------------------------------------------------------------------------
    The Gauss-Newton technique is used to minimize the GMM criterion function. With vce(robust), the Windmeijer (2005) finite-sample standard error correction is computed for estimators with and without nonlinear moment conditions.
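    As background on what the iteration log shows: in each step, the criterion f(p) is the quadratic form g(p)'Wg(p), where g(p) stacks the sample moments and W is the weighting matrix (updated between steps 1 and 2). For a purely linear problem the minimizer even has a closed form. The following numpy sketch is only a hypothetical illustration of the one-step/two-step logic for a generic linear GMM problem, not of xtdpdgmm's internals (which handle the panel structure and the nonlinear moments):

```python
import numpy as np

def two_step_gmm(y, X, Z):
    """Linear two-step GMM: minimize g(b)' W g(b) with g(b) = Z'(y - Xb)/N.
    Step 1 uses W = (Z'Z/N)^{-1}; step 2 uses the inverse covariance of
    the step-1 moments as the (asymptotically) optimal weighting matrix."""
    N = len(y)

    def solve(W):
        # closed-form minimizer: b = (X'Z W Z'X)^{-1} X'Z W Z'y
        XZ = X.T @ Z
        return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

    b1 = solve(np.linalg.inv(Z.T @ Z / N))       # step 1
    u = y - X @ b1                               # step-1 residuals
    Zu = Z * u[:, None]
    S = Zu.T @ Zu / N                            # moment covariance estimate
    b2 = solve(np.linalg.inv(S))                 # step 2
    return b1, b2

# Demo with one endogenous regressor and two valid instruments:
rng = np.random.default_rng(1)
N = 5000
z = rng.standard_normal((N, 2))
v = rng.standard_normal(N)
e = 0.5 * v + rng.standard_normal(N)             # error correlated with x
x = z[:, 0] + z[:, 1] + v
y = 1.0 + 2.0 * x + e
X = np.column_stack([np.ones(N), x])
Z = np.column_stack([np.ones(N), z])
b1, b2 = two_step_gmm(y, X, Z)                   # both close to (1, 2)
```

    With nonlinear moment conditions there is no such closed form, which is why an iterative technique such as Gauss-Newton is needed.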

    For details about the syntax, the available options, and the supported postestimation commands, please see the help files:
    Code:
    . help xtdpdgmm
    . help xtdpdgmm postestimation
    Available postestimation commands include the Arellano-Bond test for absence of serial correlation in the first-differenced errors, estat serial, and the familiar Hansen J-test of the overidentifying restrictions, estat overid. The results of the Arellano-Bond test differ slightly from xtdpd and xtabond2 for two-step robust estimators because I account for the finite-sample Windmeijer (2005) correction when computing the test statistic, while the existing commands do not. estat overid can also be used to perform difference-in-Hansen tests, but it requires that the two models be estimated separately. In that regard, the results differ from the difference-in-Hansen test statistics reported by xtabond2; see footnote 24 in Roodman (2009) for an explanation. An alternative to difference-in-Hansen tests is a generalized Hausman test, implemented in estat hausman for use after xtdpdgmm.
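    For reference, the Hansen J statistic is, in essence, N times the minimized two-step GMM criterion, compared against a chi-squared distribution with degrees of freedom equal to the number of overidentifying restrictions. A stripped-down numpy sketch (my own illustration; the actual computation after xtdpdgmm accounts for the estimation details):

```python
import numpy as np

def hansen_j(u, Z, n_params):
    """Hansen J statistic: J = N * g' S^{-1} g, where g = Z'u/N are the
    sample moments at the estimated parameters and S is the estimated
    moment covariance. Under valid instruments, J ~ chi2(L - n_params)."""
    N, L = Z.shape
    g = Z.T @ u / N
    Zu = Z * u[:, None]
    S = Zu.T @ Zu / N
    J = N * g @ np.linalg.solve(S, g)
    return J, L - n_params                       # statistic, degrees of freedom

# With instruments uncorrelated with the errors, J should be small:
rng = np.random.default_rng(2)
Z = np.column_stack([np.ones(1000), rng.standard_normal((1000, 2))])
u = rng.standard_normal(1000)                    # stand-in for model residuals
J, df = hansen_j(u, Z, n_params=2)
```

    In a real application, u would be the residuals evaluated at the two-step estimates, and a large J relative to the chi-squared critical value casts doubt on the instruments' validity.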

    Finally, the results with and without nonlinear moment conditions can in principle also be obtained with Stata's official gmm command. However, it is anything but straightforward to do so. The official gmm command offers a lot of extra flexibility, but it does not provide a tailored solution for this particular estimation problem. While xtdpdgmm can easily handle unbalanced panel data, gmm tends to have problems in that case. In addition, gmm tends to be very slow, in particular with large data sets. I did not run a sophisticated benchmark comparison, but for a single estimation on a data set with 40,000 observations it took me 43 minutes (!) to obtain the results with gmm, while xtdpdgmm returned identical results after just 4 seconds.

    I hope you enjoy the new command. As always, comments and suggestions are highly welcome, and an appropriate reference would be very much appreciated if my command proves to be helpful for your own research.

    References:
    • Ahn, S. C., and P. Schmidt (1995). Efficient estimation of models for dynamic panel data. Journal of Econometrics 68: 5-27.
    • Arellano, M., and S. R. Bond (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies 58: 277-297.
    • Blundell, R., and S. R. Bond (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87: 115-143.
    • Roodman, D. (2009). How to do xtabond2: An introduction to difference and system GMM in Stata. Stata Journal 9: 86-136.
    • Windmeijer, F. (2005). A finite sample correction for the variance of linear efficient two-step GMM estimators. Journal of Econometrics 126: 25-51.
    Last edited by Sebastian Kripfganz; 01 Jun 2017, 06:15.

  • Arkangel Cordero
    replied
    Dear Professor @Sebastian Kripfganz

    As a follow-up, I ran the weakiv test after ivreg2 (ssc install weakiv) and obtained the diagnostics below for the same model. Can I conclude that the instruments are strong enough, despite the low magnitude of the weak identification test statistics that ivreg2 reports by default?

    HTML Code:
    ----------------------------------------
     Test |       Statistic         p-value
    ------+---------------------------------
      CLR | stat(.)   =   137.16     0.0000
        K | chi2(32)  =    99.89     0.0000
        J | chi2(13)  =    42.29     0.0000
      K-J |        <n.a.>            0.0000
       AR | chi2(44)  =   142.18     0.0000
    ------+---------------------------------
     Wald | chi2(32)  =   146.58     0.0000
    ----------------------------------------

  • Arkangel Cordero
    replied
    Dear Professor @Sebastian Kripfganz


    I have a quick question regarding a difference-GMM model. The output below comes after successfully reproducing the results for the difference-GMM model from xtdpdgmm with xtivreg2, in order to access the instrument diagnostics available for the latter. In general, the diagnostics look fine. The Arellano-Bond autocorrelation tests of the residuals look fine as well: statistically significant AR(1) but statistically insignificant higher-order autocorrelation. However, both statistics for the weak identification test look quite low in magnitude, and to complicate things, the "Stock-Yogo weak ID test critical values" are <not available>. My questions are:

    1) Is this a matter for concern given the low values of the statistics for the Weak identification test?
    2) Is there anything to be done to obtain valid "Stock-Yogo weak ID test critical values"?
    3) Do you find these diagnostics concerning?
    4) Is there anything to be done at all?

    Thank you in advance!

    HTML Code:
    Underidentification test (Kleibergen-Paap rk LM statistic):             98.401
                                                       Chi-sq(14) P-val =   0.0000
    ------------------------------------------------------------------------------
    Weak identification test (Cragg-Donald Wald F statistic):                1.345
                             (Kleibergen-Paap rk Wald F statistic):          1.879
    Stock-Yogo weak ID test critical values:                       <not available>
    ------------------------------------------------------------------------------
    Hansen J statistic (overidentification test of all instruments):        15.589
                                                       Chi-sq(12) P-val =   0.1780
    -endog- option:
    Endogeneity test of endogenous regressors:                              17.543
                                                       Chi-sq(3) P-val =    0.0004
    Last edited by Arkangel Cordero; 09 Mar 2024, 17:30.

  • Ismail Boujnane
    replied
    Dear Sebastian, I have some cross-sectional (categorical) data collected from a questionnaire in 2021, which are integrated into a longitudinal dataset collected at different points in time over a period of 6 years, from 2015 to 2020. Given that the sample is the same for both data collection methods, and that my categorical data (institutional support, corporate governance) are dynamic rather than static, I would like to know whether integrating them into my panel data is feasible.

  • Sebastian Kripfganz
    replied
    A new update is available for xtdpdgmm on my personal website. Version 2.6.6 fixes a few bugs in the postestimation command estat serialpm.

    Code:
    net install xtdpdgmm, from(https://www.kripfganz.de/stata) replace

  • Sebastian Kripfganz
    replied
    The lags for those instruments that refer to the first-differenced model should be the same for the two estimators; otherwise, the results become harder to compare.

  • Sarah Magd
    replied
    Dear Prof. Sebastian Kripfganz

    Thanks for your constructive replies.

    Does the specification of the system GMM have to be the same as the specification of the diff-GMM? For example, if we use lags(1 3) in the system GMM, do we have to specify the same range of lags in the diff-GMM? Or can the two estimators have different specifications for the range of lags?

  • Sebastian Kripfganz
    replied
    1. N=28 is still small; therefore, my previous comments still apply.
    2. Yes, you can (and probably should) use a diff-GMM estimator as a robustness check (again, preferably one-step only).

  • Sarah Magd
    replied
    Dear Prof. Sebastian Kripfganz

    Thanks for your constructive replies.

    1. Are there any issues if we restrict our sample to 28 countries and 13 years? We use a one-step system GMM estimator to estimate our model with this sample. Could you please let us know if we still have any issues with this setup?
    2. Given this sample, can we use the diff-GMM for robustness checks? or would you recommend another estimator for robustness?

  • Sebastian Kripfganz
    replied
    1. I would call this a small-N, moderately small-T sample. You probably do not need to be concerned much with asymptotic efficiency; it might thus be a good idea to use the one-step instead of the two-step estimator, to avoid estimating the weighting matrix. Also, use the available options (collapsing and lag restrictions) to limit the number of instruments. You could still use the system GMM estimator if you can theoretically justify its assumptions. With such a data set, testing these assumptions empirically is challenging and probably not very reliable.
    2. From the outset, we do not know what the true value of the coefficient of the lagged dependent variable is; that is why we are estimating it. There can be different reasons for the observed differences: (i) sampling variability due to the small data set; (ii) endogeneity of the lagged dependent variable (due to neglected serial correlation in the error term) such that the model treating it as predetermined is misspecified; (iii) weak instruments when treating the lagged dependent variable as endogenous, to name a few.

  • Sarah Magd
    replied
    Dear Prof. Sebastian Kripfganz

    1) Can we use the sys-gmm with a sample that has 28 countries and 20 years? Is this considered a big T or can we still use the sys-GMM?
    2) When we define the lagged dependent variable as a predetermined variable, its estimated coefficient is 0.542. However, when we specify the variable as endogenous, its magnitude becomes 0.745. Does the magnitude of the lagged dependent variable have to be close to 1?

    Could you please guide us on these two points.


    Thanks

  • Tugrul Cinar
    replied
    Thank you very much for the quick response.

  • Sebastian Kripfganz
    replied
    What you are describing is a data set with repeated cross sections. xtdpdgmm requires the data to be declared as panel data; in particular, a panel identifier variable needs to be declared with xtset. This may not be possible with the type of data you have.

  • Tugrul Cinar
    replied
    Dear Sebastian,

    I am going to use a micro dataset for an upcoming study. However, this dataset consists of random samples for each year; essentially, it is a pooled dataset rather than panel data. Moreover, I suspect an endogeneity issue between the dependent and independent variables in the model I am aiming to estimate. Additionally, the dataset encompasses roughly 100,000 units per year, spanning seven years.

    Given that the xtdpdgmm command is designed for linear (dynamic) panel data, do you recommend it for analyzing a pooled dataset?

  • Arkangel Cordero
    replied
    Dear Professor @Sebastian Kripfganz


    Understood. This is helpful. Thank you!
