
  • XTDPDGMM: new Stata command for efficient GMM estimation of linear (dynamic) panel models with nonlinear moment conditions

    Dear Statalisters,

    I have made a new estimation command available for installation from my website:
    Code:
    . net install xtdpdgmm, from(http://www.kripfganz.de/stata/)
    xtdpdgmm estimates a linear (dynamic) panel data model with the generalized method of moments (GMM). The main value added of the new command is that it allows the traditional linear moment conditions to be combined with the nonlinear moment conditions suggested by Ahn and Schmidt (1995) under the assumption of serially uncorrelated idiosyncratic errors. These additional nonlinear moment conditions can yield potentially sizeable efficiency gains, and they also improve the finite-sample performance. Given that absence of serial correlation is usually a prerequisite also for other GMM estimators in the presence of a lagged dependent variable, the gains from the nonlinear moment conditions essentially come for free.

    The extra moment conditions can help to overcome a weak instruments problem of the Arellano and Bond (1991) difference-GMM estimator when the autoregressive coefficient approaches unity. Furthermore, the Ahn and Schmidt (1995) estimator is also robust to deviations from mean stationarity, a situation that would invalidate the Blundell and Bond (1998) system-GMM approach.

    Without these nonlinear moment conditions, xtdpdgmm replicates the results obtained with the familiar commands xtabond, xtdpd, xtdpdsys, and xtabond2, as well as my other recent command xtseqreg. Collapsing of GMM-type instruments and different initial weighting matrices are supported. The key option of xtdpdgmm that adds the nonlinear moment conditions is called noserial. For example:
    Code:
    . webuse abdata
    
    . xtdpdgmm L(0/1).n w k, noserial gmmiv(L.n, collapse model(difference)) iv(w k, difference model(difference)) twostep vce(robust)
    
    Generalized method of moments estimation
    
    Step 1
    initial:       f(p) =  6.9508498
    alternative:   f(p) =   1.917675
    rescale:       f(p) =  .07590133
    Iteration 0:   f(p) =  .07590133  
    Iteration 1:   f(p) =    .003352  
    Iteration 2:   f(p) =  .00274414  
    Iteration 3:   f(p) =  .00274388  
    Iteration 4:   f(p) =  .00274388  
    
    Step 2
    Iteration 0:   f(p) =  .26774896  
    Iteration 1:   f(p) =  .20397319  
    Iteration 2:   f(p) =   .2011295  
    Iteration 3:   f(p) =  .20109259  
    Iteration 4:   f(p) =  .20109124  
    Iteration 5:   f(p) =   .2010912  
    
    Group variable: id                           Number of obs         =       891
    Time variable: year                          Number of groups      =       140
    
    Moment conditions:     linear =      10      Obs per group:    min =         6
                        nonlinear =       6                        avg =  6.364286
                            total =      16                        max =         8
    
                                         (Std. Err. adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |              WC-Robust
               n |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               n |
             L1. |    .657292   .1381388     4.76   0.000     .3865449    .9280391
                 |
               w |  -.7248798   .0996565    -7.27   0.000    -.9202029   -.5295568
               k |   .2399022   .0737048     3.25   0.001     .0954435    .3843609
           _cons |   2.719216   .4015915     6.77   0.000     1.932111    3.506321
    ------------------------------------------------------------------------------
    The Gauss-Newton technique is used to minimize the GMM criterion function. With vce(robust), the Windmeijer (2005) finite-sample standard error correction is computed for estimators with and without nonlinear moment conditions.

    For details about the syntax, the available options, and the supported postestimation commands, please see the help files:
    Code:
    . help xtdpdgmm
    . help xtdpdgmm postestimation
    Available postestimation commands include the Arellano-Bond test for absence of serial correlation in the first-differenced errors, estat serial, and the familiar Hansen J-test of the overidentifying restrictions, estat overid. The results of the Arellano-Bond test differ slightly from xtdpd and xtabond2 for two-step robust estimators because I account for the finite-sample Windmeijer (2005) correction when computing the test statistic, while the existing commands do not. estat overid can also be used to perform difference-in-Hansen tests, but it requires that the two models be estimated separately. In that regard, the results differ from the difference-in-Hansen test statistics reported by xtabond2; see footnote 24 in Roodman (2009) for an explanation. An alternative to difference-in-Hansen tests is a generalized Hausman test, implemented in estat hausman for use after xtdpdgmm.
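    As a quick illustration (continuing the abdata example above; output omitted), the two tests can be run directly after estimation:
    Code:
    . estat serial
    . estat overid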

    Finally, the results with and without nonlinear moment conditions can in principle also be obtained with Stata's official gmm command. However, it is anything but straightforward to do so. While the official gmm command offers lots of extra flexibility, it does not provide a tailored solution for this particular estimation problem. While xtdpdgmm easily handles unbalanced panel data, gmm tends to have some problems in that case. In addition, gmm can be very slow, in particular with large data sets. I did not do a sophisticated benchmark comparison, but for a single estimation on a data set with 40,000 observations, it took me 43 minutes (!) to obtain the results with gmm, while xtdpdgmm returned the identical results after just 4 seconds!

    I hope you enjoy the new command. As always, comments and suggestions are highly welcome, and an appropriate reference would be very much appreciated if my command proves to be helpful for your own research.

    References:
    • Ahn, S. C., and P. Schmidt (1995). Efficient estimation of models for dynamic panel data. Journal of Econometrics 68: 5-27.
    • Arellano, M., and S. R. Bond (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies 58: 277-297.
    • Blundell, R., and S. R. Bond (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87: 115-143.
    • Roodman, D. (2009). How to do xtabond2: An introduction to difference and system GMM in Stata. Stata Journal 9: 86-136.
    • Windmeijer, F. (2005). A finite sample correction for the variance of linear efficient two-step GMM estimators. Journal of Econometrics 126: 25-51.
    Last edited by Sebastian Kripfganz; 01 Jun 2017, 06:15.

  • Arkangel Cordero
    replied
    Dear Professor @Sebastian Kripfganz,

    Thank you for your always valuable insights!



  • Sebastian Kripfganz
    replied
    1) Model 3 was implemented in terms of the level equation with serially uncorrelated idiosyncratic errors. This is another benefit of orthogonalizing the instruments instead of transforming the model itself: You can use all the conventional procedures, including conventional weighting matrices.

    2) I have never really thought about an interpretation of the specific form of the orthogonalized instruments. Their construction is merely mechanical; see slide 33 of my 2019 London Stata Conference presentation. I guess interpretation (b) makes more sense.



  • Arkangel Cordero
    replied
    Dear Professor @Sebastian Kripfganz,

    Thank you for your insights. I have two follow-up questions.

    1) Very cool workaround, tricking the gmm command into accounting for the first-order serial correlation in the first-differenced equation. My question is: why was this not necessary in Model 3 in my original post in #684 above?

    2) Also referring to Model 3 in #684 above: when orthogonalizing the instruments relative to the unit fixed-effects, is the reason we force the first observation for each panel to be missing a) to avoid the dummy-variable trap, or b) because our estimator is really a first-difference model and we are taking into account that we lose the first observation?

    Thank you again!



  • Sebastian Kripfganz
    replied
    Thank you for this well-designed replication example.

    I will start with question 3. The reason why your final results differ from the previous ones is that the unadjusted initial weighting matrix does not account for the first-order serial correlation in the first-differenced equation. You would need to use winitial(xt D) for this purpose. However, the command only allows this option in combination with xtinstruments(). You can trick the gmm command into delivering the desired results by supplying an xt-instrument full of zeros:
    Code:
    gen zeros = 0
    gmm (D.n - {xb: LD.n D.wage D.emp D.k D.yr1979 D.yr1980 D.yr1981 D.yr1982}), ///
        instruments(iyr* wage emp k yr1979 - yr1982, noconstant) ///
        xtinstruments(zeros, lags(0/0)) winitial(xt D) vce(cluster id) twostep
    Regarding your other questions:
    1.a. You could think about it this way, yes.
    1.b/c. What you have done in your manual construction of the instruments is correct.
    2. The rationale behind this approach is that there is no need to estimate a "system" of equations, when you think about the system GMM estimator. Of course, the Arellano-Bond estimator only has one equation in first differences, but the approach taken by xtdpdgmm is that all transformations are special cases of the system approach, which can be recast as a conventional estimator for the equation in levels with appropriately orthogonalized instruments. This simplifies the command's architecture substantially and makes it straightforward to implement any other type of transformation (e.g., forward-orthogonal deviations). The sample size is larger because it refers to the level model, not the first-differenced one. It is true, though, that effectively one observation is still lost due to the orthogonalization. Consequently, it is a fair question whether this should be reflected in the reported number of observations.



  • Arkangel Cordero
    replied
    Dear Professor @Sebastian Kripfganz

    I hope you are well.

    I have, what I hope is, a quick set of questions. I am playing with the following four models below (Models I through IV).

    Code:
    webuse abdata, clear
    
    /* Balancing the panels for simplicity */
    keep if year>=1977&year<=1982
    by id: keep if _N==6
    
    /* Model I: xtdpdgmm NATIVE syntax for GMM-Style instruments */
    xtdpdgmm L(0/1).n wage emp k yr1979 - yr1982, model(difference) gmm(L.(n), lag(1 2) model(difference)) iv(wage emp k yr1979 - yr1982, model(difference))  twostep nocons vce(cluster id)
    estimate store m1_xtdpdgmm
    
    
    
    
    /* Model II: xtdpdgmm with GMM-Style instruments calculated "by hand" */
    * Generate GMM-Style instruments "by hand"
    foreach var of varlist n {
        di "`var'"
    forvalues lag = 2(1)3 {
     display `lag'
     
     capture drop il`lag'`var'
     
     gen il`lag'`var' = L`lag'.`var'
     
     replace il`lag'`var' = 0 if il`lag'`var' ==.
     
    
     foreach year of varlist yr1977- yr1982 {
                
                capture drop i`year'l`lag'`var'
                gen i`year'l`lag'`var' = `year' * il`lag'`var'
                replace i`year'l`lag'`var' = 0 if i`year'l`lag'`var'  == .
                *replace i`year'l`lag'`var' = . if year == 1977
    }
    }
    }
    
    findname, all(@==0)
    drop `r(varlist)'
    
    xtdpdgmm L(0/1).n wage emp k yr1979 - yr1982, model(difference) iv(iyr* wage emp k yr1979 - yr1982, model(difference))  twostep nocons vce(cluster id)
    estimate store m2_xtdpdgmm
    
    
    
    
    /* Model III: Stata's "gmm" command with the GMM-Style instruments calculated by hand ORTHOGONALIZED relative to the fixed-effects */
    * Orthogonalize instruments calculated "by hand" relative to the fixed-effects
    foreach var of varlist iy* wage emp k yr1979 - yr1982 {    
        capture drop  orth_`var'    
        gen orth_`var' = `var'
        bysort id (year): replace orth_`var' = 0 if _n == 2
        bysort id (year):  replace orth_`var'  = orth_`var' - F1.orth_`var' if _n != _N    
        bysort id (year): replace orth_`var'  = . if _n == 1    
    }
    
    gmm (eq1: n  - {n:  L.n wage emp k yr1979 yr1980 yr1981 yr1982}), ///
        instruments(orth_*, noconstant) ///
        winitial(unadjusted) vce(cluster id) twostep
    estimate store m1_gmm
    
    
    /* Model IV:  Stata's "gmm" command with the GMM-Style instruments calculated "by hand" but NOT orthogonalized with respect to the fixed-effects */
    gmm (D.n - {xb: LD.n D.wage D.emp D.k D.yr1979 D.yr1980 D.yr1981 D.yr1982}), ///
        instruments(iyr* wage emp k yr1979 - yr1982, noconstant) ///
        winitial(unadjusted) vce(cluster id) twostep
    estimate store m2_gmm
    
    esttab  m1_xtdpdgmm m2_xtdpdgmm m1_gmm m2_gmm ,  b(7) se(7) order(L.n LD.n wage D.wage emp D.emp k D.k)
    With results:

    HTML Code:
    ----------------------------------------------------------------------------
                          (1)             (2)             (3)             (4)  
                            n               n                                  
    ----------------------------------------------------------------------------
    main                                                                        
    L.n            -0.1244813      -0.1244813      -0.1244813                  
                  (0.3169631)     (0.3169631)     (0.2429797)                  
    
    LD.n                                                           -0.1492370  
                                                                  (0.2419743)  
    
    wage           -0.0294276      -0.0294276      -0.0294276                  
                  (0.0170682)     (0.0170682)     (0.0160661)                  
    
    D.wage                                                         -0.0299815  
                                                                  (0.0163513)  
    
    emp             0.0144419       0.0144419       0.0144419*                  
                  (0.0092611)     (0.0092611)     (0.0071777)                  
    
    D.emp                                                           0.0142584*  
                                                                  (0.0072503)  
    
    k               1.0777604**     1.0777604**     1.0777604***                
                  (0.3357940)     (0.3357940)     (0.2396408)                  
    
    D.k                                                             1.1037843***
                                                                  (0.2389169)  
    
    yr1979         -0.0256686*     -0.0256686*     -0.0256686*                  
                  (0.0121993)     (0.0121993)     (0.0123034)                  
    
    yr1980         -0.0271234      -0.0271234      -0.0271234                  
                  (0.0151974)     (0.0151974)     (0.0155296)                  
    
    yr1981         -0.0024327      -0.0024327      -0.0024327                  
                  (0.0345831)     (0.0345831)     (0.0299392)                  
    
    yr1982          0.0534354       0.0534354       0.0534354                  
                  (0.0579830)     (0.0579830)     (0.0523127)                  
    
    D.yr1979                                                       -0.0260374*  
                                                                  (0.0125335)  
    
    D.yr1980                                                       -0.0267815  
                                                                  (0.0158829)  
    
    D.yr1981                                                       -0.0002548  
                                                                  (0.0303609)  
    
    D.yr1982                                                        0.0568577  
                                                                  (0.0532091)  
    ----------------------------------------------------------------------------
    N                     690             690             690             552  
    ----------------------------------------------------------------------------  
    Standard errors in parentheses
    * p<0.05, ** p<0.01, *** p<0.001

    I have three sets of questions that I was hoping you could provide some guidance with. Please note that all pertain to the “classic” first-difference GMM models à la Arellano and Bond (1991).

    1) My first question is about the way that xtdpdgmm orthogonalizes the instruments with respect to the unit-level fixed-effects. Previously you mentioned that the key is that the sum of the instruments within panel be equal to “0”.

    a. My way of understanding the above statement is that if the within-unit (i.e., panel) sum of an instrument is equal to “0”, then its mean will also be equal to “0”. Under such circumstances, the instrument becomes deviations from its within-unit mean (which is “0”), and is therefore orthogonal to the unit fixed-effects. Is this interpretation accurate?
    b. When orthogonalizing each instrument with respect to the unit fixed-effects, it seems that xtdpdgmm simply subtracts from each value the value at the next time period within each unit (panel). Is that correct?
    c. If so, it appears that for each instrument:
    *The value of an instrument for the last time-period within each unit is left intact because we don’t have anything to subtract from it. Is that correct?
    *The value of an instrument for the first time-period within each unit is set to missing to avoid the “dummy-variable trap”. Is that accurate?
    *The value of an instrument for the second time-period within each unit is set to “0” before subtracting the value for the subsequent time period. This is done because we have set the first period to missing and, hence, to ensure that the within unit sum is “0”. Is that correct?
    2) Can you please provide some insight as to why xtdpdgmm chooses to orthogonalize the instruments rather than taking first-differences in the equation to be estimated? I noticed that the sample size is larger because of this. Is that part of the reason? I’m just curious.

    3) Finally, as you can see, models I through III above, all estimate the same coefficients. Do you have any insights as to why the “un-orthogonalized” instruments produce different coefficient estimates in the last model (Model IV) when they produce the correct estimates with xtdpdgmm in Model II?

    Thank you in advance for any insights.
    Last edited by Arkangel Cordero; 03 Apr 2025, 19:07.



  • Sebastian Kripfganz
    replied
    An important update to version 2.6.9 is now available for xtdpdgmm from my personal website, which fixes the bug just mentioned in the previous post, where some of the Difference-in-Hansen test statistics obtained after running xtdpdgmm with option overid had been incorrect. (The source of the bug was an incorrect selection of the relevant moments when combining all instruments for the level model or a transformed model.)
    Code:
    net install xtdpdgmm, from(http://www.kripfganz.de/stata/) replace



  • Sebastian Kripfganz
    replied
    The first column, labelled "Excluding", provides an overidentification test for a model without the instruments from the respective row. If this test rejects the null hypothesis, then this indicates that even without those instruments the model might be misspecified. In this case, adding the respective instruments would not help, even if the additional ones were all valid, because there are still some invalid instruments. Thus, testing those additional instruments would not be feasible. This is because the column labelled "Difference" effectively compares the Hansen test for the full model with all instruments to the initial model from the "Excluding" column. But if this initial model is misspecified, then the "Difference" test would compare the full model to a misspecified model. Consequently, if both models are similarly misspecified, the "Difference" test might not reject, but this could be misleading. Therefore, looking at the "Difference" test really only makes sense when the "Excluding" test is successfully passed.

    The last two rows in your table are basically a combination of rows 1-2 and rows 3-4, respectively. For example, in the last row the "Excluding" test jointly excludes all the instruments for the level model from rows 3 and 4, and the "Difference" test compares the full model to this model with those excluded instruments.

    While I understand why the degrees of freedom in the last two rows are identical - this is because there is an equal number of instruments for the level model and the transformed model - I am a bit puzzled about the numerically identical values of the test statistic. This looks a bit odd and appears to be a bug!



  • Matej Korinek
    replied
    Dear Professor Sebastian Kripfganz,

    I am just wondering about the following. I went through your 2019 London Stata Conference presentation very carefully. I am not sure how to exactly interpret the Incremental overidentification test. In particular, I have the following output:

    Sargan-Hansen (difference) test of the overidentifying restrictions
    H0: (additional) overidentifying restrictions are valid

    2-step weighting matrix from full model

    Code:
                        |           Excluding           |          Difference
    Moment conditions   |     chi2    df       p        |     chi2    df       p
    --------------------+-------------------------------+------------------------------
    1, model(fodev)     | 249.0901   235   0.2521       |   5.7804    13   0.9538
    2, model(fodev)     | 171.4524   131   0.0102       |  83.4181   117   0.9919
    3, model(level)     | 244.7790   235   0.3172       |  10.0916    13   0.6864
    4, model(level)     | 170.3794   131   0.0118       |  84.4912   117   0.9897
    model(fodev)        | 151.3709   118   0.0208       | 103.4997   130   0.9581
    model(level)        | 151.3709   118   0.0208       | 103.4997   130   0.9581

    Sargan-Hansen test of the overidentifying restrictions
    H0: overidentifying restrictions are valid

    2-step moment functions, 2-step weighting matrix chi2(248) = 254.8706
    Prob > chi2 = 0.3686

    2-step moment functions, 3-step weighting matrix chi2(248) = 265.6096
    Prob > chi2 = 0.2111

    For example, the first row tells me the Hansen J statistic from the whole moment matrix excluding 13 moment conditions (I have 13 collapsed lags of the dependent variable), and then the same statistic just for those 13 conditions. In particular, I do not understand the last two rows. What do they mean, and how should I interpret them? I know that the "Difference" column should tell me the Hansen J statistic of just the level-equation moments (or the fodev moments, depending on the row), but what model do those 118 moment conditions in the "Excluding" column represent? I am just trying to understand that particular reduced model, since it never passes in my models. That leads me to a second question. I always pass the "Difference" criterion comfortably, with p-values above 0.8. But frequently I do not pass the reduced model in the "Excluding" column. Is that a problem? How is that possible?

    Thank you for your time

    Matěj Kořínek



  • Sebastian Kripfganz
    replied
    There is nothing wrong with these instruments per se. It is just a bit unusual that you are using different lag orders to instrument the lagged dependent variables and the independent variable. You should make sure that this can be justified; otherwise it looks like cherry-picking a model specification that delivers the nicest results.

    For the controls, it is also unusual to not specify instruments for the transformed model; e.g., iv(l.(controls), m(fodev)).

    As a technical comment, note that gmm(l(1/2).DV, lag(0 0) collapse m(fodev)) is equivalent to iv(l(1/2).DV, m(fodev)).
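    To make the equivalence concrete (a sketch only, with DV standing in for the actual variable and the remaining options elided): with lag(0 0) and collapse, the GMM-type instrument matrix reduces to a single column per variable - the contemporaneous value - which is exactly the standard instrument that iv() creates. The two specifications below should therefore yield identical instrument sets:
    Code:
    . xtdpdgmm ... , gmm(l(1/2).DV, lag(0 0) collapse m(fodev)) ...
    . xtdpdgmm ... , iv(l(1/2).DV, m(fodev)) ...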



  • Nicu Sprincean
    replied
    Sebastian Kripfganz

    Hi, Sebastian,

    I have a question regarding a model specification where I include the second lag of the dependent variable to deal with serial correlation. The model takes the following form:
    Code:
    xtdpdgmm DV l(1/2).DV l.IV l.(controls),  gmm(l(1/2).DV, lag(0 0) collapse m(fodev)) gmm(l.IV, lag(0 3) collapse m(fodev)) iv(l.(controls),d m(level)) teffects two vce(robust) nocons
    I assume that l.DV and l.IV are both predetermined (all right-hand side variables enter the model with a one-year lag, due to economic reasons) and all controls to be strictly exogenous. I am not sure whether
    Code:
    gmm(l(1/2).DV, lag(0 0) collapse m(fodev))
    and
    Code:
    iv(l.(controls),d m(level))
    are correctly specified.

    Thank you in advance for your response!
    Last edited by Nicu Sprincean; 07 Feb 2025, 06:51.



  • Sebastian Kripfganz
    replied
    Any standard econometrics textbook should cover systems of simultaneous equations and regressor endogeneity.

    A variable is predetermined if it is a function of (i.e., determined by) previous periods' shocks to the equation of interest (but not current or future periods' shocks).
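    In moment-condition notation (a restatement of the definition above, using the thread's subscript style), predeterminedness of Xit amounts to:

    E[Xit εis] = 0 for all s ≥ t, while E[Xit εis] may be nonzero for s < t.

    This is what makes Xit itself, and all of its lags, valid instruments in the FOD-transformed model.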



  • Nursena Sagir
    replied
    Dear Sebastian,

    Is there any reference that would help me understand this better and explain why I can treat X_it as predetermined? I had difficulties explaining it in the method section of my paper. Or can you elaborate more on your reasoning?

    Best regards,
    Nursena



  • Sebastian Kripfganz
    replied
    As long as you do not change the second equation, my earlier statement about Xit being predetermined still stands.

    Once Xit becomes a direct function of Yit, as in your amended second equation, Xit becomes endogenous.



  • Nursena Sagir
    replied
    Dear Sebastian,

    I have a follow-up question.
    Originally posted by Sebastian Kripfganz:
    I think there are at least two things that are potentially confusing here.


    In your two-equations example, Xit is predetermined because it is a function of Yit-1. But Xit-1 is uncorrelated with the error term εit (and any future error term) in equation 1, which is all that is needed for it to be a valid instrument (in the FOD-transformed model).
    You have stated that for the two-equations example below:
    1. Yit = Yit-1 + Yit-2 + Xit-1 + Xit-2 + εit
    2. Xit = Xit-1 + Xit-2 + Yit-1 + Yit-2 + εit
    If I change the first equation to:

    1. Yit = Yit-1 + Yit-2 + Xit + Xit-1 + εit

    which includes contemporaneous X, can I still assume that Xit is predetermined? When I look at the incJ test, it does not reject the null hypothesis that the additional overidentifying restriction for predetermined X is valid (p-value =0.95).

    Would your response change if I have second equation as:

    2. Xit = Xit-1 + Xit-2 + Yit + Yit-1 + εit

    Thank you in advance!

    Best regards,
    Nursena

