  • xtabond2 model specification

    The AR test statistics remain significant until I include lags of the dependent variable up to the 4th lag. So I ran the model below.

    Code:
    xtabond2 proactivity l(1/4)(proactivity) generalcrime feb mar apr may jun jul aug sep oct nov dec, gmm( proactivity generalcrime, lag(5 8)) iv( feb mar apr may jun jul aug sep oct nov dec, equation(level)) twostep robust artests(5)
    I tried l.proactivity, l(1/2)(proactivity), and l(1/3)(proactivity); the four-lag model is the only one with insignificant AR statistics beyond AR(1). The model output below looks okay, but the results appear very sensitive to the model specification. If I add collapse, which isn't supposed to change the results, the results change. Any thoughts on why they change, or on other ways to specify the model?

    Code:
    Warning: Two-step estimated covariance matrix of moments is singular.
      Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
      Difference-in-Sargan/Hansen statistics may be negative.
    
    Dynamic panel-data estimation, two-step system GMM
    ------------------------------------------------------------------------------
    Group variable: id                              Number of obs      =     25104
    Time variable : week                            Number of groups   =       523
    Number of instruments = 470                     Obs per group: min =        48
    Wald chi2(16) =  14145.86                                      avg =     48.00
    Prob > chi2   =     0.000                                      max =        48
    ------------------------------------------------------------------------------
                 |              Corrected
     proactivity |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     proactivity |
             L1. |   .4472726   .0460867     9.71   0.000     .3569444    .5376008
             L2. |   .2588113   .0712742     3.63   0.000     .1191166    .3985061
             L3. |   .1298111   .0540169     2.40   0.016     .0239399    .2356823
             L4. |   .1308565   .0247562     5.29   0.000     .0823352    .1793778
                 |
    generalcrime |   .0577832   .0373932     1.55   0.122    -.0155061    .1310726
             feb |  -.5542773   .1725027    -3.21   0.001    -.8923764   -.2161782
             mar |  -.7677247   .1638914    -4.68   0.000    -1.088946   -.4465035
             apr |  -.6148198   .1516915    -4.05   0.000    -.9121296     -.31751
             may |  -.6942948   .1564604    -4.44   0.000    -1.000952   -.3876381
             jun |  -.7967815    .156679    -5.09   0.000    -1.103867   -.4896964
             jul |   -.696932   .1484477    -4.69   0.000    -.9878841   -.4059799
             aug |  -.8511846   .1570522    -5.42   0.000    -1.159001    -.543368
             sep |  -.6681694   .1469222    -4.55   0.000    -.9561316   -.3802071
             oct |  -.6614347   .1601904    -4.13   0.000    -.9754021   -.3474672
             nov |  -.8454832   .1686919    -5.01   0.000    -1.176113   -.5148532
             dec |   -.536543   .1588977    -3.38   0.001    -.8479767   -.2251092
           _cons |   .6442327   .2119702     3.04   0.002     .2287787    1.059687
    ------------------------------------------------------------------------------
    Instruments for first differences equation
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        L(5/8).(proactivity generalcrime)
    Instruments for levels equation
      Standard
        feb mar apr may jun jul aug sep oct nov dec
        _cons
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        DL4.(proactivity generalcrime)
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z =  -6.31  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =   0.71  Pr > z =  0.477
    Arellano-Bond test for AR(3) in first differences: z =  -0.81  Pr > z =  0.416
    Arellano-Bond test for AR(4) in first differences: z =   0.96  Pr > z =  0.335
    Arellano-Bond test for AR(5) in first differences: z =  -0.97  Pr > z =  0.334
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(453)  =1776.52  Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(453)  = 473.71  Prob > chi2 =  0.242
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(359)  = 398.55  Prob > chi2 =  0.074
        Difference (null H = exogenous): chi2(94)   =  75.16  Prob > chi2 =  0.923
      iv(feb mar apr may jun jul aug sep oct nov dec, eq(level))
        Hansen test excluding group:     chi2(442)  = 463.10  Prob > chi2 =  0.235
        Difference (null H = exogenous): chi2(11)   =  10.62  Prob > chi2 =  0.476

    Code:
    . xtabond2 proactivity l(1/4)(proactivity) generalcrime feb mar apr may jun jul aug sep oct nov dec, gmm( proactivity generalcrime, lag(5 8) collapse) iv( feb mar apr may jun jul aug sep oct nov dec, equation(level)) twostep robust artests(5)
    Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
    
    Dynamic panel-data estimation, two-step system GMM
    ------------------------------------------------------------------------------
    Group variable: id                              Number of obs      =     25104
    Time variable : week                            Number of groups   =       523
    Number of instruments = 22                      Obs per group: min =        48
    Wald chi2(16) =    133.97                                      avg =     48.00
    Prob > chi2   =     0.000                                      max =        48
    ------------------------------------------------------------------------------
                 |              Corrected
     proactivity |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     proactivity |
             L1. |   .0661376   .4625995     0.14   0.886    -.8405409     .972816
             L2. |   1.067306   .2863359     3.73   0.000     .5060984    1.628514
             L3. |   .3243474   .4095168     0.79   0.428    -.4782908    1.126986
             L4. |  -.0056103   .0693738    -0.08   0.936    -.1415804    .1303599
                 |
    generalcrime |  -.2164601   .2403091    -0.90   0.368    -.6874574    .2545372
             feb |  -.1236117   .9449195    -0.13   0.896     -1.97562    1.728396
             mar |  -.5567039   .8190546    -0.68   0.497    -2.162021    1.048613
             apr |  -.3190208   .8371222    -0.38   0.703     -1.95975    1.321709
             may |  -.2762736   .7620428    -0.36   0.717     -1.76985    1.217303
             jun |    -.41769   .7011003    -0.60   0.551    -1.791821    .9564412
             jul |   -.212813   .7252157    -0.29   0.769     -1.63421    1.208584
             aug |   -.437525   .7145739    -0.61   0.540    -1.838064    .9630141
             sep |   -.309892   .8271875    -0.37   0.708     -1.93115    1.311366
             oct |  -.3654707   .8021538    -0.46   0.649    -1.937663    1.206722
             nov |  -.7637196   .7350004    -1.04   0.299    -2.204294    .6768547
             dec |  -.2051388   .8173252    -0.25   0.802    -1.807067    1.396789
           _cons |  -.8427515   1.483277    -0.57   0.570     -3.74992    2.064417
    ------------------------------------------------------------------------------
    Instruments for first differences equation
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        L(5/8).(proactivity generalcrime) collapsed
    Instruments for levels equation
      Standard
        feb mar apr may jun jul aug sep oct nov dec
        _cons
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        DL4.(proactivity generalcrime) collapsed
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z =  -0.25  Pr > z =  0.800
    Arellano-Bond test for AR(2) in first differences: z =  -2.32  Pr > z =  0.020
    Arellano-Bond test for AR(3) in first differences: z =   0.22  Pr > z =  0.823
    Arellano-Bond test for AR(4) in first differences: z =   1.03  Pr > z =  0.305
    Arellano-Bond test for AR(5) in first differences: z =   1.22  Pr > z =  0.224
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(5)    =  29.50  Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(5)    =  10.78  Prob > chi2 =  0.056
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(3)    =   4.08  Prob > chi2 =  0.253
        Difference (null H = exogenous): chi2(2)    =   6.70  Prob > chi2 =  0.035

  • #2
    Can you please help me? @Sebastian Kripfganz



    • #3
      These GMM estimators for dynamic panel data models are designed for situations with a small time dimension. In your case, T=48 is already quite large. As a consequence, you are using a huge number of instruments (470) that can lead to a "too-many-instruments problem". Collapsing would be one way to address this problem. Starting only with the 5th lag as an instrument is problematic because these deep lags might be weak instruments. When there is no remaining serial error correlation, the second lag already qualifies as a valid instrument.
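
      As a rough cross-check on how much collapsing shrinks the instrument set, the counts reported in the two outputs above can be reproduced by hand. The accounting below is only a sketch; it assumes one instrument column per variable and lag for collapsed GMM-type instruments, and 17 estimated coefficients (16 regressors plus the constant):

      ```stata
      * Collapsed run: one instrument column per variable and lag.
      * Differenced equation: 2 variables x 4 lags (5 through 8).
      * Levels equation: 2 variables x 1 lagged difference (DL4).
      * Standard instruments: 11 month dummies plus the constant.
      display 2*4 + 2*1 + 11 + 1

      * Degrees of freedom of the Sargan/Hansen tests = instruments - coefficients.
      * Collapsed run:
      display 22 - 17
      * Uncollapsed run:
      display 470 - 17
      ```

      The results (22, 5, and 453) agree with the "Number of instruments" and the chi2 degrees of freedom shown in the listings.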

      To deal with the autocorrelation, besides adding lags of the dependent variable you can also add lags of the independent variable(s). In your case, the model is probably also suffering from omitted variables.

      In any case, given the large sample you have, I would recommend resorting to different estimation strategies. The dynamic panel data bias should be reasonably small given your time series dimension, such that you could even use the classical fixed-effects estimator with xtreg, or xtivreg if you still want to instrument the independent variable by its own lags.
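
      A minimal sketch of these two alternatives, reusing the variable names from the first post and assuming the panel is declared with xtset id week. Using the first two lags of generalcrime as instruments is an illustrative assumption, not something prescribed in this thread:

      ```stata
      * Declare the panel structure (names taken from the output above).
      xtset id week

      * Collect the month dummies in a local macro (run this as a do-file).
      local months feb mar apr may jun jul aug sep oct nov dec

      * Classical fixed-effects estimator with four lags of the dependent variable.
      xtreg proactivity L(1/4).proactivity generalcrime `months', fe vce(cluster id)

      * Fixed-effects IV: instrument generalcrime by its own lags (assumed choice).
      xtivreg proactivity L(1/4).proactivity `months' (generalcrime = L(1/2).generalcrime), fe
      ```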
      https://www.kripfganz.de/stata/



      • #4
        Thanks @Sebastian Kripfganz!

        I apologize for the late response as I am getting used to using the site.

        One follow-up question, though: judging from the Hansen test results, it does not appear that "too many instruments" caused an issue here, since the p-values are above 0.1 and below 0.25? I tried adding "collapse", but the lagged terms of the DV became insignificant. You mentioned in other posts that this might indicate a collinearity issue among the lags. But I also couldn't remove them, since the AR tests then become significant? I know it doesn't show in the above results, but if I remove any lag of the DV from the model, the AR test becomes significant for at least one of the AR terms.

        I also tried adding lags of the independent variable to remove the significant autocorrelation, but that does not get rid of it. In order to remove the autocorrelation, I had to include at least 4 lags of the DV, and subsequently use lags starting from the 5th as instruments. Is there a way to test whether the instruments are weak?

        Again, thanks for your help!



        • #5
          Originally posted by Hannah Wu
          I know it doesn't show in the above results, but if I remove any lag of the DV from the model, the AR test becomes significant for at least one of the AR terms.

          I also tried adding lags of the independent variable to remove the significant autocorrelation, but that does not get rid of it. In order to remove the autocorrelation, I had to include at least 4 lags of the DV, and subsequently use lags starting from the 5th as instruments. Is there a way to test whether the instruments are weak?
          If adding lags of independent variables does not help, then you might just keep the extra lags of the dependent variable even if they are insignificant. This should not bite much given your large time series. If there is no serial correlation anymore, you can then still use instruments from the 2nd lag onwards.
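
          A hedged sketch of such a specification, reusing the variable names from the first post. Combining lag(2 .) with collapse is an assumption made here to keep the instrument count small; the lag range may still need to be capped, and the AR tests should be rechecked afterwards:

          ```stata
          * Month dummies collected in a local macro (run this as a do-file).
          local months feb mar apr may jun jul aug sep oct nov dec

          * Four lags of the DV as regressors; instruments start at the 2nd lag.
          xtabond2 proactivity L(1/4).proactivity generalcrime `months', ///
              gmm(proactivity generalcrime, lag(2 .) collapse) ///
              iv(`months', equation(level)) twostep robust artests(5)
          ```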

          A simple way of checking for weak instruments would be to look at unconditional correlations between the regressors and the corresponding instruments with the correlate command. The community-contributed ivreg2 command provides further weak-instrument statistics; see slides 39 to 42 of my 2019 London Stata conference presentation on how to use the xtdpdgmm command to replicate dynamic panel data GMM results with ivreg2.
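
          As a rough illustration of the correlate check, assuming instruments from the 2nd lag onwards and the variable names from the first post; which lags to inspect is an assumption that depends on the instrument set actually used:

          ```stata
          xtset id week

          * Differenced equation: the differenced lagged DV against the
          * lagged levels that serve as its instruments.
          correlate LD.proactivity L2.proactivity L3.proactivity L4.proactivity

          * Levels equation: the lagged DV in levels against the lagged
          * difference that serves as its instrument.
          correlate L.proactivity LD.proactivity
          ```

          Correlations close to zero would hint at weak instruments, although these are only unconditional correlations and not a formal test.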
          https://www.kripfganz.de/stata/



          • #6
            Thank you so much Sebastian Kripfganz! This is so helpful.

            For the independent variable (IV): say I also add two lags of the IV. Can I still use the second lag onwards to instrument both the IV and the DV? So something like gmm(iv dv, lag(2 .))?

            The other question is regarding the interpretation of the lagged IV. My time unit is a week here. If IV(t-1) is significant, can I interpret it as the impact of last week's IV on the current DV? How does using the 2nd lag onwards as instruments affect the interpretation?

            I really appreciate your help!

            Hannah



            • #7
              Originally posted by Hannah Wu
              For the independent variable (IV): say I also add two lags of the IV. Can I still use the second lag onwards to instrument both the IV and the DV? So something like gmm(iv dv, lag(2 .))?
              Yes.

              Originally posted by Hannah Wu
              The other question is regarding the interpretation of the lagged IV. My time unit is a week here. If IV(t-1) is significant, can I interpret it as the impact of last week's IV on the current DV? How does using the 2nd lag onwards as instruments affect the interpretation?
              All the effects are partial effects, i.e. holding everything else constant. The coefficient of IV(t-1) would be the effect of a one-unit change in last week's IV on the current DV, assuming that everything else in the model remains unchanged. In dynamic models, this interpretation may not be very meaningful because IV(t-1) also has an effect on DV(t-1), and DV(t-1) in turn has an effect on the current DV. What you may want to compute are so-called long-run effects (the sum of the coefficients of all current and lagged IV, divided by 1 minus the sum of the coefficients of all lags of the DV).
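
              One possible way to compute such a long-run effect after estimation is nlcom. The sketch below assumes a model with the current value and two lags of generalcrime and four lags of proactivity, and it assumes the coefficients carry Stata's usual time-series operator names; the label longrun is just a placeholder:

              ```stata
              * Month dummies collected in a local macro (run this as a do-file).
              local months feb mar apr may jun jul aug sep oct nov dec

              xtabond2 proactivity L(1/4).proactivity L(0/2).generalcrime `months', ///
                  gmm(proactivity generalcrime, lag(2 .) collapse) ///
                  iv(`months', equation(level)) twostep robust

              * Long-run effect: sum of the IV coefficients divided by
              * one minus the sum of the coefficients of the DV lags.
              nlcom (longrun: (_b[generalcrime] + _b[L.generalcrime] + _b[L2.generalcrime]) / ///
                  (1 - _b[L.proactivity] - _b[L2.proactivity] - _b[L3.proactivity] - _b[L4.proactivity]))
              ```

              nlcom reports a delta-method standard error, so the long-run effect can be tested directly.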
              https://www.kripfganz.de/stata/



              • #8
                I see. Thank you Sebastian Kripfganz!

                One more question regarding the coefficient of IV(t). When none of the lagged IVs is significant, does a significant IV(t) indicate an immediate yet short-term effect, or an immediate and permanent effect? I got myself confused. Based on the long-run effect calculation, IV(t) should indicate a permanent change (even though it's a contemporaneous term) if none of the lags is significant; and if the lags are significant, does that mean it takes longer to reach the long-run effect?



                • #9
                  The coefficient of IV(t) indicates an immediate short-term effect. The long-run effect in response to a permanent change of IV will be a function of this short-run effect (the accumulation of short-run effects over time).
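
                  To illustrate the accumulation in the simplest case, consider a sketch with a single lag of the DV and only the contemporaneous IV. The symbols \alpha, \beta, and h are notation introduced here for illustration, and stability (|\alpha| < 1) is assumed:

                  ```latex
                  \begin{align*}
                  y_t &= \alpha\, y_{t-1} + \beta\, x_t + u_t, \qquad |\alpha| < 1, \\
                  \text{effect after } h \text{ periods} &= \beta \left(1 + \alpha + \alpha^2 + \dots + \alpha^h\right)
                    = \beta\, \frac{1 - \alpha^{h+1}}{1 - \alpha}
                    \;\longrightarrow\; \frac{\beta}{1 - \alpha} \quad (h \to \infty).
                  \end{align*}
                  ```

                  Here the effect refers to a permanent one-unit increase in x starting in period t. For h = 0 it equals the short-run effect \beta, and each further period adds a smaller increment until the long-run effect is reached.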
                  https://www.kripfganz.de/stata/
