  • System GMM with xtabond2 and xtdpdgmm

    Hi Statalisters,

    I would like some input on my choice between xtabond2 and xtdpdgmm, as their results differ and I am unsure whether I am on the right path. Please excuse the confusing variable names; they may not be final.

    The specification is, to the best of my knowledge, in line with the literature. Specifically, I follow this approach: lags 2 and 3 of the levels of the firm performance variable (rdint_log), the governance variables (pressure_sens, pressure_res, inter1, inter2, owner_share), and the control variables (ln_age, log_employees) are used as GMM-type instruments for the first-differenced equation. The first lags of the first differences of the firm performance, corporate governance, and control variables are used as GMM-type instruments for the levels equation. gii_score is a country-level variable, which I may also want to interact with the governance variables.
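    In equation form, the instrument set described above corresponds to the following moment conditions. This is a sketch that assumes the standard dynamic panel model with a firm effect α_i and a serially uncorrelated error ε_it. The symbols (y for rdint_log, x for a generic governance or control variable) are added here for illustration and are not from the original post.

    ```latex
    % Model: y = rdint_log, x = governance and control variables
    y_{it} = \rho\, y_{i,t-1} + x_{it}'\beta + \alpha_i + \varepsilon_{it}

    % First-differenced equation: lags 2 and 3 of the levels, i.e. lag(2 3) eq(diff)
    E\left[ y_{i,t-s}\, \Delta\varepsilon_{it} \right] = 0, \qquad
    E\left[ x_{i,t-s}\, \Delta\varepsilon_{it} \right] = 0, \qquad s \in \{2, 3\}

    % Levels equation: first lag of the first differences, i.e. lag(1 1) eq(level)
    E\left[ \Delta y_{i,t-1} \left( \alpha_i + \varepsilon_{it} \right) \right] = 0, \qquad
    E\left[ \Delta x_{i,t-1} \left( \alpha_i + \varepsilon_{it} \right) \right] = 0
    ```

    With collapse, each condition is imposed once per lag (summed over t) rather than once per period. That is why the xtabond2 output lists only two instruments per variable for the difference equation.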

    inter1 and inter2 are the interactions owner_share * pressure_res and owner_share * pressure_sens, respectively. I would highly appreciate some guidance, whether that means pointing out anything clearly off in my code or raising some other concern. I am confused about the estimations, or rather about the postestimation tests, which I think my model fails.
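    Because inter1 = owner_share * pressure_res (and analogously for inter2), the coefficient on pressure_res alone is not its full effect on rdint_log. The worked expression below uses β labels that are added here for illustration only.

    ```latex
    % Terms of the estimated equation that involve pressure_res
    \ldots + \beta_{res}\, \mathrm{pressure\_res}_{it}
           + \beta_{own}\, \mathrm{owner\_share}_{it}
           + \beta_{1}\, \mathrm{owner\_share}_{it} \times \mathrm{pressure\_res}_{it} + \ldots

    % Short-run marginal effect of pressure_res, conditional on ownership
    \frac{\partial\, \mathrm{rdint\_log}_{it}}{\partial\, \mathrm{pressure\_res}_{it}}
      = \beta_{res} + \beta_{1}\, \mathrm{owner\_share}_{it}
    ```

    The same logic applies to pressure_sens and inter2, and to any interactions with gii_score. The individual z-statistics on pressure_res and inter1 therefore do not, by themselves, tell whether pressure_res matters at a given ownership level.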

    xtabond2 rdint_log L.rdint_log pressure_sens pressure_res inter1 inter2 owner_share ln_age log_employees i.year gii_score, gmm(rdint_log pressure_sens pressure_res owner_share inter1 inter2 log_employees, lag(2 3) collapse equation (diff)) gmm (pressure_sens pressure_res owner_share rdint_log, lag(1 1) collapse equation(level)) iv(i.year ln_age gii_score, equation(level)) two ro
    Dynamic panel-data estimation, two-step system GMM
    Group variable: id                              Number of obs      =      3836
    Time variable : year                            Number of groups   =       664
    Number of instruments = 27                      Obs per group: min =         1
    Wald chi2(17) =   4099.12                                      avg =      5.78
    Prob > chi2   =     0.000                                      max =         7
                  |              Corrected
        rdint_log |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        rdint_log |
              L1. |  -.0231576   .0348396    -0.66   0.506     -.091442    .0451267
    pressure_sens |  -.6548435   .4478926    -1.46   0.144    -1.532697    .2230098
     pressure_res |  -2.142851   .8957042    -2.39   0.017    -3.898399    -.387303
           inter1 |   8.663443   3.873895     2.24   0.025     1.070747    16.25614
           inter2 |   .0250038   1.146118     0.02   0.983    -2.221346    2.271354
      owner_share |   .0541628   .7377693     0.07   0.941    -1.391838    1.500164
           ln_age |  -.0232487    .116177    -0.20   0.841    -.2509514    .2044539
    log_employees |  -.2985976   .0797236    -3.75   0.000    -.4548531   -.1423422
             year |
            2010  |          0  (empty)
            2011  |  -3.030497   .8781278    -3.45   0.001    -4.751596   -1.309398
            2012  |  -3.181193   .9038798    -3.52   0.000    -4.952764   -1.409621
            2013  |  -3.164345    .899639    -3.52   0.000    -4.927605   -1.401085
            2014  |  -.1484917   .9110414    -0.16   0.871      -1.9341    1.637117
            2015  |  -3.003705   .8902785    -3.37   0.001    -4.748619   -1.258791
            2016  |   -3.13521   .9119268    -3.44   0.001    -4.922553   -1.347866
            2017  |  -3.111658   .9127353    -3.41   0.001    -4.900586   -1.322729
        gii_score |   .0351893   .0123263     2.85   0.004     .0110303    .0593484
            _cons |          0  (omitted)
    Instruments for first differences equation
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        L(2/3).(rdint_log pressure_sens pressure_res owner_share inter1 inter2
        log_employees) collapsed
    Instruments for levels equation
        2010b.year 2011.year 2012.year 2013.year 2014.year 2015.year 2016.year
        2017.year ln_age gii_score
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        DL.(pressure_sens pressure_res owner_share rdint_log) collapsed
    Arellano-Bond test for AR(1) in first differences: z = -12.60  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =   5.25  Pr > z =  0.000
    Sargan test of overid. restrictions: chi2(9)    = 104.00  Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(9)    =  66.44  Prob > chi2 =  0.000
      (Robust, but weakened by many instruments.)
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(5)    =  38.48  Prob > chi2 =  0.000
        Difference (null H = exogenous): chi2(4)    =  27.96  Prob > chi2 =  0.000
      gmm(pressure_sens pressure_res owner_share rdint_log, collapse eq(level) lag(1 1))
        Hansen test excluding group:     chi2(5)    =  38.48  Prob > chi2 =  0.000
        Difference (null H = exogenous): chi2(4)    =  27.96  Prob > chi2 =  0.000
      iv(2010b.year 2011.year 2012.year 2013.year 2014.year 2015.year 2016.year 2017.year ln_age gii_score, eq(level))
        Hansen test excluding group:     chi2(1)    =  15.37  Prob > chi2 =  0.000
        Difference (null H = exogenous): chi2(8)    =  51.07  Prob > chi2 =  0.000
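    Each difference-in-Hansen row in this output is the full Hansen statistic minus the Hansen statistic computed without that instrument subset. The degrees of freedom are subtracted in the same way. Checking this against the reported numbers:

    ```latex
    C = J_{\text{full}} - J_{\text{excl}}, \qquad df_C = df_{\text{full}} - df_{\text{excl}}

    % GMM instruments for levels (4 collapsed instruments)
    27.96 = 66.44 - 38.48, \qquad 4 = 9 - 5

    % iv(year dummies, ln_age, gii_score) group
    51.07 = 66.44 - 15.37, \qquad 8 = 9 - 1
    ```

    The "Hansen test excluding group" statistics also reject (Prob > chi2 = 0.000 in both cases). Dropping either group alone therefore does not make the remaining instruments pass the test.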
    Because the year dummies (2010 is reported as empty) and the constant were omitted, I turned to xtdpdgmm, as suggested by Dr Kripfganz here on Statalist.
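    On the omitted year dummies: judging from the output, the estimation sample presumably runs from 2011 to 2017 (2010 is lost to the lagged dependent variable, hence "empty"). The remaining dummies then sum to one for every observation, which reproduces the constant column exactly:

    ```latex
    \sum_{t=2011}^{2017} d^{(t)}_{it} = 1 \quad \text{for every observation } (i, t) \text{ in the estimation sample}
    ```

    With a constant in the model, one of these eight columns is therefore redundant and has to be dropped. xtabond2 drops the constant, while the xtdpdgmm output keeps _cons and reports year dummies only from 2012 onward.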

    I tried to replicate the estimation with xtdpdgmm, although I am not sure whether this is entirely equivalent to system GMM. The results do not match:

    xtdpdgmm L.rdint_log rdint_log pressure_sens pressure_res inter1 inter2 owner_share ln_age log_employees gii_score, noserial gmmiv(L.rdint_log pressure_sens pressure_res owner_share inter1 inter2 log_employees, collapse model (difference)) iv(gii_score ln_age, difference model(difference)) twostep vce(robust) teffects overid
    Group variable: id                           Number of obs         =      3836
    Time variable: year                          Number of groups      =       664
    Moment conditions:     linear =      57      Obs per group:    min =         1
                        nonlinear =       5                        avg =  5.777108
                            total =      62                        max =         7
                                        (Std. Err. adjusted for 664 clusters in id)
                  |              WC-Robust
      L.rdint_log |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        rdint_log |  -.2300437   .2514488    -0.91   0.360    -.7228742    .2627868
    pressure_sens |   .4139911   .4576866     0.90   0.366    -.4830582     1.31104
     pressure_res |   1.787731   .8391795     2.13   0.033     .1429689    3.432492
           inter1 |  -9.637961   3.956848    -2.44   0.015    -17.39324   -1.882682
           inter2 |  -.4744583   1.130968    -0.42   0.675    -2.691115    1.742198
      owner_share |   .8176477   .7949369     1.03   0.304    -.7404001    2.375695
           ln_age |   .1277035   .2034326     0.63   0.530     -.271017     .526424
    log_employees |   .1106846   .1047606     1.06   0.291    -.0946425    .3160116
        gii_score |  -.0275028   .0247245    -1.11   0.266    -.0759619    .0209564
             year |
            2012  |  -.4822724   .1729211    -2.79   0.005    -.8211916   -.1433533
            2013  |  -.5913051   .1852836    -3.19   0.001    -.9544543   -.2281558
            2014  |   .3287016   .6799291     0.48   0.629    -1.003935    1.661338
            2015  |   2.087275   .3400099     6.14   0.000     1.420868    2.753682
            2016  |  -.3082466   .1692142    -1.82   0.069    -.6399003    .0234071
            2017  |  -.0683088   .1565524    -0.44   0.663    -.3751459    .2385283
            _cons |  -4.436074   1.619953    -2.74   0.006    -7.611124   -1.261024
    Instruments corresponding to the linear moment conditions:
     1, model(diff):
       L1.L.rdint_log L2.L.rdint_log L3.L.rdint_log L4.L.rdint_log L5.L.rdint_log
       L6.L.rdint_log L1.pressure_sens L2.pressure_sens L3.pressure_sens
       L4.pressure_sens L5.pressure_sens L6.pressure_sens L7.pressure_sens
       L1.pressure_res L2.pressure_res L3.pressure_res L4.pressure_res
       L5.pressure_res L6.pressure_res L7.pressure_res L1.owner_share
       L2.owner_share L3.owner_share L4.owner_share L5.owner_share L6.owner_share
       L7.owner_share L1.inter1 L2.inter1 L3.inter1 L4.inter1 L5.inter1 L6.inter1
       L7.inter1 L1.inter2 L2.inter2 L3.inter2 L4.inter2 L5.inter2 L6.inter2
       L7.inter2 L1.log_employees L2.log_employees L3.log_employees
       L4.log_employees L5.log_employees L6.log_employees L7.log_employees
     2, model(diff):
       D.gii_score D.ln_age
     3, model(level):
       2012bn.year 2013.year 2014.year 2015.year 2016.year 2017.year
     4, model(level):
    Running this code, I notice that, compared with xtabond2, the tests for serial correlation and the Hansen test are missing. For example, when I try estat serial, I get the following error no matter how I sort the data:

    estat serial
    not sorted

  • #2
    The problem is that you are using a time-series operator for your dependent variable: L.rdint_log.

    Are you sure that you want to regress the lag of rdint_log on rdint_log itself instead of the other way round?

    If you really want to specify the dependent variable with a time-series operator, a workaround would be to generate the lag first and then use this new variable as the dependent variable:
    gen L_rdint_log = L.rdint_log
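    xtdpdgmm takes the first variable in the list as the dependent variable. In equation form, with y = rdint_log and x the remaining regressors (notation added for illustration):

    ```latex
    % As specified: xtdpdgmm L.rdint_log rdint_log ...
    y_{i,t-1} = \gamma\, y_{it} + x_{it}'\beta + \ldots

    % Presumably intended, matching the xtabond2 call: rdint_log L.rdint_log ...
    y_{it} = \rho\, y_{i,t-1} + x_{it}'\beta + \ldots
    ```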


    • #3
      Originally posted by Sebastian Kripfganz
      The problem is that you are using a time-series operator for your dependent variable: L.rdint_log.

      Are you sure that you want to regress the lag of rdint_log on rdint_log itself instead of the other way round?

      If you really want to specify the dependent variable with a time-series operator, a workaround would be to generate the lag first and then use this new variable as the dependent variable:
      gen L_rdint_log = L.rdint_log
      Thank you, this solved that particular issue. Now, after rerunning the code, what would be a sensible way to respond to the seemingly invalid instruments?

      xtdpdgmm rdint_log L2.rdint_log pressure_sens pressure_res inter1 inter2 owner_share ln_age log_employees gii_score, noserial gmmiv(L2.rdint_log pressure_sens pressure_res owner_share inter1 inter2 log_employees, collapse model (difference)) iv(gii_score ln_age, model(difference)) twostep vce(robust) teffects
      Group variable: id                           Number of obs         =      3301
      Time variable: year                          Number of groups      =       659
      Moment conditions:     linear =      55      Obs per group:    min =         1
                          nonlinear =       4                        avg =  5.009105
                              total =      59                        max =         6
                                          (Std. Err. adjusted for 659 clusters in id)
                    |              WC-Robust
          rdint_log |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          rdint_log |
                L2. |   .0706734   .0292897     2.41   0.016     .0132666    .1280802
      pressure_sens |  -.5944906    .288983    -2.06   0.040    -1.160887   -.0280944
       pressure_res |  -.3991786   .5088306    -0.78   0.433    -1.396468     .598111
             inter1 |   1.415368    2.05336     0.69   0.491    -2.609144    5.439881
             inter2 |   .3167768   .7049669     0.45   0.653    -1.064933    1.698487
        owner_share |   .1665736   .4630157     0.36   0.719    -.7409205    1.074068
             ln_age |  -.1242958   .1352198    -0.92   0.358    -.3893217      .14073
      log_employees |  -.2278335   .0445314    -5.12   0.000    -.3151135   -.1405535
          gii_score |   .0142288   .0164633     0.86   0.387    -.0180386    .0464962
               year |
              2013  |   .0062306   .0694794     0.09   0.929    -.1299465    .1424077
              2014  |   3.005375   .1500664    20.03   0.000      2.71125    3.299499
              2015  |   .0593812   .0772396     0.77   0.442    -.0920057    .2107681
              2016  |  -.1514663   .1078699    -1.40   0.160    -.3628874    .0599549
              2017  |  -.0167714   .0885459    -0.19   0.850    -.1903181    .1567753
              _cons |  -1.950205   1.103531    -1.77   0.077    -4.113086    .2126767
      Instruments corresponding to the linear moment conditions:
       1, model(diff):
         L1.L2.rdint_log L2.L2.rdint_log L3.L2.rdint_log L4.L2.rdint_log
         L5.L2.rdint_log L1.pressure_sens L2.pressure_sens L3.pressure_sens
         L4.pressure_sens L5.pressure_sens L6.pressure_sens L7.pressure_sens
         L1.pressure_res L2.pressure_res L3.pressure_res L4.pressure_res
         L5.pressure_res L6.pressure_res L7.pressure_res L1.owner_share
         L2.owner_share L3.owner_share L4.owner_share L5.owner_share L6.owner_share
         L7.owner_share L1.inter1 L2.inter1 L3.inter1 L4.inter1 L5.inter1 L6.inter1
         L7.inter1 L1.inter2 L2.inter2 L3.inter2 L4.inter2 L5.inter2 L6.inter2
         L7.inter2 L1.log_employees L2.log_employees L3.log_employees
         L4.log_employees L5.log_employees L6.log_employees L7.log_employees
       2, model(diff):
         gii_score ln_age
       3, model(level):
         2013bn.year 2014.year 2015.year 2016.year 2017.year
       4, model(level):
      . estat serial
      Arellano-Bond test for autocorrelation of the first-differenced residuals
      H0: no autocorrelation of order 1:     z =  -13.6530   Prob > |z|  =    0.0000
      H0: no autocorrelation of order 2:     z =    1.2050   Prob > |z|  =    0.2282
      . estat overid
      Sargan-Hansen test of the overidentifying restrictions
      H0: overidentifying restrictions are valid
      2-step moment functions, 2-step weighting matrix       chi2(44)    =  117.8849
                                                             Prob > chi2 =    0.0000
      2-step moment functions, 3-step weighting matrix       chi2(44)    =  119.4523
                                                             Prob > chi2 =    0.0000
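      A note on reading the estat serial output. If ε_it is serially uncorrelated, the first-differenced errors are correlated at order 1 by construction. The order-2 test is therefore the informative one about serial correlation in ε_it. A sketch of the standard argument:

      ```latex
      \Delta\varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i,t-1}

      % Order 1: both terms contain \varepsilon_{i,t-1}, so the covariance is nonzero under H0
      E\left[ \Delta\varepsilon_{it}\, \Delta\varepsilon_{i,t-1} \right] = -E\left[ \varepsilon_{i,t-1}^{2} \right] \neq 0

      % Order 2: no common term, so the covariance is zero under H0
      E\left[ \Delta\varepsilon_{it}\, \Delta\varepsilon_{i,t-2} \right] = 0
      ```

      Here the order-2 test does not reject (Prob > |z| = 0.2282), whereas the xtabond2 run in the first post did (z = 5.25).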


      • #4
        It seems odd to use the 2nd lag of the dependent variable as a regressor without the 1st lag.
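        Below, the specification in #3 is compared with one that also includes the first lag. The notation (y = rdint_log, x the other regressors, δ_t the year effects) is added for illustration.

        ```latex
        % Specification in #3: second lag only
        y_{it} = \rho_2\, y_{i,t-2} + x_{it}'\beta + \delta_t + \alpha_i + \varepsilon_{it}

        % Including both lags
        y_{it} = \rho_1\, y_{i,t-1} + \rho_2\, y_{i,t-2} + x_{it}'\beta + \delta_t + \alpha_i + \varepsilon_{it}
        ```

        Dropping y_{i,t-1} amounts to imposing the restriction ρ_1 = 0.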

        Other than that, it can be quite cumbersome to find out what causes the overidentification tests to reject the model. It could be that some regressors are classified as predetermined when they are actually endogenous. It could also be omitted variables, including lags of the regressors or interaction terms. Please see slides 90 onwards of my 2019 London Stata Conference presentation for a possible approach to model specification.
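        For reference, the Sargan-Hansen statistic reported by estat overid compares the number of moment conditions L with the number of estimated coefficients K. Below is a sketch of the standard form, where ḡ is the vector of sample moments at the estimates and Ŝ is their estimated covariance (notation added for illustration):

        ```latex
        J = N\, \bar g(\hat\theta)'\, \hat S^{-1}\, \bar g(\hat\theta)
            \;\xrightarrow{d}\; \chi^2(L - K)

        % #3: 55 linear + 4 nonlinear moment conditions;
        % 15 coefficients (9 regressors, 5 year dummies, _cons)
        L - K = (55 + 4) - 15 = 44
        ```

        This matches chi2(44) in the output. A rejection says that at least some of the 59 moment conditions are invalid, but not which ones.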

