
  • Two-Step System-GMM vs simple Fixed Effects Regression

    Hello everyone,

    I am currently working on my thesis and was wondering whether my current use of two-step system GMM is useful at all or whether a plain-vanilla FE regression would do the job.

    Roodman (2009) repeatedly mentions that xtabond2 should be used for datasets with small T and large N. At what point is a panel dataset considered to have too large a T and too small an N for this approach?

    My current dataset consists of over 170,000 observations on roughly 18,000 companies over 30 years. As one can see, it is a (heavily) unbalanced panel.

    My results are shown below in case they are of any use.

    Code:
    Dynamic panel-data estimation, two-step system GMM
    ------------------------------------------------------------------------------
    Group variable: gvkey                           Number of obs      =    111060
    Time variable : year                            Number of groups   =     18281
    Number of instruments = 448                     Obs per group: min =         1
    F(14, 18280)  =  14561.00                                      avg =      6.08
    Prob > F      =     0.000                                      max =        29
    -------------------------------------------------------------------------------------
                        |              Corrected
                    COE | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    --------------------+----------------------------------------------------------------
                    COE |
                    L1. |   .0746159   .0103966     7.18   0.000     .0542377    .0949942
                        |
             numest_log |   .0048291   .0004184    11.54   0.000     .0040091    .0056492
           eps_var_log2 |   .0088655   .0003942    22.49   0.000     .0080929    .0096382
                log_bmr |   .0166525   .0004454    37.39   0.000     .0157795    .0175254
                 mv_log |  -.0102536   .0002588   -39.62   0.000     -.010761   -.0097463
                   BETA |   .0036052   .0002894    12.46   0.000      .003038    .0041724
        financial_dummy |   .0051529   .0009553     5.39   0.000     .0032805    .0070253
           health_dummy |  -.0046829   .0009418    -4.97   0.000    -.0065289   -.0028369
       industrial_dummy |   .0024668   .0008792     2.81   0.005     .0007435      .00419
           it_tel_dummy |  -.0008938   .0009191    -0.97   0.331    -.0026952    .0009077
          oil_gas_dummy |   .0156918   .0018826     8.34   0.000     .0120016    .0193819
        materials_dummy |   .0122339   .0012172    10.05   0.000      .009848    .0146197
    communication_dummy |   .0034575   .0014513     2.38   0.017     .0006128    .0063021
          utility_dummy |   .0033903   .0014856     2.28   0.022     .0004784    .0063023
                  _cons |   .1972499   .0026227    75.21   0.000     .1921092    .2023907
    -------------------------------------------------------------------------------------
    Instruments for orthogonal deviations equation
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        L(1/29).L.COE
    Instruments for levels equation
      Standard
        numest_log eps_var_log2 log_bmr mv_log BETA financial_dummy health_dummy
        industrial_dummy it_tel_dummy oil_gas_dummy materials_dummy
        communication_dummy utility_dummy
        _cons
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        D.L.COE
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z = -24.06  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =   0.58  Pr > z =  0.565
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(433)  =3464.68  Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(433)  =1785.01  Prob > chi2 =  0.000
      (Robust, but weakened by many instruments.)

    Could you give me an indication whether two-step system GMM is of any use in this setting, or whether a plain FE regression could also do the job?

    Have a nice day! :-)

  • #2
    On average, you have T=6. This is definitely small, especially relative to N=18281. There is no fixed threshold for T being considered small or large.

    The plain FE estimator is biased and inconsistent in this case because of the dynamic nature of the model (lagged dependent variable => Nickell bias). You might find the following presentation useful:
    https://www.kripfganz.de/stata/
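
    As a rough illustration of what the "plain FE" alternative would look like, here is a minimal sketch (not your actual command; the variable names and the panel identifiers gvkey/year are taken from the output you posted). With an average T of about 6, the coefficient on L.COE in such a regression is subject to the Nickell bias, which is of order 1/T:

    Code:
    * Sketch only: "plain FE" counterpart of the dynamic model.
    xtset gvkey year
    * Note: the time-invariant industry dummies would be absorbed by the fixed
    * effects, and the coefficient on L.COE suffers from Nickell bias.
    xtreg COE L.COE numest_log eps_var_log2 log_bmr mv_log BETA, fe vce(cluster gvkey)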



    • #3
      Originally posted by Sebastian Kripfganz
      Hello Mr Kripfganz,

      first of all, thank you very much for your answer and presentation.

      I hope it is okay to ask you a follow-up question:

      It seems that my AR(2) statistic is not significant, so second-order serial correlation is not a major problem in my model, which seems very good. However, Roodman (2009) mentions that the p-values of the Hansen test should be roughly between 0.05 and 0.25, which makes my p-values (0.000) look a little "too good to be true". I am therefore afraid that my model might be flawed.

      I already saw you answering a different question, where you suggested that one could use a difference GMM model instead. Unfortunately, my model still "suffers" from p-values at this level when I apply difference GMM. Could you give me a suggestion on how to deal with this sort of problem? I am just not sure whether going on with these p-values is a good approach.



      • #4
        The null of the Hansen test is that the overidentifying restrictions are valid, hence your p-value of 0.000 is bad news in this regard. Roodman's point was that this p-value should be far above the conventional significance level, as you want to be statistically confident that you are not rejecting the null of validity. Then again, Roodman points out that you can work your way up to a p-value of 1.000 by adding more instruments.
        Try cutting down the number of lags used as instruments and see what happens.
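
        As a purely illustrative sketch (we have not seen your actual command line, so the variable lists and options below are only guesses based on your output), cutting down and collapsing the GMM-type instruments in xtabond2 could look roughly like this:

        Code:
        * Sketch only: restrict the lag depth and collapse the instrument columns
        * to reduce the instrument count (industry dummies omitted for brevity).
        xtabond2 COE L.COE numest_log eps_var_log2 log_bmr mv_log BETA,  ///
            gmm(L.COE, lag(1 3) collapse)                                ///
            iv(numest_log eps_var_log2 log_bmr mv_log BETA)              ///
            twostep robust orthogonal small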



        • #5
          You have not shown us your command line, which makes it a bit difficult to give a helpful answer. In addition to Andreas' good comment, I notice that you seem to have specified most instruments as standard instruments for the level model. This requires that all those instruments are uncorrelated with the unobserved group-specific effects, which is akin to a "random-effects" assumption and often hard to justify.
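
          To make this concrete, here is a hedged sketch (again, not your actual command; the variable lists are guesses from your output) of one possible alternative: instead of entering the regressors as standard instruments for the levels equation, instrument them GMM-style in the transformed equation, treating them as predetermined:

          Code:
          * Sketch only: regressors instrumented in the transformed equation rather
          * than entered as IV-style instruments for the levels equation, which
          * would require them to be uncorrelated with the firm-specific effects.
          xtabond2 COE L.COE numest_log eps_var_log2 log_bmr mv_log BETA,  ///
              gmm(L.COE, lag(1 3) collapse)                                ///
              gmm(numest_log eps_var_log2 log_bmr mv_log BETA,             ///
                  lag(1 2) collapse equation(diff))                        ///
              twostep robust orthogonal small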
          https://www.kripfganz.de/stata/



          • #6
            Originally posted by Andreas Backhaus
            Thank you, Andreas and Sebastian, for your really helpful comments! Cutting down the number of lags actually did the job, in case anyone else runs into a similar issue. :-)

