  • Choosing instruments in GMM estimations using -xtabond2-

    Dear Statalisters,

    My model estimates bank risk using bank-specific and macroeconomic variables. Given the persistence of risk and the endogeneity of bank-specific variables documented in the literature, I have used a two-step system GMM estimator. My variables are as follows.

    Dependent: StrScore
    Endogenous Regressors: NIM, CRAR, ContLiab, CorpLoan, OpExpOpRev, ROA
    Strictly Exogenous Regressors: PCR, Size, Pub_Dummy ("0" if Private, "1" if Public), GDPGr, GsecYld, CMR, EPUInd, CPInfl, ExcUSD and Year dummies

    Below are the code and results of one of my many trials. I would appreciate your suggestions to clarify my doubts.

    Code:
     xtabond2 ln_StrsScore L.ln_StrsScore L.Pub_Dummy L.CRAR L.GNPA L.PCR L.NIM L.CorpLoan L.ContLiab L.OpExpOpRev L.Size
    >  L.ROA GDPG GsecYld CMR EPUInd CPInfl ExcUSD, gmmstyle(ln_StrsScore, lag(2 4) collapse) gmmstyle(NIM CRAR ContLiab C
    > orpLoan OpExpOpRev L.ROA, lag(2 3) collapse) ivstyle(Year2-Year18 L.Pub_Dummy L.PCR L.Size) twostep robust
    Favoring speed over space. To switch, type or click on mata: mata set matafavor space, perm.
    Warning: Number of instruments may be large relative to number of observations.
    Warning: Two-step estimated covariance matrix of moments is singular.
      Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
      Difference-in-Sargan/Hansen statistics may be negative.
    
    Dynamic panel-data estimation, two-step system GMM
    ------------------------------------------------------------------------------
    Group variable: BankID                          Number of obs      =       643
    Time variable : Year                            Number of groups   =        39
    Number of instruments = 42                      Obs per group: min =        14
    Wald chi2(17) =  2.30e+06                                      avg =     16.49
    Prob > chi2   =     0.000                                      max =        17
    ------------------------------------------------------------------------------
                 |              Corrected
    ln_StrsScore | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    ln_StrsScore |
             L1. |    .641975   .0833203     7.70   0.000     .4786701    .8052798
                 |
       Pub_Dummy |
             L1. |   .0499874   .0425268     1.18   0.240    -.0333635    .1333384
                 |
            CRAR |
             L1. |  -.1038596   .3221506    -0.32   0.747    -.7352632    .5275441
                 |
            GNPA |
             L1. |   .5465262   .6220631     0.88   0.380     -.672695    1.765747
                 |
             PCR |
             L1. |    .091487   .0414615     2.21   0.027     .0102239      .17275
                 |
             NIM |
             L1. |  -7.446248   2.341253    -3.18   0.001    -12.03502   -2.857477
                 |
        CorpLoan |
             L1. |  -.2940318   .2343635    -1.25   0.210    -.7533757    .1653122
                 |
        ContLiab |
             L1. |   .0461053   .0245001     1.88   0.060    -.0019139    .0941245
                 |
      OpExpOpRev |
             L1. |   1.400362   .2287579     6.12   0.000     .9520049    1.848719
                 |
            Size |
             L1. |   .2713295   .0970137     2.80   0.005     .0811861    .4614729
                 |
             ROA |
             L1. |   8.974446   3.907885     2.30   0.022     1.315132    16.63376
                 |
           GDPGr |  -.6154063   .3196053    -1.93   0.054    -1.241821    .0110085
         GsecYld |   11.43682   1.912719     5.98   0.000     7.687965    15.18568
             CMR |  -1.021948   .9621678    -1.06   0.288    -2.907762     .863866
          EPUInd |  -.0001234   .0003283    -0.38   0.707    -.0007669    .0005201
          CPInfl |   .5629143   .4734788     1.19   0.234     -.365087    1.490916
          ExcUSD |  -.0003851   .0018006    -0.21   0.831    -.0039141     .003144
           _cons |  -1.791809   .4441954    -4.03   0.000    -2.662416   -.9212017
    ------------------------------------------------------------------------------
    Instruments for first differences equation
      Standard
        D.(Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10 Year11 Year12
        Year13 Year14 Year15 Year16 Year17 Year18 L.Pub_Dummy L.PCR L.Size)
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        L(2/3).(NIM CRAR ContLiab CorpLoan OpExpOpRev L.ROA) collapsed
        L(2/4).ln_StrsScore collapsed
    Instruments for levels equation
      Standard
        Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10 Year11 Year12
        Year13 Year14 Year15 Year16 Year17 Year18 L.Pub_Dummy L.PCR L.Size
        _cons
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        DL.(NIM CRAR ContLiab CorpLoan OpExpOpRev L.ROA) collapsed
        DL.ln_StrsScore collapsed
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z =  -3.33  Pr > z =  0.001
    Arellano-Bond test for AR(2) in first differences: z =  -1.46  Pr > z =  0.144
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(24)   =  75.30  Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(24)   =  26.90  Prob > chi2 =  0.309
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(17)   =  21.02  Prob > chi2 =  0.226
        Difference (null H = exogenous): chi2(7)    =   5.88  Prob > chi2 =  0.554
      gmm(ln_StrsScore, collapse lag(2 4))
        Hansen test excluding group:     chi2(20)   =  24.28  Prob > chi2 =  0.230
        Difference (null H = exogenous): chi2(4)    =   2.61  Prob > chi2 =  0.625
      gmm(NIM CRAR ContLiab CorpLoan OpExpOpRev L.ROA, collapse lag(2 3))
        Hansen test excluding group:     chi2(6)    =  16.51  Prob > chi2 =  0.011
        Difference (null H = exogenous): chi2(18)   =  10.38  Prob > chi2 =  0.919
      iv(Year2 Year3 Year4 Year5 Year6 Year7 Year8 Year9 Year10 Year11 Year12 Year13 Year14 Year15 Year16 Year17 Year18 L.
    > Pub_Dummy L.PCR L.Size)
        Hansen test excluding group:     chi2(4)    =   4.54  Prob > chi2 =  0.337
        Difference (null H = exogenous): chi2(20)   =  22.35  Prob > chi2 =  0.322
    Based on my very limited understanding of GMM estimation, gathered mostly from Roodman (2009) and various Statalist discussions, I have the following questions.

    1. Does the way I have specified the lags in the gmm() and iv() options look correct?
    2. Roodman advises reporting the number of instruments and keeping it below the number of groups. In general, does having more instruments than groups (here 42 instruments against 39 groups) render the estimation invalid if all other tests, such as AR(2), the Hansen test of overidentifying restrictions, and the Difference-in-Hansen tests, are satisfactory?
    3. Is it compulsory to include all strictly exogenous variables as iv() instruments?
    4. In one of the discussions, Sebastian Kripfganz states that a low p-value of the AR(2) test is potentially concerning. In my case, how should I interpret the AR(2) p-value of 0.144?
    5. As I understand it, the Hansen test is the preferred test of overidentifying restrictions when two-step system GMM with robust standard errors is used and the assumptions of homoskedasticity and no serial correlation are relaxed. In that sense, should I consider the Hansen test p-value of 0.309 (above the upper limit of 0.25) problematic?
    6. The "Hansen test excluding group" for the second gmm() instrument set (NIM CRAR ContLiab CorpLoan OpExpOpRev L.ROA) rejects the null with a p-value of 0.011. How does this affect the estimation, given that my AR(2) test is fine?
    7. I understand that the Difference-in-Hansen test is not reported for a group of instruments if excluding that group would leave fewer instruments than regressors (an under-identified model). In one of my attempts, the Difference-in-Hansen test is not reported for the iv() instruments, while it is reported for the gmm() instruments and does not reject the null hypothesis. Should this be a cause for concern?

    As an aside, can you explain how Stata counts the number of instruments, e.g. 42 in this case?

    Sorry for posting such a lengthy array of questions. Thanks for patient reading. I'd really appreciate any suggestions and clarifications coming my way.

    Thanks
    pankaj

  • #2
    Could someone kindly respond to my queries? I am struggling to get my doubts clarified, and any suggestions would be really helpful.

    Thanks in advance
    pankaj



    • #3
      Dear Statalisters,

      It is really discouraging to see no responses. As a novice, I find the discussions and suggestions here very useful from a practical viewpoint, and they have contributed immensely to the limited understanding I have so far.

      I think something is wrong with the way I state my questions. Or maybe the issues raised are too trivial to deserve an in-depth investigation. I would really appreciate suggestions to improve the quality of my posts so that I can get better responses to my queries going forward. For the time being, I would request the experts on the forum to kindly take a look at my questions. I am really stuck on them, and it is hard to find textbooks that address such practical concerns as lucidly as this forum does.

      Thanks in advance.
      pankaj



      • #4
        You generally improve your chances for a response by only posting a small number of concise questions at a time. Many or lengthy questions deter people from answering because they cannot afford to spend so much time at once.

        A few brief answers:
        1. I generally advise against using the iv() option without the eq() suboption. Note that iv(varlist) is not the same as specifying both iv(varlist, eq(diff)) iv(varlist, eq(level)). If this is puzzling, then you probably do not want to use iv() without the eq() suboption. It appears arbitrary that you have chosen a maximum lag order of 4 for some gmm() instruments but 3 for others. This is difficult to justify and gives the impression of specification hacking. There are some variables (e.g., GDPGr) that you have not explicitly instrumented. This can be valid but would generally require justification; there is the risk that the remaining instruments are weak for that regressor.
        2. There is no fixed threshold. In my view, the number of instruments should be considerably smaller than the number of groups. 39 groups is very small anyway, which makes it difficult to expect reliable/robust estimates from such a GMM regression.
        3. To specify a variable in iv() it needs to satisfy more than just strict exogeneity (with respect to the idiosyncratic error component). It also needs to satisfy exogeneity with respect to the group-specific error component. This is akin to a "random-effects" assumption, which can be difficult to justify with this kind of data.
        4. Strictly speaking, a p-value of 0.144 is larger than the usual significance levels and might therefore be fine. However, it does not provide much confidence. If the null hypothesis of no second-order serial correlation is indeed true, you would only expect to see a test statistic at least this extreme in 14.4% of the cases.
        5. If everything else is fine, a Hansen p-value of 0.309 is no problem at all. However, as mentioned above, you do have only a small number of groups and a relatively large number of instruments, which makes those test results very unreliable.
        6. The Hansen test excluding those instruments is probably not very informative because the estimation is likely to suffer from a weak-instruments problem when those instruments are excluded.
        7. Not sure what that means.
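
        To make point 1 concrete, here is a minimal sketch of the command from post #1 with the eq() suboption spelled out and a single lag range for all gmm() instruments. It is an illustration under assumptions, not a recommended specification: the common lag range (2 3) and the assignment of instruments to equations are arbitrary choices made here, and L.Pub_Dummy is left out of the instrument list on the assumption that bank ownership does not change over time (its first difference would then be zero). xtabond2 is a community-contributed command (ssc install xtabond2).

        ```stata
        * Sketch only: regressors as in post #1; instrument choices are assumptions.
        * Run from a do-file so that the /// line continuations are recognized.
        xtabond2 ln_StrsScore L.ln_StrsScore L.Pub_Dummy L.CRAR L.GNPA L.PCR L.NIM ///
            L.CorpLoan L.ContLiab L.OpExpOpRev L.Size L.ROA                         ///
            GDPGr GsecYld CMR EPUInd CPInfl ExcUSD,                                 ///
            gmmstyle(ln_StrsScore NIM CRAR ContLiab CorpLoan OpExpOpRev L.ROA,      ///
                     laglimits(2 3) collapse)                                       ///
            ivstyle(L.PCR L.Size, equation(diff))                                   ///
            ivstyle(Year2-Year18, equation(level))                                  ///
            twostep robust
        ```

        Writing equation(diff) or equation(level) explicitly makes it visible which exogeneity assumption each standard instrument relies on; anything placed in the level equation also needs to be uncorrelated with the group-specific error component, as noted in point 3.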
        Counting the instruments is easier when you use my xtdpdgmm command as an alternative to xtabond2. The non-redundant instruments are then listed below the regression output.
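
        As a rough cross-check on the reported count of 42, the instrument list printed below the regression output in post #1 can be tallied by hand. The following is a sketch of that arithmetic only; the local macro names are made up for illustration.

        ```stata
        * Tally of the instruments listed in the xtabond2 output of post #1.
        * With "collapse", each lag of each gmm() variable contributes one column.
        * Run from a do-file so that the // comments are recognized.
        local gmm_diff_y = 3       // L(2/4).ln_StrsScore
        local gmm_diff_x = 6 * 2   // L(2/3) of NIM CRAR ContLiab CorpLoan OpExpOpRev L.ROA
        local gmm_level  = 7       // DL. of the seven gmm() variables
        local iv_std     = 17 + 3  // Year2-Year18 plus L.Pub_Dummy L.PCR L.Size
        local cons       = 1       // _cons in the levels equation
        display `gmm_diff_y' + `gmm_diff_x' + `gmm_level' + `iv_std' + `cons'   // 43 listed

        * Hansen degrees of freedom = instruments - estimated coefficients.
        * The output reports 42 instruments and 18 coefficients (17 slopes + _cons).
        display 42 - 18                                                          // 24
        ```

        The list adds up to 43, one more than the 42 reported, so one listed instrument was presumably dropped as redundant. A possible but unverified reason is that Year2-Year18 sum to one over the estimation sample and are therefore collinear with _cons. The second line matches the chi2(24) of the Sargan and Hansen tests.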

        More on GMM estimation of linear dynamic panel data models:
        https://www.kripfganz.de/stata/



        • #5
          Thanks a lot, Professor Sebastian Kripfganz. I really appreciate your suggestion regarding the posting of queries. Going forward, I will be more careful about the number and precision of my questions. Maybe a little impatience crept in, for which I am sorry. Your responses will definitely strengthen my understanding of GMM estimation.

          You mentioned that GMM estimates are unreliable with such a small number of groups (39). The number of groups cannot be increased drastically in my study. Under such circumstances, what would you suggest as a better and more robust alternative to GMM estimation that accounts for the endogeneity of the regressors as well as the dynamic nature of the dependent variable?

          Many thanks
          pankaj



          • #6
            I would recommend keeping the number of instruments as small as possible and restricting yourself to the one-step estimator. The two-step estimator yields an asymptotic efficiency improvement, but 39 groups is far away from asymptopia. The cost of estimating the second-step weighting matrix is just too high.

            You might have to accept that test statistics are not very reliable with your data. You can still run your regression but should be careful not to put too much emphasis on the test results.

            There is no panacea to the problem of having insufficient data. There are just some things you cannot do or rely on the same way as if you had lots of data.
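
            A minimal sketch of what this could look like with xtabond2, assuming the regressors from post #1: the twostep option is dropped (one-step is the xtabond2 default) and the gmm() instruments are cut to a single collapsed lag. The choice of laglimits(2 2) and of the instrument sets is an assumption made for illustration, not a tested specification.

            ```stata
            * Sketch only: one-step system GMM with a reduced instrument set.
            * Run from a do-file so that the /// line continuations are recognized.
            xtabond2 ln_StrsScore L.ln_StrsScore L.Pub_Dummy L.CRAR L.GNPA L.PCR L.NIM ///
                L.CorpLoan L.ContLiab L.OpExpOpRev L.Size L.ROA                         ///
                GDPGr GsecYld CMR EPUInd CPInfl ExcUSD,                                 ///
                gmmstyle(ln_StrsScore NIM CRAR ContLiab CorpLoan OpExpOpRev L.ROA,      ///
                         laglimits(2 2) collapse)                                       ///
                ivstyle(Year2-Year18, equation(level))                                  ///
                robust
            ```

            Without twostep, the robust option requests conventional heteroskedasticity-robust standard errors rather than the Windmeijer-corrected two-step ones. The caveat about unreliable test statistics with 39 groups still applies.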
            https://www.kripfganz.de/stata/



            • #7
              I convey my most sincere appreciation for your valuable suggestions, Professor Sebastian. Your inputs have been of tremendous help for me.

              Massive thanks and regards
              pankaj
