  • Hansen test is missing after xtabond2 (collapse)

    Dear all,

    I am running a dynamic panel regression -xtabond2 L L.y x1 x2 x3, gmm(L, lag(2 3) collapse) iv(x1 x2 x3) twostep robust-. Because the instruments outnumber the groups, I added the -collapse- option. However, with collapse, Stata reports the Sargan and Hansen tests with negative degrees of freedom and a missing p-value (i.e. Hansen test of overid. restrictions: chi2(-3)=1.45 Prob > chi2 = .).

    What is the reason for this problem? After collapse, there are 30 instruments and 83 groups; before collapse, there were 90 instruments and 83 groups.

    By the way, can the number of instruments equal the number of groups? Roodman (2007) stresses that the number of instruments must be smaller than the number of groups, but in another note Elitza Mileva says that it should be equal to or smaller than the number of groups.

    Thank you very much.
    Last edited by Alex Mai; 18 Jul 2017, 04:48.

  • #2
    Any answer is appreciated. Thank you.

    • #3
      There is no way to answer your question without seeing the estimation output.

      Originally posted by Alex Mai View Post
      Btw, can the number of instruments equal the number of groups? Roodman (2007) highlights that No. iv must be smaller than No. group. But in another note, Elitza Mileva said that No. iv should be equal to or smaller than No. groups.
      That is just a rule of thumb. Technically, you could even estimate models with more instruments than number of groups. But that is not recommended at all. In fact, you should try to stay considerably below the number of groups to avoid problems of instrument proliferation.

      https://twitter.com/Kripfganz

      • #4
        Originally posted by Sebastian Kripfganz View Post
        There is no way to answer your question without seeing the estimation output.


        That is just a rule of thumb. Technically, you could even estimate models with more instruments than number of groups. But that is not recommended at all. In fact, you should try to stay considerably below the number of groups to avoid problems of instrument proliferation.
        Thank you very much. Below are the command and the Stata output (an example). After adding the collapse option to xtabond2, Stata does not report p-values for the Sargan and Hansen tests. I treat the lagged dependent variable as the only endogenous variable in this dynamic panel dataset.

        Code:
        . xtabond2 y L.y v2 v200 v3 v55 v6 v20 v99 year i.year, gmm(y, lag(2 3) collapse) iv(v2 v200 v3 v55 v6 v20 v
        > 99 year i.year) twostep robust
        Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
        Warning: Two-step estimated covariance matrix of moments is singular.
          Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
          Difference-in-Sargan/Hansen statistics may be negative.
        
        Dynamic panel-data estimation, two-step system GMM
        ------------------------------------------------------------------------------
        Group variable: i                               Number of obs      =       490
        Time variable : year                            Number of groups   =        36
        Number of instruments = 28                      Obs per group: min =         1
        Wald chi2(29) =   1441.98                                      avg =     13.61
        Prob > chi2   =     0.000                                      max =        19
        ------------------------------------------------------------------------------
                     |              Corrected
                   y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                   y |
                 L1. |   .0886912   .0886118     1.00   0.317    -.0849848    .2623672
                     |
                  v2 |  -.2986242   .4813819    -0.62   0.535    -1.242115     .644867
                v200 |   25.42563   40.39917     0.63   0.529     -53.7553    104.6066
                  v3 |  -3.399594   .6218488    -5.47   0.000    -4.618395   -2.180793
                 v55 |  -7.634515   1.459569    -5.23   0.000    -10.49522   -4.773813
                  v6 |  -.0416967    .096971    -0.43   0.667    -.2317564     .148363
                 v20 |   .0011474   .0387089     0.03   0.976    -.0747206    .0770155
                 v99 |   .1312001   .1048966     1.25   0.211    -.0743934    .3367936
                year |   3.082687   .5589801     5.51   0.000     1.987106    4.178268
                     |
                year |
                  4  |          0  (empty)
                  5  |   56.57782    10.2297     5.53   0.000     36.52798    76.62766
                  6  |   54.99653   9.597537     5.73   0.000      36.1857    73.80735
                  7  |   48.08382   8.748754     5.50   0.000     30.93658    65.23106
                  8  |   47.53848   8.724147     5.45   0.000     30.43947    64.63749
                  9  |   48.38686   8.498198     5.69   0.000      31.7307    65.04302
                 10  |   38.01139   6.367364     5.97   0.000     25.53158    50.49119
                 11  |   37.19191   6.228746     5.97   0.000      24.9838    49.40003
                 12  |    33.8011   6.064849     5.57   0.000     21.91421    45.68798
                 13  |   34.93813   6.121565     5.71   0.000     22.94008    46.93617
                 14  |   30.48996   5.344094     5.71   0.000     20.01573    40.96419
                 15  |   29.01256   5.413869     5.36   0.000     18.40157    39.62355
                 16  |   26.26756   4.589624     5.72   0.000     17.27206    35.26306
                 17  |    14.6454   2.562544     5.72   0.000     9.622904    19.66789
                 18  |          0  (omitted)
                 19  |   17.48243   3.018042     5.79   0.000     11.56718    23.39768
                 20  |   10.52432   2.112637     4.98   0.000     6.383626    14.66501
                 21  |   5.464705   1.124788     4.86   0.000      3.26016    7.669249
                 22  |    2.69289   .7705897     3.49   0.000     1.182562    4.203218
                 23  |          0  (omitted)
                     |
               _cons |          0  (omitted)
        ------------------------------------------------------------------------------
        Instruments for first differences equation
          Standard
            D.(v2 v200 v3 v55 v6 v20 v99 year 4b.year 5.year 6.year 7.year 8.year
            9.year 10.year 11.year 12.year 13.year 14.year 15.year 16.year 17.year
            18.year 19.year 20.year 21.year 22.year 23.year)
          GMM-type (missing=0, separate instruments for each period unless collapsed)
            L(2/3).y collapsed
        Instruments for levels equation
          Standard
            v2 v200 v3 v55 v6 v20 v99 year 4b.year 5.year 6.year 7.year 8.year 9.year
            10.year 11.year 12.year 13.year 14.year 15.year 16.year 17.year 18.year
            19.year 20.year 21.year 22.year 23.year
            _cons
          GMM-type (missing=0, separate instruments for each period unless collapsed)
            DL.y collapsed
        ------------------------------------------------------------------------------
        Arellano-Bond test for AR(1) in first differences: z =  -3.27  Pr > z =  0.001
        Arellano-Bond test for AR(2) in first differences: z =   1.28  Pr > z =  0.199
        ------------------------------------------------------------------------------
        Sargan test of overid. restrictions: chi2(-2)   =   3.70  Prob > chi2 =      .
          (Not robust, but not weakened by many instruments.)
        Hansen test of overid. restrictions: chi2(-2)   =   1.48  Prob > chi2 =      .
          (Robust, but weakened by many instruments.)
        Last edited by Alex Mai; 23 Jul 2017, 05:17.

        • #5
          You are using 28 instruments to estimate 28 coefficients. There are thus no overidentifying restrictions that could be tested.

          A few additional comments:
          1. You probably expect that the iv() option creates separate instruments for the first-differenced and the level equation, as indicated also by the list of instruments below the regression table. This is NOT the case. You should ALWAYS specify the instruments separately yourself by using the suboptions equation(diff) and equation(level), respectively.
          2. Your time trend (year) together with the full set of time dummies is perfectly collinear with the regression intercept. When using time dummies, there is no need to include a time trend. In fact, you should remove it.
          3. The degrees of freedom of the overidentification tests are computed incorrectly by xtabond2 when you include time dummies with factor notation due to the empty and omitted categories. (This is pretty obvious here because it is not possible to have -2 degrees of freedom. It should be 0 here.)
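
          A minimal sketch of the arithmetic, using only the numbers shown in the output above (28 instruments, 28 estimated coefficients, Hansen statistic 1.48). The scalar names are made up for illustration, and the df values 1 and 2 in the last two lines are hypothetical, chosen only to show that the p-value depends on the degrees of freedom:

          ```stata
          * degrees of freedom = number of instruments - number of estimated coefficients
          scalar n_inst = 28
          scalar n_coef = 28
          display "overid. df = " n_inst - n_coef

          * the p-value is the upper tail of a chi-squared distribution with that df,
          * so a wrong df gives a wrong p-value (df = 1 and 2 are hypothetical here)
          display chi2tail(1, 1.48)
          display chi2tail(2, 1.48)
          ```

          With df = 0 there is nothing left to test, which is why no meaningful p-value can be reported.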

          See my comment in the Statalist topic on xtabond2 and deeper lags and the further links therein for details about the problem with the iv() option and the bug with the degrees of freedom for the overidentification tests when using time dummies.
          https://twitter.com/Kripfganz

          • #6
            Originally posted by Sebastian Kripfganz View Post
            You are using 28 instruments to estimate 28 coefficients. There are thus no overidentifying restrictions that could be tested.

            A few additional comments:
            1. You probably expect that the iv() option creates separate instruments for the first-differenced and the level equation, as indicated also by the list of instruments below the regression table. This is NOT the case. You should ALWAYS specify the instruments separately yourself by using the suboptions equation(diff) and equation(level), respectively.
            2. Your time trend (year) together with the full set of time dummies is perfectly collinear with the regression intercept. When using time dummies, there is no need to include a time trend. In fact, you should remove it.
            3. The degrees of freedom of the overidentification tests are computed incorrectly by xtabond2 when you include time dummies with factor notation due to the empty and omitted categories. (This is pretty obvious here because it is not possible to have -2 degrees of freedom. It should be 0 here.)
            See my comment in the Statalist topic on xtabond2 and deeper lags and the further links therein for details about the problem with the iv() option and the bug with the degrees of freedom for the overidentification tests when using time dummies.
            Thank you very much. But I am not sure about how to specify iv for eq(level) and iv for eq(diff) respectively. I tried the following two commands and Stata gives different results.
            In the first command, I used iv(x1 x2 x3) iv(i.year, eq(level)), while in the second I used iv(x1 x2 x3, eq(level)) iv(x1 x2 x3, eq(diff)) iv(i.year, eq(level)). I think they should produce the same results, but actually the results are totally different (I skip regression tables).

            1.
            Code:
             xtabond2 y L.y v1 v2 v25 v3 v4 v21 v6 v19 v20 v39 i.year, gmm(y, lag(2 4)
            >  collapse) iv(v1 v2 v25 v3 v4 v21 v6 v19 v20 v39) iv(i.year, eq(level)) twostep robust
            Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
            Warning: Two-step estimated covariance matrix of moments is singular.
              Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
              Difference-in-Sargan/Hansen statistics may be negative.
            Instruments for first differences equation
              Standard
                D.(v1 v2 v25 v3 v4 v21 v6 v19 v20 v39)
              GMM-type (missing=0, separate instruments for each period unless collapsed)
                L(2/4).y collapsed
            Instruments for levels equation
              Standard
                4b.year 5.year 6.year 7.year 8.year 9.year 10.year 11.year 12.year 13.year
                14.year 15.year 16.year 17.year 18.year 19.year 20.year 21.year 22.year
                23.year
                v1 v2 v25 v3 v4 v21 v6 v19 v20 v39
                _cons
              GMM-type (missing=0, separate instruments for each period unless collapsed)
                DL.y collapsed
            ------------------------------------------------------------------------------
            Arellano-Bond test for AR(1) in first differences: z =  -4.07  Pr > z =  0.000
            Arellano-Bond test for AR(2) in first differences: z =   1.01  Pr > z =  0.312
            ------------------------------------------------------------------------------
            Sargan test of overid. restrictions: chi2(1)    =   6.16  Prob > chi2 =  0.013
              (Not robust, but not weakened by many instruments.)
            Hansen test of overid. restrictions: chi2(1)    =   2.33  Prob > chi2 =  0.127
              (Robust, but weakened by many instruments.)

            2.
            Code:
             xtabond2 y L.y v1 v2 v25 v3 v4 v21 v6 v19 v20 v39 i.year, gmm(y, lag(2 4)
            >  collapse) iv(v1 v2 v25 v3 v4 v21 v6 v19 v20 v39, eq(diff)) iv(v1 v2 v25 v3 v4
            > v21 v6 v19 v20 v39, eq(level)) iv(i.year, eq(level)) twostep robust
            Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
            Warning: Two-step estimated covariance matrix of moments is singular.
              Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
              Difference-in-Sargan/Hansen statistics may be negative.
            Instruments for first differences equation
              Standard
                D.(v1 v2 v25 v3 v4 v21 v6 v19 v20 v39)
              GMM-type (missing=0, separate instruments for each period unless collapsed)
                L(2/4).y collapsed
            Instruments for levels equation
              Standard
                4b.year 5.year 6.year 7.year 8.year 9.year 10.year 11.year 12.year 13.year
                14.year 15.year 16.year 17.year 18.year 19.year 20.year 21.year 22.year
                23.year
                v1 v2 v25 v3 v4 v21 v6 v19 v20 v39
                _cons
              GMM-type (missing=0, separate instruments for each period unless collapsed)
                DL.y collapsed
            ------------------------------------------------------------------------------
            Arellano-Bond test for AR(1) in first differences: z =  -4.09  Pr > z =  0.000
            Arellano-Bond test for AR(2) in first differences: z =   1.40  Pr > z =  0.161
            ------------------------------------------------------------------------------
            Sargan test of overid. restrictions: chi2(10)   =  52.85  Prob > chi2 =  0.000
              (Not robust, but not weakened by many instruments.)
            Hansen test of overid. restrictions: chi2(10)   =  22.86  Prob > chi2 =  0.011
              (Robust, but weakened by many instruments.)
            Last edited by Alex Mai; 24 Jul 2017, 03:56.

            • #7
              Originally posted by Alex Mai View Post
              Thank you very much. But I am not sure about how to specify iv for eq(level) and iv for eq(diff) respectively. I tried the following two commands and Stata gives different results.
              In the first command, I used iv(x1 x2 x3) iv(i.year, eq(level)), while in the second I used iv(x1 x2 x3, eq(level)) iv(x1 x2 x3, eq(diff)) iv(i.year, eq(level)). I think they should produce the same results, but actually the results are totally different (I skip regression tables).
              That is exactly the point I was making. The first specification is not doing what you (and most other users) think it does. Do not use it!
              The second specification is better, but still remember my comment about the incorrect degrees of freedom (and therefore incorrect p-values) for the Sargan/Hansen tests when using the factor variable notation for the time dummies. It is better, although inconvenient, to specify the dummies one by one and to make sure that none of them is omitted. Alternatively, use the teffects option of my command xtseqreg. You can obtain the same estimation results as with xtabond2 but avoid the bug for the overidentification tests.
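
              A rough sketch of the "one by one" approach, assuming the variable names from the second command above. The prefix yr_ and the choice of yr_1 as the base year are assumptions made for illustration; which dummies have to be left out depends on the data, so check the output to make sure that no dummy is reported as empty or omitted:

              ```stata
              * one 0/1 variable per distinct value of year: yr_1, yr_2, ...
              tabulate year, generate(yr_)

              * collect the dummies and drop the assumed base category yr_1
              unab tdummies : yr_*
              local base yr_1
              local tdummies : list tdummies - base

              * same specification as the second command, with explicit dummies instead of i.year
              xtabond2 y L.y v1 v2 v25 v3 v4 v21 v6 v19 v20 v39 `tdummies', ///
                  gmm(y, lag(2 4) collapse) ///
                  iv(v1 v2 v25 v3 v4 v21 v6 v19 v20 v39, eq(diff)) ///
                  iv(v1 v2 v25 v3 v4 v21 v6 v19 v20 v39, eq(level)) ///
                  iv(`tdummies', eq(level)) twostep robust
              ```

              The /// line continuation and the local macros only work when the lines are run together from a do-file, not typed one at a time at the interactive prompt.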
              https://twitter.com/Kripfganz

              • #8
                Originally posted by Sebastian Kripfganz View Post
                That is exactly the point I was making. The first specification is not doing what you (and most other users) think it does. Do not use it!
                The second specification is better, but still remember my comment about the incorrect degrees of freedom (and therefore incorrect p-values) for the Sargan/Hansen tests when using the factor variable notation for the time dummies. It is better, although inconvenient, to specify the dummies one by one and to make sure that none of them is omitted. Alternatively, use the teffects option of my command xtseqreg. You can obtain the same estimation results as with xtabond2 but avoid the bug for the overidentification tests.
                Thank you! However, using iv(x1 x2 x3, eq(level)) iv(x1 x2 x3, eq(diff)) iv(i.year, eq(level)) makes it difficult to pass the Hansen test (the p-value is often close to zero). What is the reason?

                Just one more question: I am not sure whether I should use the -small- option (T=20, N=90). Roodman (2007) mentions that -small- is standard practice, just like robust. But why should -small- be standard practice?

                Thank you very much again.

                • #9
                  It might just be that your instruments are invalid. In particular, you are placing the strong assumption on the variables x1 x2 x3 that they are uncorrelated both with the idiosyncratic and the unit-specific error component. There is no simple solution "to pass the Hansen test". It is always application and data specific.

                  I do not have any strong opinion on the small option.
                  https://twitter.com/Kripfganz

                  • #10
                    Originally posted by Sebastian Kripfganz View Post
                    It might just be that your instruments are invalid. In particular, you are placing the strong assumption on the variables x1 x2 x3 that they are uncorrelated both with the idiosyncratic and the unit-specific error component. There is no simple solution "to pass the Hansen test". It is always application and data specific.

                    I do not have any strong opinion on the small option.
                    Thank you so much! Previously you mentioned that "You should ALWAYS specify the instruments separately yourself by using the suboptions equation(diff) and equation(level), respectively." So in what situations can I simply use the default iv(x1 x2 x3) without specifying eq(level) and eq(diff)?

                    And shall I put individual-invariant but time-variant categorical variables (e.g. i.x4) and dummy variables in both iv( , eq(level)) and iv( , eq(diff)), or only in iv( , eq(level)), as in the case of i.year?
                    Last edited by Alex Mai; 26 Jul 2017, 04:29.

                    • #11
                      Originally posted by Alex Mai View Post
                      Thank you so much! Previously you mentioned that "You should ALWAYS specify the instruments separately yourself by using the suboptions equation(diff) and equation(level), respectively." So in what situations can I simply use the default iv(x1 x2 x3) without specifying eq(level) and eq(diff)?

                      And shall I just put individual-invariant but time-invariant categorical variables (e.g. i.x4) and dummy variable in both iv( , eq(level)) and iv( , eq(diff)), or only in iv( , eq(level)) like the case of i.year?

                      • #12
                        Originally posted by Sebastian Kripfganz View Post
                        It might just be that your instruments are invalid. In particular, you are placing the strong assumption on the variables x1 x2 x3 that they are uncorrelated both with the idiosyncratic and the unit-specific error component. There is no simple solution "to pass the Hansen test". It is always application and data specific.

                        I do not have any strong opinion on the small option.
                        Sorry, just to correct an error in the last message, individual-invariant and time-invariant categorical variables, not time-variant. Thank you!

                        • #13
                          Originally posted by Alex Mai View Post
                          So in what situations can I simply use the default iv(x1 x2 x3) without specifying eq(level) and eq(diff)?
                          The only situation in which iv() can be safely used without the equation() suboption is in combination with the noleveleq option.
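
                          For example, a difference-GMM version of the command from the first post, written with the placeholder names y, x1, x2 and x3 (a sketch only, not a recommendation for this data set):

                          ```stata
                          * noleveleq drops the level equation, so iv() can only refer to the
                          * first-differenced equation and no equation() suboption is needed
                          xtabond2 y L.y x1 x2 x3, gmm(y, lag(2 3) collapse) iv(x1 x2 x3) noleveleq twostep robust
                          ```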

                          Originally posted by Alex Mai View Post
                          And shall I just put individual-invariant but time-variant categorical variables (e.g. i.x4) and dummy variable in both iv( , eq(level)) and iv( , eq(diff)), or only in iv( , eq(level)) like the case of i.year?
                          You should generally put individual-invariant but time-variant categorical variables into iv( , eq(level)) only, just like time dummies.

                          Edit: Just saw your second post. For individual-variant but time-invariant variables, the instruments must likewise go into iv( , eq(level)) only, because time-invariant variables drop out of the first-differenced equation. Remember that any instrument you specify this way must be uncorrelated with the unobserved time-invariant error component (the fixed effects).
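
                          Put together, a sketch of where each type of instrument would go in a system GMM command. The names y, x1, x2 and x3 are the placeholders used above; z (a time-invariant regressor) and the treatment of x1 x2 x3 as valid instruments for both equations are assumptions made purely for illustration:

                          ```stata
                          * x1 x2 x3: instruments for both equations, specified separately
                          * z (time-invariant) and the time dummies: level equation only
                          * i.year is kept for brevity; the degrees-of-freedom problem with
                          * factor notation mentioned earlier in the thread still applies
                          xtabond2 y L.y x1 x2 x3 z i.year, gmm(y, lag(2 3) collapse) ///
                              iv(x1 x2 x3, eq(diff)) iv(x1 x2 x3, eq(level)) ///
                              iv(z, eq(level)) iv(i.year, eq(level)) twostep robust
                          ```

                          Whether x1 x2 x3 and z really satisfy these assumptions is an empirical question that the command itself cannot answer.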
                          Last edited by Sebastian Kripfganz; 26 Jul 2017, 07:04.
                          https://twitter.com/Kripfganz
