Hansen test is missing after xtabond2 (collapse)

Alex Mai

Join Date: May 2016

Posts: 213
#1

18 Jul 2017, 04:41

Dear all,

I am running a dynamic panel regression -xtabond2 y L.y x1 x2 x3, gmm(y, lag(2 3) collapse) iv(x1 x2 x3) twostep robust-. Because the instruments outnumbered the groups, I added the -collapse- option. However, after adding collapse, Stata does not report p-values for the Sargan and Hansen tests (e.g. Hansen test of overid. restrictions: chi2(-3)=1.45 Prob > chi2 = .).

What is the reason for this problem? After collapsing, there are 30 instruments and 83 groups; before collapsing, there were 90 instruments and 83 groups.

By the way, can the number of instruments equal the number of groups? Roodman (2007) stresses that the number of instruments must be smaller than the number of groups, but in another note Elitza Mileva says that the number of instruments should be smaller than or equal to the number of groups.

Thank you very much.

Last edited by Alex Mai; 18 Jul 2017, 04:48.
Alex Mai

Join Date: May 2016

Posts: 213
#2

19 Jul 2017, 09:27

Any answer is appreciated. Thank you.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2595
#3

20 Jul 2017, 05:20

There is no way to answer your question without seeing the estimation output.

Originally posted by Alex Mai

By the way, can the number of instruments equal the number of groups? Roodman (2007) stresses that the number of instruments must be smaller than the number of groups, but in another note Elitza Mileva says that the number of instruments should be smaller than or equal to the number of groups.

That is just a rule of thumb. Technically, you could even estimate models with more instruments than number of groups. But that is not recommended at all. In fact, you should try to stay considerably below the number of groups to avoid problems of instrument proliferation.
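
One way to see where a fitted model stands relative to this rule of thumb is to compare the two counts after estimation. A minimal sketch, assuming xtabond2 stores the instrument count in e(j) and the number of groups in e(N_g); ereturn list shows what your installed version actually returns:

```stata
* Run directly after any xtabond2 estimation.
* List all stored results first; the names e(j) and e(N_g) are assumed here.
ereturn list

* Compare the number of instruments with the number of groups
display "instruments = " e(j) ",  groups = " e(N_g)
if e(j) >= e(N_g) {
    display as error "instrument count is not below the number of groups"
}
```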

https://www.kripfganz.de/stata/

Alex Mai

Join Date: May 2016
Posts: 213
#4

23 Jul 2017, 04:47

Originally posted by Sebastian Kripfganz

There is no way to answer your question without seeing the estimation output.

That is just a rule of thumb. Technically, you could even estimate models with more instruments than number of groups. But that is not recommended at all. In fact, you should try to stay considerably below the number of groups to avoid problems of instrument proliferation.

Thank you very much. The following is the code and the Stata output (an example). After adding the collapse option to xtabond2, Stata does not report p-values for the Sargan and Hansen tests. I treat the lagged dependent variable as the only endogenous variable in this dynamic panel dataset.

Code:

. xtabond2 y L.y v2 v200 v3 v55 v6 v20 v99 year i.year, gmm(y, lag(2 3) collapse) iv(v2 v200 v3 v55 v6 v20 v
> 99 year i.year) twostep robust
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan/Hansen statistics may be negative.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: i                               Number of obs      =       490
Time variable : year                            Number of groups   =        36
Number of instruments = 28                      Obs per group: min =         1
Wald chi2(29) =   1441.98                                      avg =     13.61
Prob > chi2   =     0.000                                      max =        19
------------------------------------------------------------------------------
             |              Corrected
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |
         L1. |   .0886912   .0886118     1.00   0.317    -.0849848    .2623672
             |
          v2 |  -.2986242   .4813819    -0.62   0.535    -1.242115     .644867
        v200 |   25.42563   40.39917     0.63   0.529     -53.7553    104.6066
          v3 |  -3.399594   .6218488    -5.47   0.000    -4.618395   -2.180793
         v55 |  -7.634515   1.459569    -5.23   0.000    -10.49522   -4.773813
          v6 |  -.0416967    .096971    -0.43   0.667    -.2317564     .148363
         v20 |   .0011474   .0387089     0.03   0.976    -.0747206    .0770155
         v99 |   .1312001   .1048966     1.25   0.211    -.0743934    .3367936
        year |   3.082687   .5589801     5.51   0.000     1.987106    4.178268
             |
        year |
          4  |          0  (empty)
          5  |   56.57782    10.2297     5.53   0.000     36.52798    76.62766
          6  |   54.99653   9.597537     5.73   0.000      36.1857    73.80735
          7  |   48.08382   8.748754     5.50   0.000     30.93658    65.23106
          8  |   47.53848   8.724147     5.45   0.000     30.43947    64.63749
          9  |   48.38686   8.498198     5.69   0.000      31.7307    65.04302
         10  |   38.01139   6.367364     5.97   0.000     25.53158    50.49119
         11  |   37.19191   6.228746     5.97   0.000      24.9838    49.40003
         12  |    33.8011   6.064849     5.57   0.000     21.91421    45.68798
         13  |   34.93813   6.121565     5.71   0.000     22.94008    46.93617
         14  |   30.48996   5.344094     5.71   0.000     20.01573    40.96419
         15  |   29.01256   5.413869     5.36   0.000     18.40157    39.62355
         16  |   26.26756   4.589624     5.72   0.000     17.27206    35.26306
         17  |    14.6454   2.562544     5.72   0.000     9.622904    19.66789
         18  |          0  (omitted)
         19  |   17.48243   3.018042     5.79   0.000     11.56718    23.39768
         20  |   10.52432   2.112637     4.98   0.000     6.383626    14.66501
         21  |   5.464705   1.124788     4.86   0.000      3.26016    7.669249
         22  |    2.69289   .7705897     3.49   0.000     1.182562    4.203218
         23  |          0  (omitted)
             |
       _cons |          0  (omitted)
------------------------------------------------------------------------------
Instruments for first differences equation
  Standard
    D.(v2 v200 v3 v55 v6 v20 v99 year 4b.year 5.year 6.year 7.year 8.year
    9.year 10.year 11.year 12.year 13.year 14.year 15.year 16.year 17.year
    18.year 19.year 20.year 21.year 22.year 23.year)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(2/3).y collapsed
Instruments for levels equation
  Standard
    v2 v200 v3 v55 v6 v20 v99 year 4b.year 5.year 6.year 7.year 8.year 9.year
    10.year 11.year 12.year 13.year 14.year 15.year 16.year 17.year 18.year
    19.year 20.year 21.year 22.year 23.year
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL.y collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -3.27  Pr > z =  0.001
Arellano-Bond test for AR(2) in first differences: z =   1.28  Pr > z =  0.199
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(-2)   =   3.70  Prob > chi2 =      .
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(-2)   =   1.48  Prob > chi2 =      .
  (Robust, but weakened by many instruments.)

Last edited by Alex Mai; 23 Jul 2017, 05:17.


Sebastian Kripfganz

Join Date: May 2014

Posts: 2595
#5

23 Jul 2017, 11:01

You are using 28 instruments to estimate 28 coefficients. There are thus no overidentifying restrictions that could be tested.

A few additional comments:
You probably expect that the iv() option creates separate instruments for the first-differenced and the level equation, as indicated also by the list of instruments below the regression table. This is NOT the case. You should ALWAYS specify the instruments separately yourself by using the suboptions equation(diff) and equation(level), respectively.

Your time trend (year) together with the full set of time dummies is perfectly collinear with the regression intercept. When using time dummies, there is no need to include a time trend. In fact, you should remove it.

The degrees of freedom of the overidentification tests are computed incorrectly by xtabond2 when you include time dummies with factor notation due to the empty and omitted categories. (This is pretty obvious here because it is not possible to have -2 degrees of freedom. It should be 0 here.)

See my comment in the Statalist topic on xtabond2 and deeper lags and the further links therein for details about the problem with the iv() option and the bug with the degrees of freedom for the overidentification tests when using time dummies.
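
The count behind the first point can be written out by hand: the overidentification tests have as many degrees of freedom as there are instruments in excess of estimated coefficients. A small sketch using the numbers from the output above; the e() names in the second part are assumptions that ereturn list can confirm:

```stata
* Degrees of freedom = number of instruments - number of estimated coefficients
display 28 - 28

* What xtabond2 itself stored after the estimation (result names assumed)
display "Sargan df = " e(sar_df) ",  Hansen df = " e(hansen_df)
```

A negative stored value, as in the output above, points to the miscount rather than to a property of the model.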


Alex Mai

Join Date: May 2016
Posts: 213
#6

24 Jul 2017, 03:51

Originally posted by Sebastian Kripfganz

You are using 28 instruments to estimate 28 coefficients. There are thus no overidentifying restrictions that could be tested.

A few additional comments:

You probably expect that the iv() option creates separate instruments for the first-differenced and the level equation, as indicated also by the list of instruments below the regression table. This is NOT the case. You should ALWAYS specify the instruments separately yourself by using the suboptions equation(diff) and equation(level), respectively.
Your time trend (year) together with the full set of time dummies is perfectly collinear with the regression intercept. When using time dummies, there is no need to include a time trend. In fact, you should remove it.
The degrees of freedom of the overidentification tests are computed incorrectly by xtabond2 when you include time dummies with factor notation due to the empty and omitted categories. (This is pretty obvious here because it is not possible to have -2 degrees of freedom. It should be 0 here.)

See my comment in the Statalist topic on xtabond2 and deeper lags and the further links therein for details about the problem with the iv() option and the bug with the degrees of freedom for the overidentification tests when using time dummies.

Thank you very much. But I am not sure how to specify iv() separately for eq(level) and eq(diff). I tried the following two commands, and Stata gives different results.
In the first command, I used iv(x1 x2 x3) iv(i.year, eq(level)), while in the second I used iv(x1 x2 x3, eq(level)) iv(x1 x2 x3, eq(diff)) iv(i.year, eq(level)). I thought they should produce the same results, but the results are completely different (I omit the regression tables).

1.

Code:

 xtabond2 y L.y v1 v2 v25 v3 v4 v21 v6 v19 v20 v39 i.year, gmm(y, lag(2 4)
>  collapse) iv(v1 v2 v25 v3 v4 v21 v6 v19 v20 v39) iv(i.year, eq(level)) twostep robust
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan/Hansen statistics may be negative.
Instruments for first differences equation
  Standard
    D.(v1 v2 v25 v3 v4 v21 v6 v19 v20 v39)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(2/4).y collapsed
Instruments for levels equation
  Standard
    4b.year 5.year 6.year 7.year 8.year 9.year 10.year 11.year 12.year 13.year
    14.year 15.year 16.year 17.year 18.year 19.year 20.year 21.year 22.year
    23.year
    v1 v2 v25 v3 v4 v21 v6 v19 v20 v39
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL.y collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -4.07  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =   1.01  Pr > z =  0.312
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(1)    =   6.16  Prob > chi2 =  0.013
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(1)    =   2.33  Prob > chi2 =  0.127
  (Robust, but weakened by many instruments.)

2.

Code:

 xtabond2 y L.y v1 v2 v25 v3 v4 v21 v6 v19 v20 v39 i.year, gmm(y, lag(2 4)
>  collapse) iv(v1 v2 v25 v3 v4 v21 v6 v19 v20 v39, eq(diff)) iv(v1 v2 v25 v3 v4
> v21 v6 v19 v20 v39, eq(level)) iv(i.year, eq(level)) twostep robust
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan/Hansen statistics may be negative.
Instruments for first differences equation
  Standard
    D.(v1 v2 v25 v3 v4 v21 v6 v19 v20 v39)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(2/4).y collapsed
Instruments for levels equation
  Standard
    4b.year 5.year 6.year 7.year 8.year 9.year 10.year 11.year 12.year 13.year
    14.year 15.year 16.year 17.year 18.year 19.year 20.year 21.year 22.year
    23.year
    v1 v2 v25 v3 v4 v21 v6 v19 v20 v39
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL.y collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -4.09  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =   1.40  Pr > z =  0.161
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(10)   =  52.85  Prob > chi2 =  0.000
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(10)   =  22.86  Prob > chi2 =  0.011
  (Robust, but weakened by many instruments.)

Last edited by Alex Mai; 24 Jul 2017, 03:56.


Sebastian Kripfganz

Join Date: May 2014

Posts: 2595
#7

24 Jul 2017, 04:04

Originally posted by Alex Mai

Thank you very much. But I am not sure how to specify iv() separately for eq(level) and eq(diff). I tried the following two commands, and Stata gives different results.
In the first command, I used iv(x1 x2 x3) iv(i.year, eq(level)), while in the second I used iv(x1 x2 x3, eq(level)) iv(x1 x2 x3, eq(diff)) iv(i.year, eq(level)). I thought they should produce the same results, but the results are completely different (I omit the regression tables).

That is exactly the point I was making. The first specification is not doing what you (and most other users) think it does. Do not use it!
The second specification is better, but still remember my comment about the incorrect degrees of freedom (and therefore incorrect p-values) for the Sargan/Hansen tests when using the factor variable notation for the time dummies. It is better, although inconvenient, to specify the dummies one by one and to make sure that none of them is omitted. Alternatively, use the teffects option of my command xtseqreg. You can obtain the same estimation results as with xtabond2 but avoid the bug for the overidentification tests.
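
A sketch of what specifying the dummies one by one could look like, reusing the variable names from the second command. The stub yr, the local macro tdum, and in particular the range yr2-yr19 are hypothetical: which dummies can be kept depends on the sample, and the command has not been run on the poster's data.

```stata
* Sketch only; run from a do-file because of the /// line continuations.
* Build one dummy per year by hand instead of relying on i.year.
* tabulate creates yr1, yr2, ... in the order of the year values.
tabulate year, generate(yr)

* Hypothetical selection: keep only dummies that can actually be estimated
* (no empty periods, none collinear with the constant). Adapt the range.
unab tdum : yr2-yr19

xtabond2 y L.y v1 v2 v25 v3 v4 v21 v6 v19 v20 v39 `tdum', ///
    gmm(y, lag(2 4) collapse)                             ///
    iv(v1 v2 v25 v3 v4 v21 v6 v19 v20 v39, eq(diff))      ///
    iv(v1 v2 v25 v3 v4 v21 v6 v19 v20 v39, eq(level))     ///
    iv(`tdum', eq(level)) twostep robust

* How a test statistic and its degrees of freedom map into a p-value,
* using the Hansen line of the second command above: chi2(10) = 22.86
display chi2tail(10, 22.86)
```

chi2tail() is Stata's built-in upper-tail chi-squared function, so the same call gives a corrected p-value once the correct degrees of freedom are plugged in.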

Alex Mai

Join Date: May 2016

Posts: 213
#8

24 Jul 2017, 04:48

Originally posted by Sebastian Kripfganz

That is exactly the point I was making. The first specification is not doing what you (and most other users) think it does. Do not use it!
The second specification is better, but still remember my comment about the incorrect degrees of freedom (and therefore incorrect p-values) for the Sargan/Hansen tests when using the factor variable notation for the time dummies. It is better, although inconvenient, to specify the dummies one by one and to make sure that none of them is omitted. Alternatively, use the teffects option of my command xtseqreg. You can obtain the same estimation results as with xtabond2 but avoid the bug for the overidentification tests.

Thank you! However, using iv(x1 x2 x3, eq(level)) iv(x1 x2 x3, eq(diff)) iv(i.year, eq(level)) makes it difficult to pass the Hansen test (the p-value is often close to zero). What is the reason?

One more question: I am not sure whether I should use the -small- option (T=20, N=90). Roodman (2007) mentions that -small- is standard practice, just like robust. But why should -small- be standard practice?

Thank you very much again.
Sebastian Kripfganz

Join Date: May 2014

Posts: 2595
#9

24 Jul 2017, 06:43

It might just be that your instruments are invalid. In particular, you are placing the strong assumption on the variables x1 x2 x3 that they are uncorrelated both with the idiosyncratic and the unit-specific error component. There is no simple solution "to pass the Hansen test". It is always application and data specific.
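
One way to see which part of that assumption matters is to drop the level-equation instruments for these regressors, which removes the requirement that they be uncorrelated with the unit-specific component. A hedged sketch with the placeholder names x1 x2 x3 from the question, with time dummies left out for brevity; it is an illustration, not a recommendation for this particular data set:

```stata
* Sketch only: x1 x2 x3 are the placeholder regressors from the question.
* Instruments for the differenced equation only, so nothing is assumed
* about the correlation of x1 x2 x3 with the unit-specific effects.
xtabond2 y L.y x1 x2 x3, gmm(y, lag(2 4) collapse) iv(x1 x2 x3, eq(diff)) twostep robust
```

This version still relies on x1 x2 x3 being uncorrelated with the idiosyncratic error component, so it relaxes one assumption, not both.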

I do not have any strong opinion on the small option.

Alex Mai

Join Date: May 2016

Posts: 213
#10

26 Jul 2017, 04:13

Originally posted by Sebastian Kripfganz

It might just be that your instruments are invalid. In particular, you are placing the strong assumption on the variables x1 x2 x3 that they are uncorrelated both with the idiosyncratic and the unit-specific error component. There is no simple solution "to pass the Hansen test". It is always application and data specific.

I do not have any strong opinion on the small option.

Thank you so much! Previously you mentioned that "You should ALWAYS specify the instruments separately yourself by using the suboptions equation(diff) and equation(level), respectively." So in what situations can I simply use the default iv(x1 x2 x3) without specifying eq(level) and eq(diff)?

And should I put individual-invariant but time-variant categorical variables (e.g. i.x4) and dummy variables in both iv( , eq(level)) and iv( , eq(diff)), or only in iv( , eq(level)), as with i.year?

Last edited by Alex Mai; 26 Jul 2017, 04:29.
Alex Mai

Join Date: May 2016

Posts: 213
#11

26 Jul 2017, 06:20

Thank you so much! Previously you mentioned that "You should ALWAYS specify the instruments separately yourself by using the suboptions equation(diff) and equation(level), respectively." So in what situations can I simply use the default iv(x1 x2 x3) without specifying eq(level) and eq(diff)?

And should I put individual-invariant but time-invariant categorical variables (e.g. i.x4) and dummy variables in both iv( , eq(level)) and iv( , eq(diff)), or only in iv( , eq(level)), as with i.year?
Alex Mai

Join Date: May 2016

Posts: 213
#12

26 Jul 2017, 06:23

Originally posted by Sebastian Kripfganz

It might just be that your instruments are invalid. In particular, you are placing the strong assumption on the variables x1 x2 x3 that they are uncorrelated both with the idiosyncratic and the unit-specific error component. There is no simple solution "to pass the Hansen test". It is always application and data specific.

I do not have any strong opinion on the small option.

Sorry, just to correct an error in my last message: I meant individual-invariant and time-invariant categorical variables, not time-variant ones. Thank you!
Sebastian Kripfganz

Join Date: May 2014

Posts: 2595
#13

26 Jul 2017, 07:00

Originally posted by Alex Mai

So in what situations can I simply use the default iv(x1 x2 x3) without specifying eq(level) and eq(diff)?

The only situation in which iv() can be safely used without the equation() suboption is in combination with the noleveleq option.
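
For reference, that combination looks as follows. A sketch with the placeholder names from the question; noleveleq drops the level equation, so this is difference GMM rather than system GMM:

```stata
* Sketch only: with noleveleq there is no level equation to instrument,
* so iv() without equation() applies to the differenced equation alone.
xtabond2 y L.y x1 x2 x3, gmm(y, lag(2 4) collapse) iv(x1 x2 x3) noleveleq twostep robust
```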

Originally posted by Alex Mai

And should I put individual-invariant but time-variant categorical variables (e.g. i.x4) and dummy variables in both iv( , eq(level)) and iv( , eq(diff)), or only in iv( , eq(level)), as with i.year?

You should generally put individual-invariant but time-variant categorical variables into iv( , eq(level)) only, just like time dummies.

Edit: Just saw your second post. For individual-variant but time-invariant variables, you must likewise put the instruments into iv( , eq(level)). Remember that any instrument you specify this way must be uncorrelated with the unobserved time-invariant error component (the fixed effects).
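
Put together, the placement rules in this post could be sketched as follows. Here z stands for a hypothetical time-invariant regressor and tdum for a hypothetical local macro holding hand-made year dummies; neither name comes from the thread, and the dummy list is a placeholder:

```stata
* Sketch only; run from a do-file. z, tdum and yr2 yr3 yr4 are hypothetical.
local tdum yr2 yr3 yr4

* x1 x2 x3: instruments for both equations, specified separately
* z and the year dummies: instruments for the level equation only
xtabond2 y L.y x1 x2 x3 z `tdum', gmm(y, lag(2 4) collapse) ///
    iv(x1 x2 x3, eq(diff)) iv(x1 x2 x3, eq(level))          ///
    iv(z `tdum', eq(level)) twostep robust
```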

Last edited by Sebastian Kripfganz; 26 Jul 2017, 07:04.
