Which test to report from the difference-in-Hansen tests: "excluding group" or "difference"?

Alex Mai

Join Date: May 2016

Posts: 213
#1

02 Apr 2018, 03:52

Dear Statalists,

I read in Roodman (2007) that one should report the difference-in-Hansen tests for the validity and exogeneity of subsets of instruments (even though many published studies do not report them). However, I am not sure which of the two sub-tests under the difference-in-Hansen heading ("Hansen test excluding group" and "Difference") I should report. Some papers report both, while others report only one.

"Hansen Excluding Group" examines the validity of the model without the specified set of instruments (the set of instruments specified in each sub-heading, such as iv(x2 x3)), and the "Difference" test examines the validity of the specified set of instruments by computing the difference between the two Hansen J statistics with and without this set of instruments. Is this understanding correct?

For instance,

Code:

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(4)    =   4.06  Prob > chi2 =  0.397
    Difference (null H = exogenous): chi2(2)    =   1.41  Prob > chi2 =  0.494
  gmm(y, collapse lag(2 4))
    Hansen test excluding group:     chi2(2)    =   4.33  Prob > chi2 =  0.115
    Difference (null H = exogenous): chi2(4)    =   1.14  Prob > chi2 =  0.887
  gmm(x1, collapse lag(2 5))
    Hansen test excluding group:     chi2(1)    =   0.02  Prob > chi2 =  0.884
    Difference (null H = exogenous): chi2(5)    =   5.45  Prob > chi2 =  0.363
  iv(x2 x3, eq(level))
    Hansen test excluding group:     chi2(4)    =   3.62  Prob > chi2 =  0.459
    Difference (null H = exogenous): chi2(2)    =   1.85  Prob > chi2 =  0.397

Should I report both sub-tests or only the Difference test? And is it necessary to report all four sets of difference-in-Hansen tests (GMM instruments for levels, gmm(y), gmm(x1), and iv(x2 x3))?

Thank you!

Last edited by Alex Mai; 02 Apr 2018, 03:59.

Alex Mai
#2

03 Apr 2018, 04:26

Any suggestions would be really appreciated! Many thanks.

Alex Mai
#3

04 Apr 2018, 07:35

I would still be grateful for any suggestions. Thank you!

Sebastian Kripfganz

Join Date: May 2014

Posts: 2595
#4

04 Apr 2018, 10:03

If the model without the additional instruments is correctly specified (i.e. the Hansen test excluding this group of instruments does not reject the null hypothesis), then the difference-in-Hansen test can be interpreted as a test for the validity of the additional instruments. In that regard, your understanding is correct.

As to which test results to report, it really depends. You certainly want to report the Hansen test for the full model. On top of that, it makes sense to report difference-in-Hansen tests for particular instruments if their inclusion requires particular justification. For example, if the Arellano-Bond AR(2) test does not reject the null hypothesis of no second-order serial correlation of the first-differenced errors, then you usually need not separately justify the lagged levels of the dependent variable as instruments for the first-differenced model. In contrast, the difference-in-Hansen test for the level instruments is informative because it helps to evaluate whether the Blundell-Bond mean stationarity assumption might be violated.

For example, you could report the Hansen test for the model with the instruments for the first-differenced model only, the Hansen test for the full model, and the respective difference-in-Hansen test. The Hansen test for the first-differenced model tells you something about whether your model is dynamically complete (because this determines whether those instruments are valid). The difference-in-Hansen test, as mentioned before, tells you something about the mean stationarity condition needed for the validity of the level instruments. Taking these two test results at face value, the Hansen test for the full model would in principle be redundant, but it is still reasonable to report it to provide a complete picture.
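
As a rough cross-check of the output in #1, the arithmetic can be replayed by hand. The sketch below uses Python with only the standard library; it is not something xtabond2 prints, and the names `chi2_sf` and `implied_full_test` are illustrative. In each group, the "excluding group" and "Difference" statistics should add up to the same full-model Hansen J, and the degrees of freedom should add up likewise. All inputs are the rounded numbers copied from the output.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variate with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum exp(-h) * sum_{i<df/2} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum over half-integer powers of h
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (i + 0.5)
    return total

# ((excluding-group chi2, df), (difference chi2, df)), copied from #1
groups = {
    "GMM instruments for levels": ((4.06, 4), (1.41, 2)),
    "gmm(y, collapse lag(2 4))": ((4.33, 2), (1.14, 4)),
    "gmm(x1, collapse lag(2 5))": ((0.02, 1), (5.45, 5)),
    "iv(x2 x3, eq(level))": ((3.62, 4), (1.85, 2)),
}

def implied_full_test(excluding, difference):
    """Full-model J and df implied by adding the two reported pieces."""
    return round(excluding[0] + difference[0], 2), excluding[1] + difference[1]

for name, (excl, diff) in groups.items():
    j_full, df_full = implied_full_test(excl, diff)
    print(f"{name}: implied full J = {j_full:.2f} with {df_full} df; "
          f"difference p-value = {chi2_sf(*diff):.3f}")
```

With the numbers in #1, every group implies the same full-model statistic (5.47 with 6 degrees of freedom), which is what you would expect if "Difference" is the full-model J minus the excluding-group J.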

https://www.kripfganz.de/stata/

Alex Mai
#5

04 Apr 2018, 12:46

Many thanks! I am not quite sure about what you refer to as "Hansen test for the first-differenced model". Do you mean "GMM instruments for levels" (the first subheading under Difference-in-Hansen tests, the underlined part in the following example)?

Btw, if the Hansen test excluding group is missing due to just identification, can I still interpret the corresponding Difference test? And is it a problem that Hansen test excluding group is missing?

Code:

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(4)    =   4.06  Prob > chi2 =  0.397
    Difference (null H = exogenous): chi2(2)    =   1.41  Prob > chi2 =  0.494
  gmm(y, collapse lag(2 4))
    Hansen test excluding group:     chi2(2)    =   4.33  Prob > chi2 =  0.115
    Difference (null H = exogenous): chi2(4)    =   1.14  Prob > chi2 =  0.887
  gmm(x1, collapse lag(2 5))
    Hansen test excluding group:     chi2(1)    =   0.02  Prob > chi2 =  0.884
    Difference (null H = exogenous): chi2(5)    =   5.45  Prob > chi2 =  0.363
  iv(x2 x3, eq(level))
    Hansen test excluding group:     chi2(4)    =   3.62  Prob > chi2 =  0.459
    Difference (null H = exogenous): chi2(2)    =   1.85  Prob > chi2 =  0.397

Thank you again!

Last edited by Alex Mai; 04 Apr 2018, 13:00.

Sebastian Kripfganz
#6

05 Apr 2018, 09:55

The "Hansen test for the first-differenced model" should be the very first test of your output ("Hansen test excluding group" for the group of "GMM instruments for levels").

If the "Hansen test excluding group" is missing due to just identification, you can still interpret the corresponding "Difference" test based on the assumption that the model without these additional instruments is correctly specified. This assumption is untestable due to the just identification. In that regard, it is not a problem as long as this assumption is justifiable (for example with the help of the AR(2) test).

Alex Mai
#7

05 Apr 2018, 12:37

Thank you! Btw, is it correct to say that the Hansen test can only test whether the instruments are exogenous with respect to the idiosyncratic error term, but not whether they are exogenous with respect to the individual-specific effects (fixed effects) in the composite error term?

I have read in a textbook that the correlation between instruments and fixed effects cannot be examined statistically in an appropriate manner.

Sebastian Kripfganz
#8

05 Apr 2018, 12:45

The instruments are valid if they are uncorrelated with the composite error term that is the sum of the fixed effects and the idiosyncratic error term. The Hansen test cannot distinguish between these two components.

The textbooks might refer to the fact that the fixed effects themselves cannot be estimated in a reliable way and hence it is not possible to examine the correlation between the instruments and the fixed effects.
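
In symbols (my own notation as an illustration, not taken from any particular textbook): write the composite error as $\varepsilon_{it} = \alpha_i + u_{it}$ and let $z_{it}$ be an instrument for the level equation. The moment condition that the Hansen test examines is then

```latex
\mathrm{E}[z_{it}\,\varepsilon_{it}]
  = \mathrm{E}[z_{it}\,\alpha_i] + \mathrm{E}[z_{it}\,u_{it}] = 0 ,
```

so a rejection signals that the sum is nonzero without saying which of the two terms is responsible.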

Alex Mai
#9

05 Apr 2018, 13:14

Thanks a lot! I get your point. May I ask one more thing about the lags of the instruments?

In a previous post, you mentioned

For the instruments, you would usually start with the second lag of the dependent variable and the first lag of the independent variables (or contemporaneous terms, depending on whether the variables are predetermined or strictly exogenous) instead of lag 6.

However, Roodman (2007) points out that the standard treatment for a predetermined variable is gmm(x, lag(1 .)), i.e. starting from lag one, while the treatment for an endogenous variable is gmm(x, lag(2 .)), i.e. starting from lag two. I am a bit confused, as your suggestion seems to differ from Roodman's (perhaps due to a misunderstanding on my part).

So just for clarification:
for the dependent variable, I should write gmm(y, lag(2 .))
for a predetermined regressor, I should write gmm(x, lag(1 .))
for an endogenous regressor, I should write gmm(x, lag(2 .))
Is this correct?

And if I include only the lagged regressor, such as L.x, rather than its current value in the equation, then it is a predetermined regressor and I should treat it as gmm(L.x, lag(1 .)), right?

In an economics paper, the author uses a lagged regressor in system GMM as a predetermined regressor and argues that system GMM can deal with the two-way causality between the predetermined variable and the dependent variable. But I do not think this argument is correct, since causality cannot run from the dependent variable (its current value at t) to the lagged regressor (its value at t-1).

Do you think that my interpretation is correct?

Many thanks again!

Last edited by Alex Mai; 05 Apr 2018, 13:17.

Sebastian Kripfganz
#10

05 Apr 2018, 14:41

Your choice of lags is correct. The lagged dependent variable L.y is essentially a predetermined variable, hence gmm(L.y, lag(1 .)) which is equivalent to gmm(y, lag(2 .)).
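
The equivalence can be seen by listing which dated observations each specification makes available at period t. This is a toy sketch in Python: the function name `instrument_dates` and its `shift` argument are illustrative, and it mimics only the lag arithmetic, not how xtabond2 builds its instrument matrix.

```python
def instrument_dates(t, first_lag, shift=0):
    """Dates of the observations used as instruments for period t.

    shift=0 stands for the variable y itself, shift=1 for L.y.
    Lag k of a variable shifted by `shift` is dated t - k - shift.
    The open-ended lag range stops at period 1, the first period in the data.
    """
    dates = []
    k = first_lag
    while t - k - shift >= 1:
        dates.append(t - k - shift)
        k += 1
    return dates

t = 6
via_lagged_y = instrument_dates(t, first_lag=1, shift=1)  # gmm(L.y, lag(1 .))
via_y = instrument_dates(t, first_lag=2, shift=0)         # gmm(y, lag(2 .))
print(via_lagged_y, via_y)
```

Both calls return the same dates for every t, because lag 1 of L.y and lag 2 of y refer to the same observation. For a predetermined regressor x, `instrument_dates(t, first_lag=1)` starts one period later than the endogenous case `instrument_dates(t, first_lag=2)`.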

I am not sure what the author means by two-way causality between the predetermined variable and the dependent variable.

Alex Mai

#11

10 Apr 2018, 12:09

Dear Sebastian,

May I ask one question about a missing Difference-in-Hansen test? I have just tried adding one more variable to my equation, and Stata then did not report the Difference-in-Hansen tests at all (the output stopped at the Sargan test and the Hansen test). I do not know what is going wrong here.
The result is as follows:

Code:

. xtabond2 y L.y year2-year18 x7 x4 x6 x2 x1 d1 d2 cs, gmm(y
> , lag(2 3) collapse) gmm(x6, lag(2 3) collapse) iv(cs x1, eq(level)) iv(d1
>  d2 x4 x7 x2 year2-year18, eq(level)) twostep robust
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed,
>  perm.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estim
> ation.
  Difference-in-Sargan/Hansen statistics may be negative.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: i                               Number of obs      =       917
Time variable : year                            Number of groups   =        60
Number of instruments = 28                      Obs per group: min =         3
Wald chi2(26) =   1140.90                                      avg =     15.28
Prob > chi2   =     0.000                                      max =        16
------------------------------------------------------------------------------
             |              Corrected
       y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       y |
         L1. |   .3862145   .0622517     6.20   0.000     .2642034    .5082256
             |
       year2 |  -.0853252   .0277727    -3.07   0.002    -.1397587   -.0308917
       year3 |          0  (omitted)
       year4 |  -.0858434   .0261483    -3.28   0.001    -.1370931   -.0345938
       year5 |          0  (omitted)
       year6 |  -.0398763   .0233303    -1.71   0.087    -.0856028    .0058501
       year7 |          0  (omitted)
       year8 |  -.0362826   .0258199    -1.41   0.160    -.0868888    .0143236
       year9 |  -.0343727   .0203296    -1.69   0.091    -.0742179    .0054726
      year10 |  -.0154841   .0276664    -0.56   0.576    -.0697092     .038741
      year11 |  -.0058569   .0166615    -0.35   0.725    -.0385129    .0267991
      year12 |    .013753   .0185975     0.74   0.460    -.0226974    .0502035
      year13 |   .0117277   .0217985     0.54   0.591    -.0309967     .054452
      year14 |  -.0244108   .0196056    -1.25   0.213     -.062837    .0140154
      year15 |  -.0301793   .0204674    -1.47   0.140    -.0702947    .0099361
      year16 |   .0022201   .0191573     0.12   0.908    -.0353276    .0397678
      year17 |    .008338   .0191025     0.44   0.662    -.0291023    .0457782
      year18 |  -.0019966    .014509    -0.14   0.891    -.0304337    .0264406
          x7 |   .0034003   .0084641     0.40   0.688    -.0131891    .0199897
          x4 |   .0076999   .0084361     0.91   0.361    -.0088346    .0242344
          x6 |  -.0037636   .0042373    -0.89   0.374    -.0120686    .0045415
          x2 |  -.0922484   .0311284    -2.96   0.003     -.153259   -.0312378
          x1 |  -.0093963   .0067857    -1.38   0.166     -.022696    .0039035
          d1 |  -.1630917    .035516    -4.59   0.000    -.2327018   -.0934816
          d2 |  -.0140644   .0300584    -0.47   0.640    -.0729777    .0448489
          cs |   .0933815   .0242519     3.85   0.000     .0458486    .1409143
       _cons |   5.716706   .5699476    10.03   0.000     4.599629    6.833782
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(2/3).x6 collapsed
    L(2/3).y collapsed
Instruments for levels equation
  Standard
    d1 d2 x4 x7 x2 year2 year3 year4 year5 year6 year7 year8 year9
    year10 year11 year12 year13 year14 year15 year16 year17 year18
    cs x1
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL.x6 collapsed
    DL.y collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -3.66  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =  -0.23  Pr > z =  0.817
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(1)    =   1.50  Prob > chi2 =  0.220
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(1)    =   2.35  Prob > chi2 =  0.125
  (Robust, but weakened by many instruments.)

. 
end of do-file

That is everything Stata reported; there is no Difference-in-Hansen output at all.

The newly added variable (cs) seems to be special, since it is missing for all observations in years 1, 3, 5 and 7. Without this variable, everything worked well.

Here are the summary statistics of the variable.

Code:

     Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         cs |        912   -.7202961    .4150485      -1.72        .81

Could you please help to check what the problem is?

Many thanks!

Last edited by Alex Mai; 10 Apr 2018, 12:24.

Sebastian Kripfganz
#12

11 Apr 2018, 04:52

If your new variable has missing values for these years, the whole years will be dropped from your estimation sample. With the resulting gaps, it no longer makes sense to estimate a dynamic model, at least for these early years. If you want to keep the new variable, you should restrict your estimation sample to the years from period 8 onwards.

The missing Difference-in-Hansen test is an indirect consequence of these gaps. As I have mentioned in some other Statalist topics before, xtabond2 has a severe bug when some variables (in particular time dummies) get omitted. In your case, there are 28 instruments and 24 estimated coefficients (excluding the omitted dummies). This should give 4 degrees of freedom for the Hansen test. Yet, xtabond2 reports only 1 degree of freedom. An immediate consequence is that the p-value for the Hansen test is incorrect. An indirect consequence is that xtabond2 no longer reports Difference-in-Hansen tests because it believes that there are not enough degrees of freedom available to do so. Once you remove the first 7 years from your sample and make sure that no dummies get omitted, the Difference-in-Hansen test should reappear.
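
The count behind those numbers can be reproduced from the output in #11. This is a small sketch in Python using only the standard library. The helper `chi2_sf_df4` is an illustrative name and is valid only for 4 degrees of freedom. The recomputed p-value takes the reported statistic of 2.35 at face value, which is an assumption, since the statistic may itself be affected.

```python
import math

# Regressors listed in the xtabond2 call in #11
year_dummies = [f"year{k}" for k in range(2, 19)]    # year2-year18
omitted = {"year3", "year5", "year7"}                # shown as "(omitted)" in the table
regressors = (["L.y"] + year_dummies
              + ["x7", "x4", "x6", "x2", "x1", "d1", "d2", "cs", "_cons"])

n_instruments = 28                                   # "Number of instruments = 28"
n_coefficients = len([r for r in regressors if r not in omitted])
df_expected = n_instruments - n_coefficients

def chi2_sf_df4(x):
    """Upper-tail probability of a chi-squared(4) variate (closed form)."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

print(n_coefficients, df_expected)
print(round(chi2_sf_df4(2.35), 3))
```

This gives 24 estimated coefficients and 4 degrees of freedom. Under the stated assumption, the p-value would be roughly 0.67 rather than the reported 0.125.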

Alex Mai
#13

11 Apr 2018, 06:50

Thank you! I have tried starting from period 8 and it works. But sometimes one of the year dummies is dropped by Stata (xtabond2). It does not appear in the table at all; instead there is a note saying "dropped due to collinearity", rather than the dummy being reported as "omitted" in the table. Everything else seems to be fine. Do you think this may cause a problem?

You mentioned that omitted dummies cause problems, but in my case the dummy is dropped instead of omitted (e.g. year16 in the following example).

I specified the time dummies as year9-year18 instead of i.year.

Code:

year16 dropped due to collinearity
       year9 |  -.0052901   .0244344    -0.22   0.829    -.0531807    .0426006
      year10 |   .0138947   .0240152     0.58   0.563    -.0331742    .0609637
      year11 |  -.0005045   .0229553    -0.02   0.982    -.0454961     .044487
      year12 |  -.0087308   .0238082    -0.37   0.714     -.055394    .0379323
      year13 |  -.0076787    .017286    -0.44   0.657    -.0415586    .0262013
      year14 |  -.0275574   .0274015    -1.01   0.315    -.0812634    .0261487
      year15 |  -.0237062   .0197515    -1.20   0.230    -.0624185    .0150061
      year17 |  -.0024705   .0136791    -0.18   0.857     -.029281      .02434
      year18 |  -.0132392   .0180177    -0.73   0.462    -.0485533    .0220748

Last edited by Alex Mai; 11 Apr 2018, 07:43.

Sebastian Kripfganz
#14

11 Apr 2018, 07:42

It is not entirely clear to me when xtabond2 "drops" a dummy and when it just "omits" it. The former is less of a problem, although there might be a more subtle complication: the year11 dummy is still kept as an instrument (at least it is shown as an instrument below the regression table) despite being dropped as a regressor. That is not what you want. To be on the safe side, I would always recommend amending the specification until nothing is dropped or omitted any more. In your case, I would just use the dummies from year3 to year13.

Alex Mai
#15

11 Apr 2018, 07:53

Thanks a lot! After using year3-year13 rather than year2-year13, no time dummy is dropped any more. The results for the other variables and the Hansen test are exactly the same as when using year2-year12.

But what is the rationale behind this approach? Normally, people start from the second time dummy in order to avoid the dummy variable trap.