Which test to look at in the Difference-in-Hansen test: excluding group or difference?

Alex Mai replied

17 Apr 2018, 13:00
Originally posted by Sebastian Kripfganz

Not necessarily. xtabond2 computes the respective Difference-in-Hansen tests even when you do not separate them. But separating the options helps to understand how the GMM estimator is actually constructed. (It becomes less of a black box.)

Thanks a lot! By the way, do you think that xtabond2 is suitable for a (dynamic) linear probability model with a binary dependent variable? As far as I know, there may be no Stata command for a dynamic logit model. I have seen xtabond2 used to estimate models with a binary dependent variable in a dynamic setting.

One problem with the linear probability model is the possibility of negative fitted values, but studies have shown that, except in extreme situations (e.g. probabilities like 99% or 1%), the log odds are an almost linear function of the probability, which supports the use of the linear probability model. But I am not sure whether this also holds for system GMM.
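A quick numerical check of the near-linearity argument, as a sketch only. The claim is usually stated for the log odds, log(p/(1-p)), whose first-order expansion around p = 0.5 is 4(p - 0.5). The function names are illustrative, and the check says nothing about whether the argument carries over to system GMM.

```python
import math

def log_odds(p):
    """Logit transform: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def linear_approx(p):
    """First-order Taylor expansion of the logit around p = 0.5."""
    return 4.0 * (p - 0.5)

# Compare the logit with its linear approximation at a few probabilities.
for p in (0.3, 0.5, 0.7, 0.9, 0.99):
    gap = log_odds(p) - linear_approx(p)
    print(f"p = {p:4.2f}  log odds = {log_odds(p):7.3f}  "
          f"linear = {linear_approx(p):6.3f}  gap = {gap:6.3f}")
```

The gap stays small for probabilities between 0.3 and 0.7 and grows quickly toward the extremes, which is the pattern the argument relies on.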

Last edited by Alex Mai; 17 Apr 2018, 13:10.
Sebastian Kripfganz replied

17 Apr 2018, 11:29
Not necessarily. xtabond2 computes the respective Difference-in-Hansen tests even when you do not separate them. But separating the options helps to understand how the GMM estimator is actually constructed. (It becomes less of a black box.)
Alex Mai replied

17 Apr 2018, 10:33
Originally posted by Sebastian Kripfganz

My comment mainly referred to the gmm() option that creates lagged levels as instruments for the first-differenced model and differences as instruments for the level model. Here, the latter do not make the former redundant.

On the other hand, as mentioned on several occasions, split the instruments for the differenced model and those for the level model into separate groups.

So do you mean that for -gmm()- it is also recommended to split into -gmm( eq(level))- and -gmm( eq(diff))-? What I learned from earlier posts is that eq(level) and eq(diff) should be set separately for -iv()-, but I am not sure about -gmm()-.
Sebastian Kripfganz replied

17 Apr 2018, 07:05
Originally posted by Alex Mai

But in what situation, or for what kind of variables, should I use -iv(x, eq(diff))-? You have argued that -iv(x, eq(level))- makes -iv(x, eq(diff))- asymptotically redundant.

My comment mainly referred to the gmm() option that creates lagged levels as instruments for the first-differenced model and differences as instruments for the level model. Here, the latter do not make the former redundant.

Originally posted by Alex Mai

And can I understand the Hansen test for the first-differenced model (the very first part in the Difference-in-Hansen test) as the test for the validity of lagged dependent and endogenous variables as instruments for the first-differenced dependent and endogenous variables in the first-differenced equation?

In short, yes. But keep in mind that, strictly speaking, the overidentification tests are not just tests for the validity of instruments. If you reject the null hypothesis, this might be because your instruments are indeed invalid given that the model is otherwise correctly specified, or it might be that your model suffers from another form of misspecification.
Alex Mai replied

17 Apr 2018, 06:26
Originally posted by Sebastian Kripfganz

It is a similar argument to why two coefficients might be individually statistically significant while a joint test of their insignificance does not reject the null. Individual hypothesis tests do not account for the covariance between the estimators / between the respective moment functions. For a joint test, it is generally harder to reject the null. You should use economic / econometric theory as a guide to group the instruments:
If two or more instruments naturally belong together (e.g. one instrument may not make sense without the other; or several instruments are justified on the grounds of the same assumption such as mean stationarity for the level instruments), then do not split them into separate groups.

Do not combine non-deterministic with deterministic instruments in the same group.

On the other hand, as mentioned on several occasions, split the instruments for the differenced model and those for the level model into separate groups.

It might be meaningful to separate the instruments for the lagged dependent variable from the instruments for other regressors because the former particularly rely on the assumption of no serial correlation in the idiosyncratic error term. But it does not really make sense to consider each lagged instrument itself as a separate group for testing purposes.

...

Thank you! But in what situation, or for what kind of variables, should I use -iv(x, eq(diff))-? You have argued that -iv(x, eq(level))- makes -iv(x, eq(diff))- asymptotically redundant.

And can I understand the Hansen test for the first-differenced model (the very first part in the Difference-in-Hansen test) as the test for the validity of lagged dependent and endogenous variables as instruments for the first-differenced dependent and endogenous variables in the first-differenced equation?
Sebastian Kripfganz replied

17 Apr 2018, 04:23
It is a similar argument to why two coefficients might be individually statistically significant while a joint test of their insignificance does not reject the null. Individual hypothesis tests do not account for the covariance between the estimators / between the respective moment functions. For a joint test, it is generally harder to reject the null. You should use economic / econometric theory as a guide to group the instruments:
If two or more instruments naturally belong together (e.g. one instrument may not make sense without the other; or several instruments are justified on the grounds of the same assumption such as mean stationarity for the level instruments), then do not split them into separate groups.

Do not combine non-deterministic with deterministic instruments in the same group.

On the other hand, as mentioned on several occasions, split the instruments for the differenced model and those for the level model into separate groups.

It might be meaningful to separate the instruments for the lagged dependent variable from the instruments for other regressors because the former particularly rely on the assumption of no serial correlation in the idiosyncratic error term. But it does not really make sense to consider each lagged instrument itself as a separate group for testing purposes.
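The analogy with individual versus joint coefficient tests can be sketched numerically. This is only an illustration with hypothetical z-statistics and a hypothetical correlation between the two estimators; it is not a Hansen test, and the function names are made up for the example.

```python
import math

def p_chi2_1(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def p_chi2_2(stat):
    """Upper-tail p-value of a chi-squared statistic with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)

def joint_wald(z1, z2, rho):
    """Wald statistic for H0: both coefficients are zero, given the two
    z-statistics and the correlation rho between the two estimators."""
    return (z1**2 + z2**2 - 2.0 * rho * z1 * z2) / (1.0 - rho**2)

z1, z2 = 2.0, 2.0                      # hypothetical z-statistics
p_individual = p_chi2_1(z1**2)         # same for both coefficients
p_joint_corr = p_chi2_2(joint_wald(z1, z2, rho=0.9))
p_joint_indep = p_chi2_2(joint_wald(z1, z2, rho=0.0))

print(f"individual p-value:        {p_individual:.3f}")
print(f"joint p-value (rho = 0.9): {p_joint_corr:.3f}")
print(f"joint p-value (rho = 0):   {p_joint_indep:.3f}")
```

With these made-up numbers each coefficient is individually significant at the 5% level, yet the joint test does not reject once the strong positive correlation between the estimators is taken into account. With zero correlation the joint test rejects as well, so the covariance is what drives the difference.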

...
Alex Mai replied

17 Apr 2018, 02:36
Originally posted by Sebastian Kripfganz

A Difference-in-Hansen test for the time dummies is not meaningful. These dummies are deterministic, i.e. exogenous by definition.

Thanks a lot! Sometimes two exogenous (but perhaps not deterministic) variables cannot pass the Difference-in-Hansen test if they are treated separately (i.e. -iv(x1, eq(level))-, -iv(x2, eq(level))-). But if the two variables are put together in one -iv()-, then the Difference-in-Hansen test does not reject the null of exogeneity. May I ask what mechanism underlies this?
Sebastian Kripfganz replied

16 Apr 2018, 16:48
A Difference-in-Hansen test for the time dummies is not meaningful. These dummies are deterministic, i.e. exogenous by definition.
Alex Mai replied

16 Apr 2018, 12:47
Originally posted by Sebastian Kripfganz

The Difference-in-Hansen test can only be computed if the model is still overidentified after removing the respective set of instruments. This is no longer the case for any subset of instruments that does not show up in the Difference-in-Hansen section of the output. Clearly, if you do not collapse the instruments, you will have many more overidentifying restrictions such that the model would still be overidentified after removing any of the subsets.

In principle, you could split the option iv(x1 x2 x3, eq(level)) into the three separate options iv(x1, eq(level)) iv(x2, eq(level)) iv(x3, eq(level)), which should give you the respective Difference-in-Hansen test statistics.

Thanks a lot! I have tried splitting the set of instruments and it works well. However, the Hansen test for the subset containing the time dummies, for example -iv(x4 year3-year18)-, will never be shown, since the model cannot remain overidentified once the many time-dummy instruments are removed. I do not think this affects the Hansen tests for the other sets of instruments or for the full model (or the estimation itself). Do you think that I am right?

Thank you again!
Sebastian Kripfganz replied

16 Apr 2018, 11:49
The Difference-in-Hansen test can only be computed if the model is still overidentified after removing the respective set of instruments. This is no longer the case for any subset of instruments that does not show up in the Difference-in-Hansen section of the output. Clearly, if you do not collapse the instruments, you will have many more overidentifying restrictions such that the model would still be overidentified after removing any of the subsets.

In principle, you could split the option iv(x1 x2 x3, eq(level)) into the three separate options iv(x1, eq(level)) iv(x2, eq(level)) iv(x3, eq(level)), which should give you the respective Difference-in-Hansen test statistics.
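The overidentification condition can be checked by counting, as a rough sketch. The instrument counts are read off the instrument list in the output posted in the question. How the options map to test groups is an assumption made for this illustration, and the names are hypothetical.

```python
# Instrument columns per group, read off the instrument list in the posted output.
# Collapsed GMM-type instruments contribute one column per lag.
# The grouping below is an assumption about how the options map to test groups.
groups = {
    "gmm, differenced eq.: L(2/3).y": 2,
    "gmm, level eq.: DL.y": 1,
    "iv(x1 x2 x3, eq(level))": 3,
    "iv(x4 year3-year20, eq(level))": 19,
}
n_constant = 1                      # _cons is listed as a standard instrument
n_coefficients = 1 + 4 + 18 + 1     # L.y, x1-x4, year3-year20, _cons

def hansen_df(groups, drop=None):
    """Overidentifying restrictions left after dropping one instrument group."""
    n_instruments = n_constant + sum(n for g, n in groups.items() if g != drop)
    return n_instruments - n_coefficients

print("full model: df =", hansen_df(groups))
for g in groups:
    df = hansen_df(groups, drop=g)
    verdict = "still overidentified" if df > 0 else "not overidentified, no test"
    print(f"without {g}: df = {df} ({verdict})")
```

Under these counts the full model has 26 - 24 = 2 overidentifying restrictions, matching the reported chi2(2). Only dropping the single level-equation GMM instrument leaves a positive number, and that is the one group that appears in the Difference-in-Hansen section.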

Alex Mai replied

16 Apr 2018, 10:49

Originally posted by Sebastian Kripfganz

If your new variable has missing values for these years, the whole years will be dropped from your estimation sample. But with the resulting gaps, it does not make sense any more to estimate a dynamic model, at least for these early years. If you want to keep the new variable, you should restrict your estimation sample to the years from period 8 onwards.

The missing Difference-in-Hansen test is an indirect consequence of these gaps. As I have mentioned in some other Statalist topics before, xtabond2 has a severe bug when some variables (in particular time dummies) get omitted. In your case, there are 28 instruments and 24 estimated coefficients (excluding the omitted dummies). This should give 4 degrees of freedom for the Hansen test. Yet, xtabond2 reports only 1 degree of freedom. An immediate consequence is that the p-value for the Hansen test is incorrect. An indirect consequence is that xtabond2 no longer reports Difference-in-Hansen tests because it believes that there are not enough degrees of freedom available to do so. Once you remove the first 7 years from your sample and make sure that no dummies get omitted, the Difference-in-Hansen test should reappear.

Dear Sebastian,

May I ask for your suggestions about a strange case of a missing Difference-in-Hansen test? For a panel dataset, if I use -collapse-, then the Difference-in-Hansen section only shows the Hansen test for the first-differenced model, without the test for each subset of instruments. This is an almost balanced panel, and no variable is dropped or omitted (so the omitted-variable bug should not matter here).

But if I do not use -collapse-, the full Difference-in-Hansen test is reported (but then the number of instruments is larger than the number of groups).

If possible, could you please check if anything is wrong? Many thanks!

The following is my command and the Stata output.

Code:

. xtabond2 y L.y x1 x2 x3 x4 year3-year20, gmm(y, lag(2 3) collapse) iv(x1 x2
> x3, eq(level)) iv(x4 year3-year20, eq(level)) robust twostep
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: i                               Number of obs      =      1102
Time variable : year                            Number of groups   =        58
Number of instruments = 26                      Obs per group: min =        19
Wald chi2(23) =    399.43                                      avg =     19.00
Prob > chi2   =     0.000                                      max =        19
------------------------------------------------------------------------------
             |              Corrected
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           y |
         L1. |    .327704   .0541904     6.05   0.000     .2214928    .4339153
             |
          x1 |   -.003743   .0027996    -1.34   0.181    -.0092301     .001744
          x2 |   .0068359   .0190328     0.36   0.719    -.0304677    .0441395
          x3 |  -.3073064    .068547    -4.48   0.000     -.441656   -.1729568
          x4 |  -.2079799   .1522344    -1.37   0.172    -.5063538    .0903941
       year3 |    .063862   .0639817     1.00   0.318    -.0615397    .1892638
       year4 |   .0546963   .0515861     1.06   0.289    -.0464106    .1558031
       year5 |   .0420467   .0435978     0.96   0.335    -.0434035    .1274968
       year6 |   .0619219   .0580356     1.07   0.286    -.0518258    .1756696
       year7 |    .056716   .0536444     1.06   0.290    -.0484252    .1618572
       year8 |   .1005629   .0544172     1.85   0.065    -.0060928    .2072187
       year9 |  -.0018599   .0577488    -0.03   0.974    -.1150453    .1113256
      year10 |  -.0385923   .0574199    -0.67   0.502    -.1511333    .0739486
      year11 |   .0336183   .0536857     0.63   0.531    -.0716036    .1388403
      year12 |   .0320164   .0530338     0.60   0.546    -.0719279    .1359607
      year13 |   .0593187   .0531552     1.12   0.264    -.0448636    .1635011
      year14 |   .0551061   .0529566     1.04   0.298    -.0486869    .1588992
      year15 |   .0790878   .0499249     1.58   0.113    -.0187632    .1769388
      year16 |   .0573177   .0547685     1.05   0.295    -.0500266     .164662
      year17 |  -.0037387   .0531859    -0.07   0.944    -.1079811    .1005036
      year18 |   .0469988   .0537304     0.87   0.382    -.0583109    .1523085
      year19 |   .0789955   .0551545     1.43   0.152    -.0291052    .1870963
      year20 |   .0067543    .052748     0.13   0.898    -.0966298    .1101385
       _cons |   4.600625   1.428604     3.22   0.001     1.800613    7.400637
------------------------------------------------------------------------------
Instruments for first differences equation
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(2/3).y collapsed
Instruments for levels equation
  Standard
    x4 year3 year4 year5 year6 year7 year8 year9 year10 year11 year12
    year13 year14 year15 year16 year17 year18 year19 year20
    x1 x2 x3
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    DL.y collapsed
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z =  -3.04  Pr > z =  0.002
Arellano-Bond test for AR(2) in first differences: z =   1.16  Pr > z =  0.246
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(2)    =   3.44  Prob > chi2 =  0.179
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(2)    =   1.47  Prob > chi2 =  0.481
  (Robust, but weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(1)    =   1.14  Prob > chi2 =  0.286
    Difference (null H = exogenous): chi2(1)    =   0.33  Prob > chi2 =  0.567

.
end of do-file
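On the question in the thread title, it may help to see how the two lines of a Difference-in-Hansen block relate to the overall Hansen test. The sketch below uses the rounded statistics printed above. The "difference" statistic is the full Hansen statistic minus the "excluding group" statistic, and the degrees of freedom are subtracted in the same way. The function name is illustrative, and only the closed forms for 1 and 2 degrees of freedom are handled.

```python
import math

def chi2_pvalue(stat, df):
    """Upper-tail chi-squared p-value, closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")

hansen_full, df_full = 1.47, 2          # Hansen test of overid. restrictions
hansen_excl, df_excl = 1.14, 1          # Hansen test excluding group
diff_stat = hansen_full - hansen_excl   # Difference (null H = exogenous)
diff_df = df_full - df_excl

print("difference statistic:", round(diff_stat, 2), "with df =", diff_df)
print("p-value, full Hansen test:", round(chi2_pvalue(hansen_full, df_full), 3))
print("p-value, excluding group: ", round(chi2_pvalue(hansen_excl, df_excl), 3))
print("p-value, difference:      ", round(chi2_pvalue(diff_stat, diff_df), 3))
```

Because the inputs are rounded to two decimals, the recomputed p-values can differ from the printed ones in the third decimal.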

Last edited by Alex Mai; 16 Apr 2018, 11:00.
