  • #16
    You lose one more period because of the lagged dependent variable. Your first period in the estimation sample is the second period in your data set.
    https://twitter.com/Kripfganz

    • #17
      Originally posted by Sebastian Kripfganz View Post
      You lose one more period because of the lagged dependent variable. Your first period in the estimation sample is the second period in your data set.
So if the first lag of the dependent variable is used as a regressor, then I should not set the time dummies as year2-year10, for example, but as year3-year10, since the first period in the estimation sample (because of L.y) is the second period in the dataset. Is my interpretation right?

But in an answer to another post about adding further lags of the dependent variable, you gave the following code with yr2-yr10 rather than yr3-yr10:
Code:
xtabond2 ltfp L.ltfp L2.ltfp routsales rndva yr2-yr10, iv(yr2-yr10, eq(level)) gmm(routsales rndva, lag(2 3)) gmm(ltfp, lag(2 3)) twostep robust artests(3)
https://www.statalist.org/forums/for...nd-deeper-lags Perhaps I misunderstood your point.

      • #18
        Your interpretation is correct.

Presumably, in the other Statalist topic the first year in the data set was year 0 instead of 1, although this is actually not clear given that the other enquirer did not show his estimation output. It is possible that in his case one time dummy got omitted as well.
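For concreteness, a minimal sketch of this timing, with hypothetical variables y and x and ten periods (none of these names are from the thread): with L.y on the right-hand side, the estimation sample starts in the second period of the data, so year3 is the first time dummy that can be included without being omitted.
Code:
* minimal sketch (hypothetical variable names): L.y costs the first period,
* so the first usable time dummy is year3
xtabond2 y L.y x year3-year10, gmm(y, lag(2 3)) gmm(x, lag(2 3)) ///
    iv(year3-year10, eq(level)) twostep robust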
        https://twitter.com/Kripfganz

        • #19
          Originally posted by Sebastian Kripfganz View Post
          Your interpretation is correct.

Presumably, in the other Statalist topic the first year in the data set was year 0 instead of 1, although this is actually not clear given that the other enquirer did not show his estimation output. It is possible that in his case one time dummy got omitted as well.
I see! By the way, may I ask for your suggestions on choosing the depth of lags used as instruments? I know this is a subjective and context-dependent issue, but I am really confused about it.

For instance, x is significant under gmm(x, lag(2 3)) but turns out to be insignificant under gmm(x, lag(2 4)) or gmm(x, lag(2 5)). Do you think it still makes sense to say anything about the effect of x?

I think if the change in statistical significance is due to the weak correlation of deeper lags with the instrumented variable, then perhaps it is safe to say that x has an effect on y. Just as you argued in another post, deeper lags may be only weakly correlated with the instrumented variable unless the series is persistent.

But if such a change in statistical significance is due to other reasons, perhaps we have to say that the effect of x on y is not robust and depends on the lags used.

In some other cases, instrumented variables may turn out to be significant after deeper lags are used.

So is it right to say that experience/intuition or economic theory (if any) is the primary way to decide the depth of lags?

          Many thanks again!
          Last edited by Alex Mai; 11 Apr 2018, 14:40.

          • #20
This lack of robustness is a general problem of this kind of GMM estimation when the cross-sectional dimension is not very large, and it indeed reduces the reliability of the estimates. Your intuition goes in the right direction. Your observation that the effect turns statistically insignificant could indeed be a consequence of deeper lags becoming weak instruments. You could use this argument to justify a specification with just the second and third lag.

            If coefficients turn statistically significant by adding deeper lags, this would worry me more. These results might be "spurious" as a consequence of having too many instruments.

I would indeed recommend not including too deep lags if you have good (economic) arguments that these additional lags would only be weakly correlated with the instrumented variables.

            An alternative might be to use the pca option of xtabond2 but that is something for which I cannot provide any help.
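Purely as an illustration of these points (variable names are placeholders, not from the thread), one might compare a specification restricted to the second and third lags with one that also uses deeper lags, keeping the instrument count in check with the collapse suboption; the pca option mentioned above is shown only as a hedged alternative, not a recommendation:
Code:
* hypothetical sketch (placeholder names): restricted vs. deeper instrument lags,
* with collapsed instrument sets to limit the instrument count
xtabond2 y L.y x year3-year10, gmm(y, lag(2 3) collapse) gmm(x, lag(2 3) collapse) ///
    iv(year3-year10, eq(level)) twostep robust
xtabond2 y L.y x year3-year10, gmm(y, lag(2 3) collapse) gmm(x, lag(2 5) collapse) ///
    iv(year3-year10, eq(level)) twostep robust
* alternatively, the pca option of xtabond2 replaces the instrument set with its
* principal components
xtabond2 y L.y x year3-year10, gmm(y, lag(2 3)) gmm(x, lag(2 5)) ///
    iv(year3-year10, eq(level)) twostep robust pca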
            https://twitter.com/Kripfganz

            • #21
              Originally posted by Sebastian Kripfganz View Post
This lack of robustness is a general problem of this kind of GMM estimation when the cross-sectional dimension is not very large, and it indeed reduces the reliability of the estimates. Your intuition goes in the right direction. Your observation that the effect turns statistically insignificant could indeed be a consequence of deeper lags becoming weak instruments. You could use this argument to justify a specification with just the second and third lag.

              If coefficients turn statistically significant by adding deeper lags, this would worry me more. These results might be "spurious" as a consequence of having too many instruments.

I would indeed recommend not including too deep lags if you have good (economic) arguments that these additional lags would only be weakly correlated with the instrumented variables.

              An alternative might be to use the pca option of xtabond2 but that is something for which I cannot provide any help.
Thanks a lot! I remember that you suggested to another enquirer to use the second lag of the dependent variable (L2.y) as a regressor in addition to L.y. I have tried this, but L2.y is highly insignificant. Is this evidence that L2.y is not useful?

Normally, insignificant variables may be dropped to keep the model parsimonious, but I am not sure whether this still holds for the lagged dependent variable in system GMM.

              • #22
Adding further lags of the dependent variable as regressors might be useful to avoid serial correlation of the idiosyncratic errors if the Arellano-Bond AR(2) test provides evidence in that regard.

                Adding a further lag might help to increase the p-value of the AR(2) test even if this additional regressor turns out to be not statistically significant. In that case, it might be worth keeping it nevertheless. Of course, if there is no concern about serial correlation and further lags of the dependent variable are (highly) statistically insignificant, then there is no reason to keep them in the model and a more parsimonious model would be preferred.
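As a hedged illustration of this trade-off (placeholder variable names, not from the thread), one might compare the Arellano-Bond test results with and without the second lag as a regressor:
Code:
* hypothetical sketch (placeholder names): compare the AR(2)/AR(3) tests with and
* without L2.y as an additional regressor
xtabond2 y L.y x year3-year10, gmm(y, lag(2 3) collapse) gmm(x, lag(2 3) collapse) ///
    iv(year3-year10, eq(level)) twostep robust artests(3)
* with L2.y one more period is lost, so the first usable time dummy shifts to year4
xtabond2 y L.y L2.y x year4-year10, gmm(y, lag(2 3) collapse) gmm(x, lag(2 3) collapse) ///
    iv(year4-year10, eq(level)) twostep robust artests(3)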
                https://twitter.com/Kripfganz

                • #23
                  Originally posted by Sebastian Kripfganz View Post
Adding further lags of the dependent variable as regressors might be useful to avoid serial correlation of the idiosyncratic errors if the Arellano-Bond AR(2) test provides evidence in that regard.

                  Adding a further lag might help to increase the p-value of the AR(2) test even if this additional regressor turns out to be not statistically significant. In that case, it might be worth keeping it nevertheless. Of course, if there is no concern about serial correlation and further lags of the dependent variable are (highly) statistically insignificant, then there is no reason to keep them in the model and a more parsimonious model would be preferred.
                  Many thanks again!

                  • #24
                    Originally posted by Sebastian Kripfganz View Post
                    If your new variable has missings for these years, the whole years will be dropped from your estimation sample. But with the resulting gaps, it does not make sense any more to estimate a dynamic model at least for these early years. If you want to keep the new variable, you should restrict your estimation sample to the years from period 8 onwards.

                    The missing Difference-in-Hansen test is an indirect consequence of these gaps. As I have mentioned in some other Statalist topics before, xtabond2 has a severe bug when some variables (in particular time dummies) get omitted. In your case, there are 28 instruments and 24 estimated coefficients (excluding the omitted dummies). This should give 4 degrees of freedom for the Hansen test. Yet, xtabond2 reports only 1 degree of freedom. An immediate consequence is that the p-value for the Hansen test is incorrect. An indirect consequence is that xtabond2 no longer reports Difference-in-Hansen tests because it believes that there are not enough degrees of freedom available to do so. Once you remove the first 7 years from your sample and make sure that no dummies get omitted, the Difference-in-Hansen test should reappear.
                    Dear Sebastian,

May I ask for your suggestions about a strange case of missing Difference-in-Hansen tests? For a panel dataset, if I use -collapse-, the Difference-in-Hansen section only reports the Hansen test for the first-differenced model, without the tests for the individual subsets of instruments. This is an almost balanced panel, and no variable is dropped or omitted (so the omitted-variable bug should not matter here).

But if I do not use -collapse-, the full set of Difference-in-Hansen tests is reported (though then the number of instruments is larger than the number of groups).

                    If possible, could you please check if anything is wrong? Many thanks!

The following is my command and the Stata output.

                    Code:
                    . xtabond2 y L.y x1 x2 x3 x4 year3-year20, gmm(y, lag(2 3) collapse) iv(x1 x2
                    > x3, eq(level)) iv(x4 year3-year20, eq(level)) robust twostep
                    Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
                    
                    Dynamic panel-data estimation, two-step system GMM
                    ------------------------------------------------------------------------------
                    Group variable: i                               Number of obs      =      1102
                    Time variable : year                            Number of groups   =        58
                    Number of instruments = 26                      Obs per group: min =        19
                    Wald chi2(23) =    399.43                                      avg =     19.00
                    Prob > chi2   =     0.000                                      max =        19
                    ------------------------------------------------------------------------------
                                 |              Corrected
                               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                               y |
                             L1. |    .327704   .0541904     6.05   0.000     .2214928    .4339153
                                 |
                              x1 |   -.003743   .0027996    -1.34   0.181    -.0092301     .001744
                              x2 |   .0068359   .0190328     0.36   0.719    -.0304677    .0441395
                              x3 |  -.3073064    .068547    -4.48   0.000     -.441656   -.1729568
                              x4 |  -.2079799   .1522344    -1.37   0.172    -.5063538    .0903941
                           year3 |    .063862   .0639817     1.00   0.318    -.0615397    .1892638
                           year4 |   .0546963   .0515861     1.06   0.289    -.0464106    .1558031
                           year5 |   .0420467   .0435978     0.96   0.335    -.0434035    .1274968
                           year6 |   .0619219   .0580356     1.07   0.286    -.0518258    .1756696
                           year7 |    .056716   .0536444     1.06   0.290    -.0484252    .1618572
                           year8 |   .1005629   .0544172     1.85   0.065    -.0060928    .2072187
                           year9 |  -.0018599   .0577488    -0.03   0.974    -.1150453    .1113256
                          year10 |  -.0385923   .0574199    -0.67   0.502    -.1511333    .0739486
                          year11 |   .0336183   .0536857     0.63   0.531    -.0716036    .1388403
                          year12 |   .0320164   .0530338     0.60   0.546    -.0719279    .1359607
                          year13 |   .0593187   .0531552     1.12   0.264    -.0448636    .1635011
                          year14 |   .0551061   .0529566     1.04   0.298    -.0486869    .1588992
                          year15 |   .0790878   .0499249     1.58   0.113    -.0187632    .1769388
                          year16 |   .0573177   .0547685     1.05   0.295    -.0500266     .164662
                          year17 |  -.0037387   .0531859    -0.07   0.944    -.1079811    .1005036
                          year18 |   .0469988   .0537304     0.87   0.382    -.0583109    .1523085
                          year19 |   .0789955   .0551545     1.43   0.152    -.0291052    .1870963
                          year20 |   .0067543    .052748     0.13   0.898    -.0966298    .1101385
                           _cons |   4.600625   1.428604     3.22   0.001     1.800613    7.400637
                    ------------------------------------------------------------------------------
                    Instruments for first differences equation
                      GMM-type (missing=0, separate instruments for each period unless collapsed)
                        L(2/3).y collapsed
                    Instruments for levels equation
                      Standard
                        x4 year3 year4 year5 year6 year7 year8 year9 year10 year11 year12
                        year13 year14 year15 year16 year17 year18 year19 year20
                        x1 x2 x3
                        _cons
                      GMM-type (missing=0, separate instruments for each period unless collapsed)
                        DL.y collapsed
                    ------------------------------------------------------------------------------
                    Arellano-Bond test for AR(1) in first differences: z =  -3.04  Pr > z =  0.002
                    Arellano-Bond test for AR(2) in first differences: z =   1.16  Pr > z =  0.246
                    ------------------------------------------------------------------------------
                    Sargan test of overid. restrictions: chi2(2)    =   3.44  Prob > chi2 =  0.179
                      (Not robust, but not weakened by many instruments.)
                    Hansen test of overid. restrictions: chi2(2)    =   1.47  Prob > chi2 =  0.481
                      (Robust, but weakened by many instruments.)
                    
                    Difference-in-Hansen tests of exogeneity of instrument subsets:
                      GMM instruments for levels
                        Hansen test excluding group:     chi2(1)    =   1.14  Prob > chi2 =  0.286
                        Difference (null H = exogenous): chi2(1)    =   0.33  Prob > chi2 =  0.567
                    
                    .
                    end of do-file
                    Last edited by Alex Mai; 16 Apr 2018, 11:00.

                    • #25
The Difference-in-Hansen test can only be computed if the model is still overidentified after removing the respective set of instruments. This is no longer the case for any subset of instruments that does not show up in the Difference-in-Hansen section of the output. Clearly, if you do not collapse the instruments, you will have many more overidentifying restrictions, such that the model would still be overidentified after removing any of the subsets.

                      In principle, you could split the option iv(x1 x2 x3, eq(level)) into the three separate options iv(x1, eq(level)) iv(x2, eq(level)) iv(x3, eq(level)), which should give you the respective Difference-in-Hansen test statistics.
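Applied to the command from #24 (everything else unchanged), a sketch of that split could look like this:
Code:
* sketch based on the command in #24: level instruments split into separate groups
xtabond2 y L.y x1 x2 x3 x4 year3-year20, gmm(y, lag(2 3) collapse) ///
    iv(x1, eq(level)) iv(x2, eq(level)) iv(x3, eq(level)) ///
    iv(x4 year3-year20, eq(level)) robust twostep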
                      https://twitter.com/Kripfganz

                      • #26
                        Originally posted by Sebastian Kripfganz View Post
The Difference-in-Hansen test can only be computed if the model is still overidentified after removing the respective set of instruments. This is no longer the case for any subset of instruments that does not show up in the Difference-in-Hansen section of the output. Clearly, if you do not collapse the instruments, you will have many more overidentifying restrictions, such that the model would still be overidentified after removing any of the subsets.

                        In principle, you could split the option iv(x1 x2 x3, eq(level)) into the three separate options iv(x1, eq(level)) iv(x2, eq(level)) iv(x3, eq(level)), which should give you the respective Difference-in-Hansen test statistics.
Thanks a lot! I have tried splitting the set of instruments, and it works well. However, the Hansen test for the subset with the time dummies, for example -iv(x4 year3-year18)-, will never be shown, since the model cannot remain overidentified after removing that many instruments. I do not think this affects the Hansen tests for the other sets of instruments or for the full model (or the estimation itself). Do you think I am right?

                        Thank you again!

                        • #27
                          A Difference-in-Hansen test for the time dummies is not meaningful. These dummies are deterministic, i.e. exogenous by definition.
                          https://twitter.com/Kripfganz

                          • #28
                            Originally posted by Sebastian Kripfganz View Post
                            A Difference-in-Hansen test for the time dummies is not meaningful. These dummies are deterministic, i.e. exogenous by definition.
Thanks a lot! Sometimes two exogenous (but perhaps not deterministic) variables cannot pass the Difference-in-Hansen test if they are treated separately (i.e. -iv(x1, eq(level))- and -iv(x2, eq(level))-). But if the two variables are put together in one -iv()-, the Difference-in-Hansen test does not reject the null of exogeneity. May I ask what mechanism underlies this situation?

                            • #29
It is a similar argument to why two coefficients might be individually statistically significant while a joint test does not reject the null of joint insignificance. Individual hypothesis tests do not account for the covariance between the estimators / between the respective moment functions. For a joint test, it is generally harder to reject the null. You should use economic / econometric theory as a guide when grouping the instruments:
                              • If two or more instruments naturally belong together (e.g. one instrument may not make sense without the other; or several instruments are justified on the grounds of the same assumption such as mean stationarity for the level instruments), then do not split them into separate groups.
                              • Do not combine non-deterministic with deterministic instruments in the same group.
• On the other hand, as mentioned on several occasions, split instruments for the differenced model and those for the level model into separate groups.
                              • It might be meaningful to separate the instruments for the lagged dependent variable from the instruments for other regressors because the former particularly rely on the assumption of no serial correlation in the idiosyncratic error term. But it does not really make sense to consider each lagged instrument itself as a separate group for testing purposes.
                              • ...
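A hedged sketch of how such a grouping might look in practice, with placeholder variables (y the dependent variable, w an endogenous regressor, x an exogenous regressor; none of these names are from the thread):
Code:
* illustrative grouping (placeholder names): the lagged dependent variable and the
* endogenous regressor get separate gmm() groups; the exogenous regressor and the
* deterministic time dummies get separate iv() groups for the levels equation
xtabond2 y L.y w x year3-year10, gmm(y, lag(2 3) collapse) gmm(w, lag(2 3) collapse) ///
    iv(x, eq(level)) iv(year3-year10, eq(level)) twostep robust
With enough overidentifying restrictions, the levels-equation GMM instruments then appear as their own subset in the Difference-in-Hansen output, as in the output shown in #24.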
                              https://twitter.com/Kripfganz

                              • #30
                                Originally posted by Sebastian Kripfganz View Post
It is a similar argument to why two coefficients might be individually statistically significant while a joint test does not reject the null of joint insignificance. Individual hypothesis tests do not account for the covariance between the estimators / between the respective moment functions. For a joint test, it is generally harder to reject the null. You should use economic / econometric theory as a guide when grouping the instruments:
                                • If two or more instruments naturally belong together (e.g. one instrument may not make sense without the other; or several instruments are justified on the grounds of the same assumption such as mean stationarity for the level instruments), then do not split them into separate groups.
                                • Do not combine non-deterministic with deterministic instruments in the same group.
• On the other hand, as mentioned on several occasions, split instruments for the differenced model and those for the level model into separate groups.
                                • It might be meaningful to separate the instruments for the lagged dependent variable from the instruments for other regressors because the former particularly rely on the assumption of no serial correlation in the idiosyncratic error term. But it does not really make sense to consider each lagged instrument itself as a separate group for testing purposes.
                                • ...
Thank you! But in what situations or for what kinds of variables should I use -iv(x, eq(diff))-? You have argued that -iv(x, eq(level))- makes -iv(x, eq(diff))- asymptotically redundant.

And can I interpret the Hansen test for the first-differenced model (the very first part of the Difference-in-Hansen output) as a test of the validity of the lagged dependent and endogenous variables as instruments in the first-differenced equation?
