  • Which test to report from the Difference-in-Hansen tests: "excluding group" or "Difference"?

    Dear Statalists,

    I read in Roodman (2007) that one should report the difference-in-Hansen test for the validity and exogeneity of subsets of instruments (even though many published studies do not report it). However, I am not sure which of the two sub-tests under difference-in-Hansen ("Hansen test excluding group" and "Difference") I should report. Some papers report both of them, while others report only one.

    "Hansen Excluding Group" examines the validity of the model without the specified set of instruments (the set of instruments specified in each sub-heading, such as iv(x2 x3)), and the "Difference" test examines the validity of the specified set of instruments by computing the difference between the two Hansen J statistics with and without this set of instruments. Is this understanding correct?

    For instance,
    Code:
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(4)    =   4.06  Prob > chi2 =  0.397
        Difference (null H = exogenous): chi2(2)    =   1.41  Prob > chi2 =  0.494
      gmm(y, collapse lag(2 4))
        Hansen test excluding group:     chi2(2)    =   4.33  Prob > chi2 =  0.115
        Difference (null H = exogenous): chi2(4)    =   1.14  Prob > chi2 =  0.887
      gmm(x1, collapse lag(2 5))
        Hansen test excluding group:     chi2(1)    =   0.02  Prob > chi2 =  0.884
        Difference (null H = exogenous): chi2(5)    =   5.45  Prob > chi2 =  0.363
      iv(x2 x3, eq(level))
        Hansen test excluding group:     chi2(4)    =   3.62  Prob > chi2 =  0.459
        Difference (null H = exogenous): chi2(2)    =   1.85  Prob > chi2 =  0.397

    Should I report both sub-tests or only the Difference test? And is it necessary to report all four sets of difference-in-Hansen tests (GMM instruments for levels, gmm(y), gmm(x1), and iv(x2 x3))?

    Thank you!
    Last edited by Alex Mai; 02 Apr 2018, 03:59.

  • #2
    Any suggestions would be really appreciated! Many thanks.


    • #3
      Hoping for any suggestions! Thank you!


      • #4
        If the model without the additional instruments is correctly specified (i.e. the Hansen test excluding this group of instruments does not reject the null hypothesis), then the difference-in-Hansen test can be interpreted as a test for the validity of the additional instruments. In that regard, your understanding is correct.

        As to which test results to report, it really depends. You certainly want to report the Hansen test for the full model. On top of that, it makes sense to report difference-in-Hansen tests for particular instruments if their inclusion requires particular justification. For example, if the Arellano-Bond AR(2) test does not reject the null hypothesis of no second-order serial correlation of the first-differenced errors, then you usually need not separately justify the lagged levels of the dependent variable as instruments for the first-differenced model. In contrast, the difference-in-Hansen test for the level instruments is informative because it helps to evaluate whether the Blundell-Bond mean stationarity assumption might be violated.

        For example, you could report the Hansen test for the model with the instruments for the first-differenced model only, the Hansen test for the full model, and the respective difference-in-Hansen test. The Hansen test for the first-differenced model tells you something about whether your model is dynamically complete (because this determines whether those instruments are valid). The difference-in-Hansen test, as mentioned before, tells you something about the mean stationarity condition needed for the validity of the level instruments. Taking these two test results at face value, the Hansen test for the full model would in principle be redundant, but it is still reasonable to report it to provide a complete picture.
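
        As a quick arithmetic sketch with the numbers from the first post (not a new estimation): for each instrument subset, the "excluding group" statistic and the "Difference" statistic add up to the same full-model Hansen J, and so do their degrees of freedom. The scalar names below are only illustrative, and the snippet is meant to be run from a do-file.

        ```stata
        * Numbers copied from the "GMM instruments for levels" block in post #1.
        scalar J_excl  = 4.06   // Hansen test excluding group, chi2(4)
        scalar df_excl = 4
        scalar J_diff  = 1.41   // Difference statistic, chi2(2)
        scalar df_diff = 2

        * The full-model Hansen J is the sum of the two pieces.
        scalar J_full  = J_excl + J_diff
        scalar df_full = df_excl + df_diff

        display "Full-model Hansen J  = " J_full " with " df_full " df"
        display "p-value (full model) = " chi2tail(df_full, J_full)
        display "p-value (difference) = " chi2tail(df_diff, J_diff)
        ```

        The p-value for the difference should reproduce the 0.494 shown in that output. The other three subsets in the same output also sum to 5.47 with 6 degrees of freedom.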
        https://twitter.com/Kripfganz


        • #5
          Many thanks! I am not quite sure about what you refer to as "Hansen test for the first-differenced model". Do you mean "GMM instruments for levels" (the first subheading under Difference-in-Hansen tests, the underlined part in the following example)?

          Btw, if the Hansen test excluding group is missing due to just identification, can I still interpret the corresponding Difference test? And is it a problem that Hansen test excluding group is missing?

          Code:
          Difference-in-Hansen tests of exogeneity of instrument subsets:  
            GMM instruments for levels 
               Hansen test excluding group:     chi2(4)    =   4.06  Prob > chi2 =  0.397 
               Difference (null H = exogenous): chi2(2)    =   1.41  Prob > chi2 =  0.494  
            gmm(y, collapse lag(2 4))    
               Hansen test excluding group:     chi2(2)    =   4.33  Prob > chi2 =  0.115    
               Difference (null H = exogenous): chi2(4)    =   1.14  Prob > chi2 =  0.887  
            gmm(x1, collapse lag(2 5))    
               Hansen test excluding group:     chi2(1)    =   0.02  Prob > chi2 =  0.884    
               Difference (null H = exogenous): chi2(5)    =   5.45  Prob > chi2 =  0.363
            iv(x2 x3, eq(level))    
               Hansen test excluding group:     chi2(4)    =   3.62  Prob > chi2 =  0.459    
               Difference (null H = exogenous): chi2(2)    =   1.85  Prob > chi2 =  0.397
          Thank you again!
          Last edited by Alex Mai; 04 Apr 2018, 13:00.


          • #6
            The "Hansen test for the first-differenced model" should be the very first test of your output ("Hansen test excluding group" for the group of "GMM instruments for levels").

            If the "Hansen test excluding group" is missing due to just identification, you can still interpret the corresponding "Difference" test based on the assumption that the model without these additional instruments is correctly specified. This assumption is untestable due to the just identification. In that regard, it is not a problem as long as this assumption is justifiable (for example with the help of the AR(2) test).
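
            For reporting, the overall test results can usually be read from the stored results after estimation. This is a sketch that assumes xtabond2 stores them under the names used below; ereturn list shows what is actually available in your version.

            ```stata
            * Run directly after an xtabond2 estimation.
            ereturn list                        // shows which results were actually stored

            * Assumed result names; a missing result is displayed as "."
            display "Hansen J     = " e(hansen)
            display "Hansen df    = " e(hansen_df)
            display "Hansen p     = " e(hansenp)
            display "AR(2) p      = " e(ar2p)
            display "Instruments  = " e(j)
            ```

            The difference-in-Hansen sub-tests themselves are easiest to copy from the printed output.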
            https://twitter.com/Kripfganz


            • #7
              Thank you! Btw, is it correct to understand that the Hansen test can only test whether the instruments are exogenous to the idiosyncratic error terms, but not whether the instruments are exogenous to the individual-specific effects (fixed effects) in the composite error term?

              I have read in a textbook that the correlation between instruments and fixed effects cannot be examined statistically in an appropriate manner.


              • #8
                The instruments are valid if they are uncorrelated with the composite error term that is the sum of the fixed effects and the idiosyncratic error term. The Hansen test cannot distinguish between these two components.

                The textbooks might refer to the fact that the fixed effects themselves cannot be estimated in a reliable way and hence it is not possible to examine the correlation between the instruments and the fixed effects.
                https://twitter.com/Kripfganz


                • #9
                  Thanks a lot! I get your point. May I ask one more thing about the lags of the instruments?

                  In a previous post, you mentioned
                  For the instruments, you would usually start with the second lag of the dependent variable and the first lag of the independent variables (or contemporaneous terms, depending on whether the variables are predetermined or strictly exogenous) instead of lag 6.
                  However, Roodman (2007) points out that the standard treatment for a predetermined variable is gmm(x, lag(1 .)), i.e. from lag one, while the treatment for an endogenous variable is gmm(x, lag(2 .)), i.e. from lag two. I am a bit confused, as your suggestion seems to differ from Roodman's (sorry, this is perhaps due to my misunderstanding).

                  So just for clarification:
                  for dependent variable, I should write gmm(y, lag(2 .))
                  for predetermined regressor, I should write gmm(x, lag(1 .))
                  for endogenous regressor, I should write gmm(x, lag(2 .))
                  Is this correct?

                  And if I only include lagged regressor, rather than its current value, into the equation, such as L.x, then it is a predetermined regressor and I should treat it as gmm(L.x, lag(1 .)), right?

                  In an economics paper, the author uses a lagged regressor in System GMM as a predetermined regressor, and he argues that System GMM can deal with the two-way causality between the predetermined variable and the dependent variable. But I do not think his argument is correct, since the causality cannot go from the dependent variable (the current value at t) to the lagged regressor (the lagged value at t-1).

                  Do you think that my interpretation is correct?

                  Many thanks again!
                  Last edited by Alex Mai; 05 Apr 2018, 13:17.


                  • #10
                    Your choice of lags is correct. The lagged dependent variable L.y is essentially a predetermined variable, hence gmm(L.y, lag(1 .)) which is equivalent to gmm(y, lag(2 .)).
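
                    Put together, the lag choices would look roughly like the following sketch. The variable names (id, year, y, x_pre, x_end) are hypothetical placeholders, and the remaining options are only one possible choice.

                    ```stata
                    * Hypothetical names: y (dependent), x_pre (predetermined),
                    * x_end (endogenous), id and year as panel and time identifiers.
                    xtset id year

                    xtabond2 y L.y x_pre x_end, ///
                        gmm(y,     lag(2 .)) /// lagged dependent variable: instruments from lag 2
                        gmm(x_pre, lag(1 .)) /// predetermined regressor: instruments from lag 1
                        gmm(x_end, lag(2 .)) /// endogenous regressor: instruments from lag 2
                        twostep robust
                    ```

                    With many time periods, open-ended lag ranges generate a large number of instruments. The specifications shown elsewhere in this topic limit the lag range and add the collapse suboption for that reason.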

                    I am not sure what the author means by two-way causality between the predetermined variable and the dependent variable.
                    https://twitter.com/Kripfganz


                    • #11
                      Dear Sebastian,

                      May I ask one question about missing Difference-in-Hansen test? I just now tried adding one more variable to my equation, and then Stata did not report the Difference-in-Hansen test at all (the exported outcome of Stata stopped at Sargan test and Hansen test). I do not know what goes wrong here.
                      The result is shown as follows:
                      Code:
                      . xtabond2 y L.y year2-year18 x7 x4 x6 x2 x1 d1 d2 cs, gmm(y
                      > , lag(2 3) collapse) gmm(x6, lag(2 3) collapse) iv(cs x1, eq(level)) iv(d1
                      >  d2 x4 x7 x2 year2-year18, eq(level)) twostep robust
                      Favoring space over speed. To switch, type or click on mata: mata set matafavor speed,
                      >  perm.
                      Warning: Two-step estimated covariance matrix of moments is singular.
                        Using a generalized inverse to calculate optimal weighting matrix for two-step estim
                      > ation.
                        Difference-in-Sargan/Hansen statistics may be negative.
                      
                      Dynamic panel-data estimation, two-step system GMM
                      ------------------------------------------------------------------------------
                      Group variable: i                               Number of obs      =       917
                      Time variable : year                            Number of groups   =        60
                      Number of instruments = 28                      Obs per group: min =         3
                      Wald chi2(26) =   1140.90                                      avg =     15.28
                      Prob > chi2   =     0.000                                      max =        16
                      ------------------------------------------------------------------------------
                                   |              Corrected
                             y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                             y |
                               L1. |   .3862145   .0622517     6.20   0.000     .2642034    .5082256
                                   |
                             year2 |  -.0853252   .0277727    -3.07   0.002    -.1397587   -.0308917
                             year3 |          0  (omitted)
                             year4 |  -.0858434   .0261483    -3.28   0.001    -.1370931   -.0345938
                             year5 |          0  (omitted)
                             year6 |  -.0398763   .0233303    -1.71   0.087    -.0856028    .0058501
                             year7 |          0  (omitted)
                             year8 |  -.0362826   .0258199    -1.41   0.160    -.0868888    .0143236
                             year9 |  -.0343727   .0203296    -1.69   0.091    -.0742179    .0054726
                            year10 |  -.0154841   .0276664    -0.56   0.576    -.0697092     .038741
                            year11 |  -.0058569   .0166615    -0.35   0.725    -.0385129    .0267991
                            year12 |    .013753   .0185975     0.74   0.460    -.0226974    .0502035
                            year13 |   .0117277   .0217985     0.54   0.591    -.0309967     .054452
                            year14 |  -.0244108   .0196056    -1.25   0.213     -.062837    .0140154
                            year15 |  -.0301793   .0204674    -1.47   0.140    -.0702947    .0099361
                            year16 |   .0022201   .0191573     0.12   0.908    -.0353276    .0397678
                            year17 |    .008338   .0191025     0.44   0.662    -.0291023    .0457782
                            year18 |  -.0019966    .014509    -0.14   0.891    -.0304337    .0264406
                                x7 |   .0034003   .0084641     0.40   0.688    -.0131891    .0199897
                                x4 |   .0076999   .0084361     0.91   0.361    -.0088346    .0242344
                                x6 |  -.0037636   .0042373    -0.89   0.374    -.0120686    .0045415
                                x2 |  -.0922484   .0311284    -2.96   0.003     -.153259   -.0312378
                                x1 |  -.0093963   .0067857    -1.38   0.166     -.022696    .0039035
                                d1 |  -.1630917    .035516    -4.59   0.000    -.2327018   -.0934816
                                d2 |  -.0140644   .0300584    -0.47   0.640    -.0729777    .0448489
                                cs |   .0933815   .0242519     3.85   0.000     .0458486    .1409143
                             _cons |   5.716706   .5699476    10.03   0.000     4.599629    6.833782
                      ------------------------------------------------------------------------------
                      Instruments for first differences equation
                        GMM-type (missing=0, separate instruments for each period unless collapsed)
                          L(2/3).x6 collapsed
                          L(2/3).y collapsed
                      Instruments for levels equation
                        Standard
                          d1 d2 x4 x7 x2 year2 year3 year4 year5 year6 year7 year8 year9
                          year10 year11 year12 year13 year14 year15 year16 year17 year18
                          cs x1
                          _cons
                        GMM-type (missing=0, separate instruments for each period unless collapsed)
                          DL.x6 collapsed
                          DL.y collapsed
                      ------------------------------------------------------------------------------
                      Arellano-Bond test for AR(1) in first differences: z =  -3.66  Pr > z =  0.000
                      Arellano-Bond test for AR(2) in first differences: z =  -0.23  Pr > z =  0.817
                      ------------------------------------------------------------------------------
                      Sargan test of overid. restrictions: chi2(1)    =   1.50  Prob > chi2 =  0.220
                        (Not robust, but not weakened by many instruments.)
                      Hansen test of overid. restrictions: chi2(1)    =   2.35  Prob > chi2 =  0.125
                        (Robust, but weakened by many instruments.)
                      
                      . 
                      end of do-file
                      The above is what Stata reported. Stata did not give anything about Difference-in-Hansen test.

                      The newly added variable seems to be special, since it is missing for all observations in year 1, year 3, year 5, and year 7. Without this variable, everything worked well.

                      This is the statistical summary of the variable.
                      Code:
                           Variable |        Obs        Mean    Std. Dev.       Min        Max
                      -------------+---------------------------------------------------------
                               cs |        912   -.7202961    .4150485      -1.72        .81
                      Could you please help to check what the problem is?

                      Many thanks!
                      Last edited by Alex Mai; 10 Apr 2018, 12:24.


                      • #12
                        If your new variable has missing values for these years, the whole years will be dropped from your estimation sample. But with the resulting gaps, it does not make sense any more to estimate a dynamic model, at least for these early years. If you want to keep the new variable, you should restrict your estimation sample to the years from period 8 onwards.

                        The missing Difference-in-Hansen test is an indirect consequence of these gaps. As I have mentioned in some other Statalist topics before, xtabond2 has a severe bug when some variables (in particular time dummies) get omitted. In your case, there are 28 instruments and 24 estimated coefficients (excluding the omitted dummies). This should give 4 degrees of freedom for the Hansen test. Yet, xtabond2 reports only 1 degree of freedom. An immediate consequence is that the p-value for the Hansen test is incorrect. An indirect consequence is that xtabond2 no longer reports Difference-in-Hansen tests because it believes that there are not enough degrees of freedom available to do so. Once you remove the first 7 years from your sample and make sure that no dummies get omitted, the Difference-in-Hansen test should reappear.
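
                        The degrees-of-freedom count can be checked by hand with the numbers read off the output in post #11. This is only a sketch: the scalar names are illustrative, and the last line assumes that the Hansen statistic itself (2.35) is unaffected and only its degrees of freedom are wrong.

                        ```stata
                        * Counts read off the xtabond2 output in post #11.
                        scalar n_instr   = 28              // "Number of instruments = 28"
                        scalar n_coef    = 1 + 14 + 8 + 1  // L.y, 14 non-omitted year dummies, 8 regressors, _cons
                        scalar n_omitted = 3               // year3, year5, year7 shown as (omitted)

                        display "Expected Hansen df          = " n_instr - n_coef
                        display "df if omitted dummies count = " n_instr - (n_coef + n_omitted)
                        display "p-value with expected df    = " chi2tail(n_instr - n_coef, 2.35)
                        ```

                        The first line gives 4 and the second gives 1, which is what xtabond2 reported. That is consistent with the omitted dummies still being counted as coefficients, although this is an inference from the numbers rather than something verified in the xtabond2 code.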
                        https://twitter.com/Kripfganz


                        • #13
                          Thank you! I have tried starting from period 8 and it works. But sometimes one of the year dummies is dropped by Stata (xtabond2). In that case it is not shown in the table at all: instead of being reported as "omitted" in the table, there is a notification "dropped due to collinearity". Everything else seems to be fine. Do you think this may cause a problem?

                          You mentioned that omitted dummies cause problems, but in my case the dummy is dropped instead of omitted (e.g. year16 in the following example).

                          I specified the time dummies as year9-year18, instead of i.year.

                          Code:
                              
                               year16 dropped due to collinearity
                          
                                year9 |  -.0052901   .0244344    -0.22   0.829    -.0531807    .0426006
                               year10 |   .0138947   .0240152     0.58   0.563    -.0331742    .0609637
                               year11 |  -.0005045   .0229553    -0.02   0.982    -.0454961     .044487
                               year12 |  -.0087308   .0238082    -0.37   0.714     -.055394    .0379323
                               year13 |  -.0076787    .017286    -0.44   0.657    -.0415586    .0262013
                               year14 |  -.0275574   .0274015    -1.01   0.315    -.0812634    .0261487
                               year15 |  -.0237062   .0197515    -1.20   0.230    -.0624185    .0150061
                               year17 |  -.0024705   .0136791    -0.18   0.857     -.029281      .02434
                               year18 |  -.0132392   .0180177    -0.73   0.462    -.0485533    .0220748
                          Last edited by Alex Mai; 11 Apr 2018, 07:43.


                          • #14
                            It is not entirely clear to me when xtabond2 "drops" a dummy and when it just "omits" it. The former is less of a problem, although there might be a more subtle complication: The year11 dummy is still kept as an instrument (at least it is shown as an instrument below the regression table) despite being dropped as a regressor. That is not what you want. To be on the safe side, I would always recommend amending the specification until nothing is dropped or omitted any more. In your case, I would just use the dummies from year3 to year13.
                            https://twitter.com/Kripfganz


                            • #15
                              Thanks a lot! After I use year3-year13, rather than year2-year13, no time dummy is dropped any more. The results of other variables and Hansen test are exactly the same as using year2-year12.

                              But what is the rationale behind this approach? Normally, people start from the second time dummy, in order to avoid the dummy trap.
