  • Alex Mai
    replied
    Originally posted by Sebastian Kripfganz:
    Not necessarily. xtabond2 computes the respective Difference-in-Hansen tests even when you do not separate them. But separating the options helps to understand how the GMM estimator is actually constructed. (It becomes less of a black box.)
    Thanks a lot! Btw, do you think that xtabond2 is suitable for a (dynamic) linear probability model with a binary dependent variable? As far as I know, there is perhaps no Stata command for a dynamic logit model. I have seen others use xtabond2 to estimate models with a binary dependent variable in a dynamic setting.

    One problem with the linear probability model is the possibility of negative fitted values, but studies have shown that, except in extreme situations (e.g. probabilities like 99% or 1%), the odds ratios are an almost linear function of the probability, which supports the use of the linear probability model. But I am not sure whether this also holds for system GMM.
    Last edited by Alex Mai; 17 Apr 2018, 13:10.
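The near-linearity argument above can be checked numerically. The sketch below is only an illustration under one assumption: it uses the tangent line 0.5 + x/4 of the logistic CDF at its midpoint as the linear benchmark (a fitted linear probability model uses a least-squares line instead, so this is a crude comparison). It reports the largest gap between the two curves over a given range of probabilities; the helper names are hypothetical.

```python
import math


def logistic(x: float) -> float:
    """Logistic CDF: maps a linear index x to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Inverse of the logistic CDF."""
    return math.log(p / (1.0 - p))


def max_linear_gap(p_lo: float, p_hi: float, steps: int = 1000) -> float:
    """Largest |logistic(x) - (0.5 + x / 4)| on a grid from logit(p_lo) to logit(p_hi).

    0.5 + x / 4 is the tangent line of the logistic CDF at x = 0 (slope 1/4).
    """
    a, b = logit(p_lo), logit(p_hi)
    gap = 0.0
    for i in range(steps + 1):
        x = a + (b - a) * i / steps
        gap = max(gap, abs(logistic(x) - (0.5 + x / 4.0)))
    return gap


for lo, hi in ((0.2, 0.8), (0.01, 0.99)):
    print(f"probabilities in [{lo}, {hi}]: max gap = {max_linear_gap(lo, hi):.3f}")
```

Between probabilities of 0.2 and 0.8 the gap stays below 0.05, while over 0.01 to 0.99 it exceeds 0.6. The tangent line also leaves [0, 1] once |x| > 2, which is the fitted-value problem mentioned above.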

  • Sebastian Kripfganz
    replied
    Not necessarily. xtabond2 computes the respective Difference-in-Hansen tests even when you do not separate them. But separating the options helps to understand how the GMM estimator is actually constructed. (It becomes less of a black box.)

  • Alex Mai
    replied
    Originally posted by Sebastian Kripfganz:
    My comment mainly referred to the gmm() option that creates lagged levels as instruments for the first-differenced model and differences as instruments for the level model. Here, the latter do not make the former redundant.
    • On the other hand, as mentioned on several occasions, split instruments for the differenced model and those for the level model into separate groups.
    So do you mean that for -gmm()- it is also recommended to split into -gmm(..., eq(level))- and -gmm(..., eq(diff))-? What I learned from earlier posts is that eq(level) and eq(diff) should be set separately for -iv()-, but I am not sure about -gmm()-.

  • Sebastian Kripfganz
    replied
    Originally posted by Alex Mai:
    But in what situation or for what kind of variables should I use -iv(x, eq(diff))-? You have argued that -iv(x, eq(level))- makes -iv(x, eq(diff))- asymptotically redundant.
    My comment mainly referred to the gmm() option that creates lagged levels as instruments for the first-differenced model and differences as instruments for the level model. Here, the latter do not make the former redundant.

    Originally posted by Alex Mai:
    And can I understand the Hansen test for the first-differenced model (the very first part of the Difference-in-Hansen tests) as a test of the validity of the lagged dependent and endogenous variables as instruments for the first-differenced dependent and endogenous variables in the first-differenced equation?
    In short, yes. But keep in mind that, strictly speaking, overidentification tests are not just tests of the validity of the instruments. If you reject the null hypothesis, this might be because your instruments are indeed invalid, given that the model is otherwise correctly specified, or it might be because your model suffers from another form of misspecification.

  • Alex Mai
    replied
    Originally posted by Sebastian Kripfganz:
    It is a similar argument to why two coefficients might be individually statistically significant while a joint significance test does not reject the null. Individual hypothesis tests do not account for the covariance between the estimators (or between the respective moment functions). For a joint test, it is generally harder to reject the null. You should use economic/econometric theory as a guide to group the instruments:
    • If two or more instruments naturally belong together (e.g. one instrument may not make sense without the other; or several instruments are justified on the grounds of the same assumption such as mean stationarity for the level instruments), then do not split them into separate groups.
    • Do not combine non-deterministic with deterministic instruments in the same group.
    • On the other hand, as mentioned on several occasions, split instruments for the differenced model and those for the level model into separate groups.
    • It might be meaningful to separate the instruments for the lagged dependent variable from the instruments for other regressors because the former particularly rely on the assumption of no serial correlation in the idiosyncratic error term. But it does not really make sense to consider each lagged instrument itself as a separate group for testing purposes.
    • ...
    Thank you! But in what situation or for what kind of variables should I use -iv(x, eq(diff))-? You have argued that -iv(x, eq(level))- makes -iv(x, eq(diff))- asymptotically redundant.

    And can I understand the Hansen test for the first-differenced model (the very first part of the Difference-in-Hansen tests) as a test of the validity of the lagged dependent and endogenous variables as instruments for the first-differenced dependent and endogenous variables in the first-differenced equation?

  • Sebastian Kripfganz
    replied
    It is a similar argument to why two coefficients might be individually statistically significant while a joint significance test does not reject the null. Individual hypothesis tests do not account for the covariance between the estimators (or between the respective moment functions). For a joint test, it is generally harder to reject the null. You should use economic/econometric theory as a guide to group the instruments:
    • If two or more instruments naturally belong together (e.g. one instrument may not make sense without the other; or several instruments are justified on the grounds of the same assumption such as mean stationarity for the level instruments), then do not split them into separate groups.
    • Do not combine non-deterministic with deterministic instruments in the same group.
    • On the other hand, as mentioned on several occasions, split instruments for the differenced model and those for the level model into separate groups.
    • It might be meaningful to separate the instruments for the lagged dependent variable from the instruments for other regressors because the former particularly rely on the assumption of no serial correlation in the idiosyncratic error term. But it does not really make sense to consider each lagged instrument itself as a separate group for testing purposes.
    • ...
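The covariance point can be illustrated with a stylized Wald-type calculation. The numbers below (two standardized statistics of 2.1 each, correlated at 0.9) are hypothetical and not taken from this thread, and the calculation is a generic chi-squared quadratic form rather than the Difference-in-Hansen statistic itself. Each statistic on its own exceeds the 5% chi2(1) critical value, yet the joint chi2(2) statistic, which accounts for the correlation, stays below the 5% chi2(2) critical value.

```python
def individual_stats(z):
    """Each squared standardized statistic is compared with a chi2(1) critical value."""
    return [zk * zk for zk in z]


def joint_stat_2(z1, z2, rho):
    """Joint statistic z' R^{-1} z for two standardized statistics with correlation rho.

    R = [[1, rho], [rho, 1]], so R^{-1} = [[1, -rho], [-rho, 1]] / (1 - rho**2).
    """
    return (z1 * z1 + z2 * z2 - 2.0 * rho * z1 * z2) / (1.0 - rho * rho)


CRIT_CHI2_1 = 3.841  # 5% critical value of chi2(1)
CRIT_CHI2_2 = 5.991  # 5% critical value of chi2(2)

z1, z2, rho = 2.1, 2.1, 0.9  # hypothetical values

indiv = individual_stats([z1, z2])
joint = joint_stat_2(z1, z2, rho)

print("individual:", [round(s, 2) for s in indiv], "reject each:", [s > CRIT_CHI2_1 for s in indiv])
print("joint:", round(joint, 2), "reject jointly:", joint > CRIT_CHI2_2)
```

With rho = 0 the same two statistics would give a joint value of 8.82 and a rejection, so it is the positive covariance that makes the joint test harder to reject here.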

  • Alex Mai
    replied
    Originally posted by Sebastian Kripfganz:
    A Difference-in-Hansen test for the time dummies is not meaningful. These dummies are deterministic, i.e. exogenous by definition.
    Thanks a lot! Sometimes two exogenous (but perhaps not deterministic) variables do not pass the Difference-in-Hansen test if they are treated separately (i.e. -iv(x1, eq(level))- and -iv(x2, eq(level))-). But if the two variables are put together in one -iv()-, the Difference-in-Hansen test does not reject the null of exogeneity. May I ask what mechanism underlies this situation?

  • Sebastian Kripfganz
    replied
    A Difference-in-Hansen test for the time dummies is not meaningful. These dummies are deterministic, i.e. exogenous by definition.

  • Alex Mai
    replied
    Originally posted by Sebastian Kripfganz:
    The Difference-in-Hansen test can only be computed if the model is still overidentified after removing the respective set of instruments. This is no longer the case for any subset of instruments that does not show up in the Difference-in-Hansen section of the output. Clearly, if you do not collapse the instruments, you will have many more overidentifying restrictions, such that the model would still be overidentified after removing any of the subsets.

    In principle, you could split the option iv(x1 x2 x3, eq(level)) into the three separate options iv(x1, eq(level)) iv(x2, eq(level)) iv(x3, eq(level)), which should give you the respective Difference-in-Hansen test statistics.
    Thanks a lot! I have tried splitting the set of instruments and it works well. However, the Hansen test for the subset with the time dummies, for example -iv(x4 year3-year18)-, will never be shown, since the model cannot remain overidentified after removing the many instruments for the time dummies. I do not think this will affect the Hansen tests of the other sets of instruments or of the full model (nor the estimation itself). Do you think I am right?

    Thank you again!

  • Sebastian Kripfganz
    replied
    The Difference-in-Hansen test can only be computed if the model is still overidentified after removing the respective set of instruments. This is no longer the case for any subset of instruments that does not show up in the Difference-in-Hansen section of the output. Clearly, if you do not collapse the instruments, you will have many more overidentifying restrictions, such that the model would still be overidentified after removing any of the subsets.

    In principle, you could split the option iv(x1 x2 x3, eq(level)) into the three separate options iv(x1, eq(level)) iv(x2, eq(level)) iv(x3, eq(level)), which should give you the respective Difference-in-Hansen test statistics.
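The degrees-of-freedom bookkeeping behind this answer can be sketched as follows. This is a minimal illustration, not xtabond2's own code: the group sizes are read off the instrument list in the collapsed output posted in this thread (26 instruments and 24 coefficients, counting the constant that appears among the level-equation instruments), and a group is treated as testable only if removing it still leaves at least one overidentifying restriction.

```python
def hansen_df(n_instruments: int, n_coefficients: int) -> int:
    """Degrees of freedom of the Hansen test: the number of overidentifying restrictions."""
    return n_instruments - n_coefficients


def dih_testable(n_instruments: int, n_coefficients: int, group_size: int) -> bool:
    """A group can be tested only if the model without it is still overidentified."""
    return hansen_df(n_instruments - group_size, n_coefficients) >= 1


# Counts read off the collapsed specification posted in this thread:
# coefficients: L.y, x1-x4, year3-year20 (18 dummies) and _cons = 24
# instruments:  2 (L(2/3).y collapsed) + 1 (DL.y collapsed) + 3 + 19 + 1 (_cons) = 26
N_INSTR, N_COEF = 26, 24

combined = {
    "GMM instruments for levels (DL.y)": 1,
    "iv(x1 x2 x3, eq(level))": 3,
    "iv(x4 year3-year20, eq(level))": 19,
}
split = {
    "GMM instruments for levels (DL.y)": 1,
    "iv(x1, eq(level))": 1,
    "iv(x2, eq(level))": 1,
    "iv(x3, eq(level))": 1,
    "iv(x4 year3-year20, eq(level))": 19,
}

print("Hansen df:", hansen_df(N_INSTR, N_COEF))
for label, groups in (("combined", combined), ("split", split)):
    for name, size in groups.items():
        print(f"{label:8s} {name:36s} testable: {dih_testable(N_INSTR, N_COEF, size)}")
```

Under this counting, the combined iv(x1 x2 x3, eq(level)) group cannot be tested because removing it leaves 23 instruments for 24 coefficients, whereas each single-variable group leaves 25 instruments and one degree of freedom. The same arithmetic gives hansen_df(28, 24) == 4 for the earlier case in which xtabond2 reported only 1 degree of freedom.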

  • Alex Mai
    replied
    Originally posted by Sebastian Kripfganz:
    If your new variable has missings for these years, the whole years will be dropped from your estimation sample. But with the resulting gaps, it does not make sense any more to estimate a dynamic model at least for these early years. If you want to keep the new variable, you should restrict your estimation sample to the years from period 8 onwards.

    The missing Difference-in-Hansen test is an indirect consequence of these gaps. As I have mentioned in some other Statalist topics before, xtabond2 has a severe bug when some variables (in particular time dummies) get omitted. In your case, there are 28 instruments and 24 estimated coefficients (excluding the omitted dummies). This should give 4 degrees of freedom for the Hansen test. Yet, xtabond2 reports only 1 degree of freedom. An immediate consequence is that the p-value for the Hansen test is incorrect. An indirect consequence is that xtabond2 no longer reports Difference-in-Hansen tests because it believes that there are not enough degrees of freedom available to do so. Once you remove the first 7 years from your sample and make sure that no dummies get omitted, the Difference-in-Hansen test should reappear.
    Dear Sebastian,

    May I ask for your suggestions about a weird case of a missing Difference-in-Hansen test? For a panel dataset, if I use -collapse-, the Difference-in-Hansen section only shows the Hansen test for the first-differenced model, without the tests for each subset of instruments. This is an almost balanced panel, and no variable is dropped or omitted (so the bug with omitted variables should not matter here).

    But if I do not use -collapse-, the full set of Difference-in-Hansen tests is reported (but then the number of instruments is larger than the number of groups).

    If possible, could you please check whether anything is wrong? Many thanks!

    The following is my command and the Stata output.

    Code:
    . xtabond2 y L.y x1 x2 x3 x4 year3-year20, gmm(y, lag(2 3) collapse) iv(x1 x2
    > x3, eq(level)) iv(x4 year3-year20, eq(level)) robust twostep
    Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
    
    Dynamic panel-data estimation, two-step system GMM
    ------------------------------------------------------------------------------
    Group variable: i                               Number of obs      =      1102
    Time variable : year                            Number of groups   =        58
    Number of instruments = 26                      Obs per group: min =        19
    Wald chi2(23) =    399.43                                      avg =     19.00
    Prob > chi2   =     0.000                                      max =        19
    ------------------------------------------------------------------------------
                 |              Corrected
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               y |
             L1. |    .327704   .0541904     6.05   0.000     .2214928    .4339153
                 |
              x1 |   -.003743   .0027996    -1.34   0.181    -.0092301     .001744
              x2 |   .0068359   .0190328     0.36   0.719    -.0304677    .0441395
              x3 |  -.3073064    .068547    -4.48   0.000     -.441656   -.1729568
              x4 |  -.2079799   .1522344    -1.37   0.172    -.5063538    .0903941
           year3 |    .063862   .0639817     1.00   0.318    -.0615397    .1892638
           year4 |   .0546963   .0515861     1.06   0.289    -.0464106    .1558031
           year5 |   .0420467   .0435978     0.96   0.335    -.0434035    .1274968
           year6 |   .0619219   .0580356     1.07   0.286    -.0518258    .1756696
           year7 |    .056716   .0536444     1.06   0.290    -.0484252    .1618572
           year8 |   .1005629   .0544172     1.85   0.065    -.0060928    .2072187
           year9 |  -.0018599   .0577488    -0.03   0.974    -.1150453    .1113256
          year10 |  -.0385923   .0574199    -0.67   0.502    -.1511333    .0739486
          year11 |   .0336183   .0536857     0.63   0.531    -.0716036    .1388403
          year12 |   .0320164   .0530338     0.60   0.546    -.0719279    .1359607
          year13 |   .0593187   .0531552     1.12   0.264    -.0448636    .1635011
          year14 |   .0551061   .0529566     1.04   0.298    -.0486869    .1588992
          year15 |   .0790878   .0499249     1.58   0.113    -.0187632    .1769388
          year16 |   .0573177   .0547685     1.05   0.295    -.0500266     .164662
          year17 |  -.0037387   .0531859    -0.07   0.944    -.1079811    .1005036
          year18 |   .0469988   .0537304     0.87   0.382    -.0583109    .1523085
          year19 |   .0789955   .0551545     1.43   0.152    -.0291052    .1870963
          year20 |   .0067543    .052748     0.13   0.898    -.0966298    .1101385
           _cons |   4.600625   1.428604     3.22   0.001     1.800613    7.400637
    ------------------------------------------------------------------------------
    Instruments for first differences equation
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        L(2/3).y collapsed
    Instruments for levels equation
      Standard
        x4 year3 year4 year5 year6 year7 year8 year9 year10 year11 year12
        year13 year14 year15 year16 year17 year18 year19 year20
        x1 x2 x3
        _cons
      GMM-type (missing=0, separate instruments for each period unless collapsed)
        DL.y collapsed
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z =  -3.04  Pr > z =  0.002
    Arellano-Bond test for AR(2) in first differences: z =   1.16  Pr > z =  0.246
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(2)    =   3.44  Prob > chi2 =  0.179
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(2)    =   1.47  Prob > chi2 =  0.481
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(1)    =   1.14  Prob > chi2 =  0.286
        Difference (null H = exogenous): chi2(1)    =   0.33  Prob > chi2 =  0.567
    
    .
    end of do-file
    Last edited by Alex Mai; 16 Apr 2018, 11:00.
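As a cross-check of how the Difference-in-Hansen lines in the output above fit together: the difference statistic equals the full Hansen statistic minus the Hansen statistic excluding the group (1.47 - 1.14 = 0.33), and its degrees of freedom equal the difference in degrees of freedom (2 - 1 = 1). The Python sketch below recomputes the p-values from the rounded statistics displayed above, so the third decimal can differ slightly from Stata's, presumably because Stata works with unrounded statistics. The helper name chi2_sf is hypothetical; it relies on the standard closed forms for chi2(1) and chi2(2) tail probabilities.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(chi2(df) > x), implemented only for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))  # chi2(1) is the square of a standard normal
    if df == 2:
        return math.exp(-x / 2.0)  # chi2(2) is exponential with mean 2
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Rounded statistics from the output above
hansen_full, df_full = 1.47, 2
hansen_excl, df_excl = 1.14, 1

diff_stat = hansen_full - hansen_excl
diff_df = df_full - df_excl

print(f"Hansen (all instruments): chi2({df_full}) = {hansen_full:.2f}  p = {chi2_sf(hansen_full, df_full):.3f}")
print(f"Hansen excluding group:   chi2({df_excl}) = {hansen_excl:.2f}  p = {chi2_sf(hansen_excl, df_excl):.3f}")
print(f"Difference:               chi2({diff_df}) = {diff_stat:.2f}  p = {chi2_sf(diff_stat, diff_df):.3f}")
```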

  • Alex Mai
    replied
    Originally posted by Sebastian Kripfganz:
    Adding further lags of the dependent variable as regressors might be useful to avoid a serial correlation of the idiosyncratic errors if the Arellano-Bond AR(2) test provides evidence in that regard.

    Adding a further lag might help to increase the p-value of the AR(2) test even if this additional regressor turns out to be not statistically significant. In that case, it might be worth keeping it nevertheless. Of course, if there is no concern about serial correlation and further lags of the dependent variable are (highly) statistically insignificant, then there is no reason to keep them in the model and a more parsimonious model would be preferred.
    Many thanks again!

  • Sebastian Kripfganz
    replied
    Adding further lags of the dependent variable as regressors might be useful to avoid a serial correlation of the idiosyncratic errors if the Arellano-Bond AR(2) test provides evidence in that regard.

    Adding a further lag might help to increase the p-value of the AR(2) test even if this additional regressor turns out to be not statistically significant. In that case, it might be worth keeping it nevertheless. Of course, if there is no concern about serial correlation and further lags of the dependent variable are (highly) statistically insignificant, then there is no reason to keep them in the model and a more parsimonious model would be preferred.

  • Alex Mai
    replied
    Originally posted by Sebastian Kripfganz:
    This lack of robustness is a general problem of this kind of GMM estimation when the cross-sectional dimension is not very large and it indeed reduces the reliability of the estimates. Your intuition goes in the right direction. Your observation that the effect turns statistically insignificant could indeed be a consequence of deeper lags becoming weak. You could use this argumentation to justify a specification with just the second and third lag.

    If coefficients turn statistically significant by adding deeper lags, this would worry me more. These results might be "spurious" as a consequence of having too many instruments.

    I would indeed recommend not to include overly deep lags if you have good (economic) arguments that these additional lags would only be weakly correlated with the instrumented variables.

    An alternative might be to use the pca option of xtabond2 but that is something for which I cannot provide any help.
    Thanks a lot! I remember that you suggested to another user adding the second lag of the dependent variable (L2.y) as a regressor in addition to L.y. I have tried this, but L2.y is highly insignificant. Is this evidence that L2.y is not useful?

    Normally, insignificant variables may be dropped to keep the model parsimonious. But I am not sure whether this still holds for the lagged dependent variable in system GMM.

  • Sebastian Kripfganz
    replied
    This lack of robustness is a general problem of this kind of GMM estimation when the cross-sectional dimension is not very large and it indeed reduces the reliability of the estimates. Your intuition goes in the right direction. Your observation that the effect turns statistically insignificant could indeed be a consequence of deeper lags becoming weak. You could use this argumentation to justify a specification with just the second and third lag.

    If coefficients turn statistically significant by adding deeper lags, this would worry me more. These results might be "spurious" as a consequence of having too many instruments.

    I would indeed recommend not to include overly deep lags if you have good (economic) arguments that these additional lags would only be weakly correlated with the instrumented variables.

    An alternative might be to use the pca option of xtabond2 but that is something for which I cannot provide any help.
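The concern about deep lags and too many instruments can be made concrete by counting GMM-style instruments for the first-differenced equation. The sketch below uses a simplified timing convention that is an assumption, not xtabond2's exact rule: a balanced panel observed in periods 1..T, the differenced equation estimated from t = 3 onwards (as with L.y among the regressors), and lag l of y usable at period t only if t - l >= 1. T = 20 is chosen to roughly match the year dummies in the output posted in this thread; level-equation and iv() instruments are not counted.

```python
from typing import Optional


def count_diff_instruments(T: int, first_lag: int, last_lag: Optional[int] = None,
                           collapse: bool = False, first_eq: int = 3) -> int:
    """Count GMM-style instruments for the differenced equation (simplified timing).

    last_lag=None means all available lags are used.
    Uncollapsed: one instrument column per (period, lag) pair.
    Collapsed: one instrument column per lag distance.
    """
    per_period = []
    for t in range(first_eq, T + 1):
        max_lag = t - 1 if last_lag is None else min(last_lag, t - 1)
        per_period.append(max(0, max_lag - first_lag + 1))
    if collapse:
        # The usable lag sets are nested and grow with t, so the last period contains them all.
        return max(per_period, default=0)
    return sum(per_period)


N_GROUPS = 58  # number of groups in the posted output

for lags in ((2, 3), (2, 5), (2, None)):
    full = count_diff_instruments(20, *lags)
    coll = count_diff_instruments(20, *lags, collapse=True)
    print(f"lags {lags}: uncollapsed = {full:3d}, collapsed = {coll:2d}, groups = {N_GROUPS}")
```

Under these assumptions, using all available lags without collapsing gives 171 instruments for the differenced equation alone, far more than the 58 groups, while lags 2 and 3 give 35 uncollapsed and only 2 collapsed.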
