  • #16
    You lose one more period because of the lagged dependent variable. Your first period in the estimation sample is the second period in your data set.
    https://twitter.com/Kripfganz

    • #17
      Originally posted by Sebastian Kripfganz View Post
      You lose one more period because of the lagged dependent variable. Your first period in the estimation sample is the second period in your data set.
So if the first lag of the dependent variable is used as a regressor, then I should not set the time dummies as year2-year10, for example, but as year3-year10, since the first period in the estimation sample (because of L.y) is the second period in the dataset. Is my interpretation right?

But in an answer to another post about adding further lags of the dependent variable, you gave the following code with yr2-yr10 rather than yr3-yr10:
Code:
xtabond2 ltfp L.ltfp L2.ltfp routsales rndva yr2-yr10, iv(yr2-yr10, eq(level)) gmm(routsales rndva, lag(2 3)) gmm(ltfp, lag(2 3)) twostep robust artests(3)
https://www.statalist.org/forums/for...nd-deeper-lags Perhaps I misunderstood your point.

      • #18
        Your interpretation is correct.

Presumably, in the other Statalist topic the first year in the data set was year 0 instead of 1, although this is actually not clear given that the other enquirer did not show his estimation output. It is possible that in his case one time dummy got omitted as well.
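For concreteness, a minimal sketch of this timing, with hypothetical variables y and x and ten periods (none of these names are from the thread): with L.y on the right-hand side, the estimation sample starts in the second period of the data, so year3 is the first time dummy that can be included without being omitted.
Code:
* minimal sketch (hypothetical variable names): L.y costs the first period,
* so the first usable time dummy is year3
xtabond2 y L.y x year3-year10, gmm(y, lag(2 3)) gmm(x, lag(2 3)) ///
    iv(year3-year10, eq(level)) twostep robust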
        https://twitter.com/Kripfganz

        • #19
          Originally posted by Sebastian Kripfganz View Post
          Your interpretation is correct.

Presumably, in the other Statalist topic the first year in the data set was year 0 instead of 1, although this is actually not clear given that the other enquirer did not show his estimation output. It is possible that in his case one time dummy got omitted as well.
I see! By the way, may I ask for your suggestions on choosing the depth of lags used as instruments? I know this is a subjective and context-dependent issue, but I am really confused about it.

For instance, x is significant under gmm(x, lag(2 3)) but turns out to be insignificant under gmm(x, lag(2 4)) or gmm(x, lag(2 5)). Do you think it still makes sense to say anything about the effect of x?

I think if the change in statistical significance is due to the weak correlation of deeper lags with the instrumented variable, then perhaps it is safe to say that x has an effect on y. Just as you argued in another post, deeper lags may be only weakly correlated with the instrumented variable unless the series is persistent.

But if such a change in statistical significance is due to other reasons, perhaps we have to say that the effect of x on y is not robust and depends on the lags used.

In some other cases, instrumented variables may turn out to be significant after deeper lags are used.

So is it right to say that experience/intuition or economic theory (if any) is the primary way to decide the depth of lags?

          Many thanks again!
          Last edited by Alex Mai; 11 Apr 2018, 14:40.

          • #20
This lack of robustness is a general problem of this kind of GMM estimation when the cross-sectional dimension is not very large, and it indeed reduces the reliability of the estimates. Your intuition goes in the right direction. Your observation that the effect turns statistically insignificant could indeed be a consequence of deeper lags becoming weak instruments. You could use this argument to justify a specification with just the second and third lag.

            If coefficients turn statistically significant by adding deeper lags, this would worry me more. These results might be "spurious" as a consequence of having too many instruments.

I would indeed recommend not including too deep lags if you have good (economic) arguments that these additional lags would only be weakly correlated with the instrumented variables.

            An alternative might be to use the pca option of xtabond2 but that is something for which I cannot provide any help.
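Purely as an illustration of these points (variable names are placeholders, not from the thread), one might compare a specification restricted to the second and third lags with one that also uses deeper lags, keeping the instrument count in check with the collapse suboption; the pca option mentioned above is shown only as a hedged alternative, not a recommendation:
Code:
* hypothetical sketch (placeholder names): restricted vs. deeper instrument lags,
* with collapsed instrument sets to limit the instrument count
xtabond2 y L.y x year3-year10, gmm(y, lag(2 3) collapse) gmm(x, lag(2 3) collapse) ///
    iv(year3-year10, eq(level)) twostep robust
xtabond2 y L.y x year3-year10, gmm(y, lag(2 3) collapse) gmm(x, lag(2 5) collapse) ///
    iv(year3-year10, eq(level)) twostep robust
* alternatively, the pca option of xtabond2 replaces the instrument set with its
* principal components
xtabond2 y L.y x year3-year10, gmm(y, lag(2 3)) gmm(x, lag(2 5)) ///
    iv(year3-year10, eq(level)) twostep robust pca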
            https://twitter.com/Kripfganz

            • #21
              Originally posted by Sebastian Kripfganz View Post
This lack of robustness is a general problem of this kind of GMM estimation when the cross-sectional dimension is not very large, and it indeed reduces the reliability of the estimates. Your intuition goes in the right direction. Your observation that the effect turns statistically insignificant could indeed be a consequence of deeper lags becoming weak instruments. You could use this argument to justify a specification with just the second and third lag.

              If coefficients turn statistically significant by adding deeper lags, this would worry me more. These results might be "spurious" as a consequence of having too many instruments.

I would indeed recommend not including too deep lags if you have good (economic) arguments that these additional lags would only be weakly correlated with the instrumented variables.

              An alternative might be to use the pca option of xtabond2 but that is something for which I cannot provide any help.
Thanks a lot! I remember that you suggested to another enquirer to use the second lag of the dependent variable (L2.y) as a regressor in addition to L.y. I have tried this, but L2.y is highly insignificant. Is this evidence that L2.y is not useful?

Normally, insignificant variables may be dropped to keep the model parsimonious, but I am not sure whether this still holds for the lagged dependent variable in system GMM.

              • #22
Adding further lags of the dependent variable as regressors might be useful to avoid serial correlation of the idiosyncratic errors if the Arellano-Bond AR(2) test provides evidence in that regard.

                Adding a further lag might help to increase the p-value of the AR(2) test even if this additional regressor turns out to be not statistically significant. In that case, it might be worth keeping it nevertheless. Of course, if there is no concern about serial correlation and further lags of the dependent variable are (highly) statistically insignificant, then there is no reason to keep them in the model and a more parsimonious model would be preferred.
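As a hedged illustration of this trade-off (placeholder variable names, not from the thread), one might compare the Arellano-Bond test results with and without the second lag as a regressor:
Code:
* hypothetical sketch (placeholder names): compare the AR(2)/AR(3) tests with and
* without L2.y as an additional regressor
xtabond2 y L.y x year3-year10, gmm(y, lag(2 3) collapse) gmm(x, lag(2 3) collapse) ///
    iv(year3-year10, eq(level)) twostep robust artests(3)
* with L2.y one more period is lost, so the first usable time dummy shifts to year4
xtabond2 y L.y L2.y x year4-year10, gmm(y, lag(2 3) collapse) gmm(x, lag(2 3) collapse) ///
    iv(year4-year10, eq(level)) twostep robust artests(3)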
                https://twitter.com/Kripfganz

                • #23
                  Originally posted by Sebastian Kripfganz View Post
Adding further lags of the dependent variable as regressors might be useful to avoid serial correlation of the idiosyncratic errors if the Arellano-Bond AR(2) test provides evidence in that regard.

                  Adding a further lag might help to increase the p-value of the AR(2) test even if this additional regressor turns out to be not statistically significant. In that case, it might be worth keeping it nevertheless. Of course, if there is no concern about serial correlation and further lags of the dependent variable are (highly) statistically insignificant, then there is no reason to keep them in the model and a more parsimonious model would be preferred.
                  Many thanks again!

                  • #24
                    Originally posted by Sebastian Kripfganz View Post
                    If your new variable has missings for these years, the whole years will be dropped from your estimation sample. But with the resulting gaps, it does not make sense any more to estimate a dynamic model at least for these early years. If you want to keep the new variable, you should restrict your estimation sample to the years from period 8 onwards.

                    The missing Difference-in-Hansen test is an indirect consequence of these gaps. As I have mentioned in some other Statalist topics before, xtabond2 has a severe bug when some variables (in particular time dummies) get omitted. In your case, there are 28 instruments and 24 estimated coefficients (excluding the omitted dummies). This should give 4 degrees of freedom for the Hansen test. Yet, xtabond2 reports only 1 degree of freedom. An immediate consequence is that the p-value for the Hansen test is incorrect. An indirect consequence is that xtabond2 no longer reports Difference-in-Hansen tests because it believes that there are not enough degrees of freedom available to do so. Once you remove the first 7 years from your sample and make sure that no dummies get omitted, the Difference-in-Hansen test should reappear.
                    Dear Sebastian,

May I ask for your suggestions about a strange case of missing Difference-in-Hansen tests? For a panel dataset, if I use -collapse-, the Difference-in-Hansen section only reports the Hansen test for the first-differenced model, without the tests for the individual subsets of instruments. This is an almost balanced panel, and no variable is dropped or omitted (so the omitted-variable bug should not matter here).

But if I do not use -collapse-, the full set of Difference-in-Hansen tests is reported (though then the number of instruments is larger than the number of groups).

                    If possible, could you please check if anything is wrong? Many thanks!

The following is my command and the Stata output.

                    Code:
                    . xtabond2 y L.y x1 x2 x3 x4 year3-year20, gmm(y, lag(2 3) collapse) iv(x1 x2
                    > x3, eq(level)) iv(x4 year3-year20, eq(level)) robust twostep
                    Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
                    
                    Dynamic panel-data estimation, two-step system GMM
                    ------------------------------------------------------------------------------
                    Group variable: i                               Number of obs      =      1102
                    Time variable : year                            Number of groups   =        58
                    Number of instruments = 26                      Obs per group: min =        19
                    Wald chi2(23) =    399.43                                      avg =     19.00
                    Prob > chi2   =     0.000                                      max =        19
                    ------------------------------------------------------------------------------
                                 |              Corrected
                               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                               y |
                             L1. |    .327704   .0541904     6.05   0.000     .2214928    .4339153
                                 |
                              x1 |   -.003743   .0027996    -1.34   0.181    -.0092301     .001744
                              x2 |   .0068359   .0190328     0.36   0.719    -.0304677    .0441395
                              x3 |  -.3073064    .068547    -4.48   0.000     -.441656   -.1729568
                              x4 |  -.2079799   .1522344    -1.37   0.172    -.5063538    .0903941
                           year3 |    .063862   .0639817     1.00   0.318    -.0615397    .1892638
                           year4 |   .0546963   .0515861     1.06   0.289    -.0464106    .1558031
                           year5 |   .0420467   .0435978     0.96   0.335    -.0434035    .1274968
                           year6 |   .0619219   .0580356     1.07   0.286    -.0518258    .1756696
                           year7 |    .056716   .0536444     1.06   0.290    -.0484252    .1618572
                           year8 |   .1005629   .0544172     1.85   0.065    -.0060928    .2072187
                           year9 |  -.0018599   .0577488    -0.03   0.974    -.1150453    .1113256
                          year10 |  -.0385923   .0574199    -0.67   0.502    -.1511333    .0739486
                          year11 |   .0336183   .0536857     0.63   0.531    -.0716036    .1388403
                          year12 |   .0320164   .0530338     0.60   0.546    -.0719279    .1359607
                          year13 |   .0593187   .0531552     1.12   0.264    -.0448636    .1635011
                          year14 |   .0551061   .0529566     1.04   0.298    -.0486869    .1588992
                          year15 |   .0790878   .0499249     1.58   0.113    -.0187632    .1769388
                          year16 |   .0573177   .0547685     1.05   0.295    -.0500266     .164662
                          year17 |  -.0037387   .0531859    -0.07   0.944    -.1079811    .1005036
                          year18 |   .0469988   .0537304     0.87   0.382    -.0583109    .1523085
                          year19 |   .0789955   .0551545     1.43   0.152    -.0291052    .1870963
                          year20 |   .0067543    .052748     0.13   0.898    -.0966298    .1101385
                           _cons |   4.600625   1.428604     3.22   0.001     1.800613    7.400637
                    ------------------------------------------------------------------------------
                    Instruments for first differences equation
                      GMM-type (missing=0, separate instruments for each period unless collapsed)
                        L(2/3).y collapsed
                    Instruments for levels equation
                      Standard
                        x4 year3 year4 year5 year6 year7 year8 year9 year10 year11 year12
                        year13 year14 year15 year16 year17 year18 year19 year20
                        x1 x2 x3
                        _cons
                      GMM-type (missing=0, separate instruments for each period unless collapsed)
                        DL.y collapsed
                    ------------------------------------------------------------------------------
                    Arellano-Bond test for AR(1) in first differences: z =  -3.04  Pr > z =  0.002
                    Arellano-Bond test for AR(2) in first differences: z =   1.16  Pr > z =  0.246
                    ------------------------------------------------------------------------------
                    Sargan test of overid. restrictions: chi2(2)    =   3.44  Prob > chi2 =  0.179
                      (Not robust, but not weakened by many instruments.)
                    Hansen test of overid. restrictions: chi2(2)    =   1.47  Prob > chi2 =  0.481
                      (Robust, but weakened by many instruments.)
                    
                    Difference-in-Hansen tests of exogeneity of instrument subsets:
                      GMM instruments for levels
                        Hansen test excluding group:     chi2(1)    =   1.14  Prob > chi2 =  0.286
                        Difference (null H = exogenous): chi2(1)    =   0.33  Prob > chi2 =  0.567
                    
                    .
                    end of do-file
                    Last edited by Alex Mai; 16 Apr 2018, 11:00.

                    • #25
The Difference-in-Hansen test can only be computed if the model is still overidentified after removing the respective set of instruments. This is no longer the case for any subset of instruments that does not show up in the Difference-in-Hansen section of the output. Clearly, if you do not collapse the instruments, you will have many more overidentifying restrictions, such that the model would still be overidentified after removing any of the subsets.

                      In principle, you could split the option iv(x1 x2 x3, eq(level)) into the three separate options iv(x1, eq(level)) iv(x2, eq(level)) iv(x3, eq(level)), which should give you the respective Difference-in-Hansen test statistics.
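Applied to the command from #24 (everything else unchanged), a sketch of that split could look like this:
Code:
* sketch based on the command in #24: level instruments split into separate groups
xtabond2 y L.y x1 x2 x3 x4 year3-year20, gmm(y, lag(2 3) collapse) ///
    iv(x1, eq(level)) iv(x2, eq(level)) iv(x3, eq(level)) ///
    iv(x4 year3-year20, eq(level)) robust twostep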
                      https://twitter.com/Kripfganz

                      • #26
                        Originally posted by Sebastian Kripfganz View Post
The Difference-in-Hansen test can only be computed if the model is still overidentified after removing the respective set of instruments. This is no longer the case for any subset of instruments that does not show up in the Difference-in-Hansen section of the output. Clearly, if you do not collapse the instruments, you will have many more overidentifying restrictions, such that the model would still be overidentified after removing any of the subsets.

                        In principle, you could split the option iv(x1 x2 x3, eq(level)) into the three separate options iv(x1, eq(level)) iv(x2, eq(level)) iv(x3, eq(level)), which should give you the respective Difference-in-Hansen test statistics.
Thanks a lot! I have tried splitting the set of instruments, and it works well. However, the Hansen test for the subset with the time dummies, for example -iv(x4 year3-year18)-, will never be shown, since the model cannot remain overidentified after removing that many instruments. I do not think this affects the Hansen tests for the other sets of instruments or for the full model (or the estimation itself). Do you think I am right?

                        Thank you again!

                        • #27
                          A Difference-in-Hansen test for the time dummies is not meaningful. These dummies are deterministic, i.e. exogenous by definition.
                          https://twitter.com/Kripfganz

                          • #28
                            Originally posted by Sebastian Kripfganz View Post
                            A Difference-in-Hansen test for the time dummies is not meaningful. These dummies are deterministic, i.e. exogenous by definition.
Thanks a lot! Sometimes two exogenous (but perhaps not deterministic) variables cannot pass the Difference-in-Hansen test if they are treated separately (i.e. -iv(x1, eq(level))- and -iv(x2, eq(level))-). But if the two variables are put together in one -iv()-, the Difference-in-Hansen test does not reject the null of exogeneity. May I ask what mechanism underlies this situation?

                            • #29
It is a similar argument to why two coefficients might be individually statistically significant while a joint test does not reject the null of joint insignificance. Individual hypothesis tests do not account for the covariance between the estimators / between the respective moment functions. For a joint test, it is generally harder to reject the null. You should use economic / econometric theory as a guide when grouping the instruments:
                              • If two or more instruments naturally belong together (e.g. one instrument may not make sense without the other; or several instruments are justified on the grounds of the same assumption such as mean stationarity for the level instruments), then do not split them into separate groups.
                              • Do not combine non-deterministic with deterministic instruments in the same group.
• On the other hand, as mentioned on several occasions, split instruments for the differenced model and those for the level model into separate groups.
                              • It might be meaningful to separate the instruments for the lagged dependent variable from the instruments for other regressors because the former particularly rely on the assumption of no serial correlation in the idiosyncratic error term. But it does not really make sense to consider each lagged instrument itself as a separate group for testing purposes.
                              • ...
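A hedged sketch of how such a grouping might look in practice, with placeholder variables (y the dependent variable, w an endogenous regressor, x an exogenous regressor; none of these names are from the thread):
Code:
* illustrative grouping (placeholder names): the lagged dependent variable and the
* endogenous regressor get separate gmm() groups; the exogenous regressor and the
* deterministic time dummies get separate iv() groups for the levels equation
xtabond2 y L.y w x year3-year10, gmm(y, lag(2 3) collapse) gmm(w, lag(2 3) collapse) ///
    iv(x, eq(level)) iv(year3-year10, eq(level)) twostep robust
With enough overidentifying restrictions, the levels-equation GMM instruments then appear as their own subset in the Difference-in-Hansen output, as in the output shown in #24.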
                              https://twitter.com/Kripfganz

                              • #30
                                Originally posted by Sebastian Kripfganz View Post
It is a similar argument to why two coefficients might be individually statistically significant while a joint test does not reject the null of joint insignificance. Individual hypothesis tests do not account for the covariance between the estimators / between the respective moment functions. For a joint test, it is generally harder to reject the null. You should use economic / econometric theory as a guide when grouping the instruments:
                                • If two or more instruments naturally belong together (e.g. one instrument may not make sense without the other; or several instruments are justified on the grounds of the same assumption such as mean stationarity for the level instruments), then do not split them into separate groups.
                                • Do not combine non-deterministic with deterministic instruments in the same group.
• On the other hand, as mentioned on several occasions, split instruments for the differenced model and those for the level model into separate groups.
                                • It might be meaningful to separate the instruments for the lagged dependent variable from the instruments for other regressors because the former particularly rely on the assumption of no serial correlation in the idiosyncratic error term. But it does not really make sense to consider each lagged instrument itself as a separate group for testing purposes.
                                • ...
Thank you! But in what situations or for what kinds of variables should I use -iv(x, eq(diff))-? You have argued that -iv(x, eq(level))- makes -iv(x, eq(diff))- asymptotically redundant.

And can I interpret the Hansen test for the first-differenced model (the very first part of the Difference-in-Hansen output) as a test of the validity of the lagged dependent and endogenous variables as instruments in the first-differenced equation?
