
  • Specify interaction/square terms with xtabond2 & xtdpdgmm

    Dear all,

    First of all, I would like to confirm that I have searched and read many posts here, but I could not find an existing solution.

    I am working with xtabond2 to conduct two-step system-GMM estimation. I have read Roodman (2009) and Prof. Sebastian Kripfganz's presentation slides, but my case is a bit uncommon, so I still cannot resolve all the issues from these materials.

    To clarify, I do not have a lagged dependent variable on the right-hand side of the equation. I run GMM estimation because, as a robustness check, I have to address endogeneity but cannot find suitable external instrumental variables.

    I have more than 600,000 observations in total over a time span of 22 years. My core predictor is a macro-level variable (yearly differences △Xt, △Xt-1, △Xt-2, etc.) and the dependent variable is a micro-level variable (an individual choice). In my OLS and fixed-effects models I find a U-shaped relationship (convexity), so I want to add the squared term of my core predictor to the GMM estimation. But when I specify it as a GMM-style instrument, the Hansen test is always significant (well below 0.25, around 0.01 most of the time). I have tried every position it could be placed in, and found that by treating it as exogenous and putting it among the IV-style instruments, I obtain statistically significant results and a decent Hansen test p-value (>0.40).

    1. My first confusion: I treat the core predictor as endogenous and put it among the GMM-style instruments with its second and higher lags (lags 2-21). In this case, can I treat its squared term as exogenous?

    2. The Arellano-Bond test rejects the null up to AR(5) and only fails to reject at AR(6). Is it still okay for me to include lags 1-5 as instruments? Since I do not have a lagged dependent variable in the model, I am unsure whether the Arellano-Bond test still applies to my case.

    3. From Prof. Sebastian Kripfganz's slides, I learned that dummy variables are usually treated as exogenous and put among the IV-style instruments with the level option. But what about interaction terms between endogenous/predetermined variables and dummies? If the Hansen test and Difference-in-Hansen tests are all satisfied (well above 0.25), is it justifiable to treat the interaction terms as exogenous?

    Lastly, I ran my specification with the xtdpdgmm command before, but because my number of observations is quite large, I could not obtain results even after waiting for more than 30 minutes. Is there any way to speed up xtdpdgmm?
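    One workaround I am considering (a sketch only; id, time, and the 10% sampling share are placeholders, not my actual variable names) is to tune the specification on a random subsample of panel units first, and run the full sample only once the specification is settled:

```stata
* Prototype on a random subsample of panel units (placeholders: id, time).
* Each selected unit keeps its full time series, so the panel structure survives.
set seed 12345
by id (time), sort: gen byte pick = (runiform() < 0.10) if _n == 1
by id (time): replace pick = pick[1]

* Settle the specification here, then rerun once on the full sample.
xtdpdgmm y x1 x2 if pick, gmm(x1, lag(2 .) collapse) iv(x2) two vce(r)
```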

    Here is my code:
    Code:
    xtabond2 migrate i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome ///
    c.L.gap_jobdiff3ex##c.L.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu i.yr2-yr22 , ///
    gmmstyle(gap_jobdiff3ex, lag(2 .) orthogonal collapse) ///
    gmmstyle(gap_ppden gap_enterprise gap_unemploy , lag(1 .) collapse) ///
    ivstyle(gap_highedu gap_med) ///
    ivstyle(c.L.gap_jobdiff3ex#c.L.gap_jobdiff3ex i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome i.yr2-yr22 , eq(level)) ///
    small twostep artests(6) cluster(dest_code)
    Note: i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome are time-invariant variables. I am aware that including them imposes a stronger assumption on the estimation.

    Here are the test results:
    Code:
    ------------------------------------------------------------------------------
    Group variable: numeric_un~e                    Number of obs      =    670476
    Time variable : time                            Number of groups   =     57429
    Number of instruments = 94                      Obs per group: min =         1
    F(30, 272)    =    109.21                                      avg =     11.67
    Prob > F      =     0.000                                      max =        17
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z =  -7.40  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =  -3.58  Pr > z =  0.000
    Arellano-Bond test for AR(3) in first differences: z =  -7.87  Pr > z =  0.000
    Arellano-Bond test for AR(4) in first differences: z =  -3.47  Pr > z =  0.001
    Arellano-Bond test for AR(5) in first differences: z =  -3.06  Pr > z =  0.002
    Arellano-Bond test for AR(6) in first differences: z =  -0.95  Pr > z =  0.342
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(63)   =89629.95 Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(63)   =  62.20  Prob > chi2 =  0.505
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(59)   =  59.02  Prob > chi2 =  0.475
        Difference (null H = exogenous): chi2(4)    =   3.18  Prob > chi2 =  0.528
      gmm(gap_jobdiff3ex, collapse orthogonal lag(2 .))
        Hansen test excluding group:     chi2(49)   =  52.77  Prob > chi2 =  0.331
        Difference (null H = exogenous): chi2(14)   =   9.43  Prob > chi2 =  0.802
      gmm(gap_ppden gap_enterprise gap_unemploy, collapse lag(1 .))
        Hansen test excluding group:     chi2(10)   =  12.31  Prob > chi2 =  0.265
        Difference (null H = exogenous): chi2(53)   =  49.89  Prob > chi2 =  0.596
      iv(gap_highedu gap_med)
        Hansen test excluding group:     chi2(61)   =  60.85  Prob > chi2 =  0.481
        Difference (null H = exogenous): chi2(2)    =   1.35  Prob > chi2 =  0.509
      iv(cL.gap_jobdiff3ex#cL.gap_jobdiff3ex 0b.a2003 1.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome 0b.yr2 1.yr2 0b.yr3 1.yr3 0b.yr4 1.yr4 0b.yr5 1.yr5 0b.yr6 1.yr6 0b.yr7 1.yr7 0b.yr8 1.yr8 0b.yr9 1.yr9 0b.yr10 1.yr10 0b.yr11 1.yr11 0b.yr12 1.yr12 0b.yr13 1.yr13 0b.yr14 1.yr14 0b.yr15 1.yr15 0b.yr16 1.yr16 0b.yr17 1.yr17 0b.yr18 1.yr18 0b.yr19 1.yr19 0b.yr20 1.yr20 0b.yr21 1.yr21 0b.yr22 1.yr22, eq(level))
        Hansen test excluding group:     chi2(39)   =  39.88  Prob > chi2 =  0.431
        Difference (null H = exogenous): chi2(24)   =  22.32  Prob > chi2 =  0.560
    Thanks for any comments!
    Last edited by Huaxin Wanglu; 05 Mar 2021, 18:51.

  • #2
    It's sad no one can help...?



    • #3
      1. If your core predictor is endogenous, it is hard to justify that the squared term is exogenous.
      2. If you choose the second lag of an endogenous variable as an instrument for the first-differenced model, then any serial correlation of the error term will invalidate that instrument. This is irrespective of whether there is a lagged dependent variable or not. A lagged dependent variable in the model can help to remove the serial correlation from the error term.
      3. Similar to point 1, if you have an interaction term between an endogenous variable and an exogenous variable (e.g. a dummy variable), then as a default I would typically still assume that the interaction term is endogenous unless you can come up with a convincing argument why it is not. I would not put too much trust in the overidentification test results. In the first place, you need to have a good theoretical argument for the classification of your variables.
      4. I am sorry that the estimation with xtdpdgmm takes such a long time. Eventually, it should still work with such large data sets. Admittedly, it is much slower than xtabond2. The reason is that there is a trade-off between flexibility of the command and its computational efficiency. xtdpdgmm is intended to provide quite a good bit of additional flexibility over xtabond2. This comes at the cost of a few inefficient parts in the code. If you do not need the extra flexibility, you might be better off with xtabond2 when using such large data sets.
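      As a sketch of point 2 (placeholder names; the starting lag must be chosen from your own AR test results, so lag(6 .) here is purely illustrative): if AR(1) through AR(5) in first differences are rejected while AR(6) is not, lags 2-5 of an endogenous variable are invalid instruments for the first-differenced model, and the instrument set should start deeper:

```stata
* Illustrative only: shift the first GMM-style instrument lag from 2 to 6
* when the AR tests indicate serial correlation up to order 4 in the levels.
xtabond2 y x controls, ///
    gmmstyle(x, lag(6 .) collapse) ///   instead of lag(2 .)
    ivstyle(controls, eq(level)) ///
    twostep robust
```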
      https://twitter.com/Kripfganz



      • #4
        Originally posted by Sebastian Kripfganz View Post
        Thanks a lot for the comments. Regarding point 2: does it mean that if my specification only fails to reject the null at AR(6), I have to use lags 5-21 instead of lags 2-21 as the instruments?



        • #5
          Originally posted by Sebastian Kripfganz View Post
          Sorry, two more questions:
          1. Actually, I do not want to include my endogenous variable in the differenced model, since it is itself a differenced variable. If I manually specify lags of the level and lags of the difference as instruments, will serial correlation still invalidate the instruments?
          2. My dependent variable is binary. I remember you mentioned somewhere that it may have problems with the mean-stationarity assumption, so should I rather not include it if it is not theoretically necessary?

          Such as:
          Code:
          gmmstyle(gap_jobdiff19 , lag(2 .) eq(level) collapse) ///
          gmmstyle(D.gap_jobdiff19 , lag(1 .) eq(level) collapse) ///
          By including the lagged dependent variable, my specification passes AR(2), but I am afraid that, since it is binary, I cannot treat it as an ordinary lagged dependent variable.
          Last edited by Huaxin Wanglu; 08 Mar 2021, 12:11.



          • #6
            Originally posted by Sebastian Kripfganz View Post
            I am re-reading your presentation slides tonight. Could you please tell me the difference between these two specifications?

            Codes 1:
            Code:
            xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
                gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) two vce(r)
            Codes 2:
            Code:
            xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) two vce(r)
            As I understand it, by default gmm(n, lag(2 4)) yields L(2/4).n as instruments for the first-differenced equation and L.D.n for the levels equation.

            I think they should be the same, but when I run the first form in xtabond2, the results are totally different from the second form. With Codes 1 I obtain quite good results, yet the coefficients with Codes 2 are mostly statistically insignificant. I do not know which one I should believe...
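            If my reading of the defaults is right, the one-line gmm(n, lag(2 4)) in the xtabond2 translation should split explicitly as below (a sketch of my interpretation only, not verified):

```stata
* My understanding of xtabond2's implicit split for gmm(n, lag(2 4)):
gmmstyle(n, lag(2 4) eq(diff))   ///  L(2/4).n instruments the differenced equation
gmmstyle(n, lag(1 1) eq(level))  //   L.D.n instruments the levels equation
```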

            Sorry, I am quite unfamiliar with GMM estimation; this is my first research project using it. Thanks again.
            Last edited by Huaxin Wanglu; 08 Mar 2021, 15:47.



            • #7
              I am updating my code and results here. In this version, I include the lagged dependent variable.

              Code:
              xtabond2 migrate L.migrate a2003 co_age dy_schooling marriage hukou_type a2025b InIncome ///
              c.gap_jobdiff3ex##c.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu yr2-yr22, ///
              gmmstyle(migrate, lag(1 1) eq(level)) /// predetermined
              gmmstyle(migrate, lag(2 .) eq(diff) collapse) ///
              gmmstyle(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, lag(1 1) eq(level)) /// endogenous
              gmmstyle(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, lag(2 .) eq(diff) collapse) ///
              gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(0 0) eq(level)) /// predetermined
              gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(1 .) eq(diff) collapse) ///
              ivstyle(gap_med gap_highedu, eq(level)) /// exogenous
              ivstyle(i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2-yr22, eq(level)) ///
              small twostep artests(4) cluster(dest_code)
              Code:
              ------------------------------------------------------------------------------
              Arellano-Bond test for AR(1) in first differences: z = -57.35  Pr > z =  0.000
              Arellano-Bond test for AR(2) in first differences: z =  -0.97  Pr > z =  0.331
              Arellano-Bond test for AR(3) in first differences: z =  -0.03  Pr > z =  0.976
              Arellano-Bond test for AR(4) in first differences: z =   0.64  Pr > z =  0.521
              ------------------------------------------------------------------------------
              Sargan test of overid. restrictions: chi2(190)  =2327.26  Prob > chi2 =  0.000
                (Not robust, but not weakened by many instruments.)
              Hansen test of overid. restrictions: chi2(190)  = 196.63  Prob > chi2 =  0.356
                (Robust, but weakened by many instruments.)
              
              Difference-in-Hansen tests of exogeneity of instrument subsets:
                GMM instruments for levels
                  Hansen test excluding group:     chi2(92)   = 103.26  Prob > chi2 =  0.198
                  Difference (null H = exogenous): chi2(98)   =  93.37  Prob > chi2 =  0.613
                gmm(migrate, eq(level) lag(1 1))
                  Hansen test excluding group:     chi2(174)  = 187.76  Prob > chi2 =  0.225
                  Difference (null H = exogenous): chi2(16)   =   8.87  Prob > chi2 =  0.919
                gmm(migrate, collapse eq(diff) lag(2 .))
                  Hansen test excluding group:     chi2(174)  = 190.77  Prob > chi2 =  0.182
                  Difference (null H = exogenous): chi2(16)   =   5.86  Prob > chi2 =  0.990
                gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, eq(level) lag(1 1))
                  Hansen test excluding group:     chi2(158)  = 164.21  Prob > chi2 =  0.351
                  Difference (null H = exogenous): chi2(32)   =  32.43  Prob > chi2 =  0.446
                gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
                  Hansen test excluding group:     chi2(158)  = 178.05  Prob > chi2 =  0.131
                  Difference (null H = exogenous): chi2(32)   =  18.58  Prob > chi2 =  0.972
                gmm(gap_ppden gap_enterprise gap_unemploy, eq(level) lag(0 0))
                  Hansen test excluding group:     chi2(140)  = 155.43  Prob > chi2 =  0.176
                  Difference (null H = exogenous): chi2(50)   =  41.20  Prob > chi2 =  0.808
                gmm(gap_ppden gap_enterprise gap_unemploy, collapse eq(diff) lag(1 .))
                  Hansen test excluding group:     chi2(140)  = 159.45  Prob > chi2 =  0.125
                  Difference (null H = exogenous): chi2(50)   =  37.18  Prob > chi2 =  0.910
                iv(gap_med gap_highedu, eq(level))
                  Hansen test excluding group:     chi2(188)  = 195.96  Prob > chi2 =  0.330
                  Difference (null H = exogenous): chi2(2)    =   0.67  Prob > chi2 =  0.715
                iv(0b.a2003 1.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11 yr12 yr13 yr14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
                  Hansen test excluding group:     chi2(166)  = 183.18  Prob > chi2 =  0.171
                  Difference (null H = exogenous): chi2(24)   =  13.45  Prob > chi2 =  0.958
              Last edited by Huaxin Wanglu; 08 Mar 2021, 18:23.



              • #8
                Originally posted by Huaxin Wanglu View Post

                Ah, I have figured out this difference by reading your old posts
                HTML Code:
                https://www.statalist.org/forums/forum/general-stata-discussion/general/1395858-xtdpdgmm-new-stata-command-for-efficient-gmm-estimation-of-linear-dynamic-panel-models-with-nonlinear-moment-conditions/page2
                And I have also realized that my xtabond2 code is not completely equivalent, since I did not collapse the differenced instruments for the level model (I prefer not to).

                If possible, could you take a look at my code posted in #7?
                Last edited by Huaxin Wanglu; 08 Mar 2021, 18:21.



                • #9
                  Your code and your specification test results in #7 look fine as far as I can tell by quickly looking at them. The binary nature of the dependent variable does not necessarily cause problems.
                  https://twitter.com/Kripfganz



                  • #10
                    Originally posted by Sebastian Kripfganz View Post
                    Thanks a million. Your kind replies indeed help a lot!



                    • #11
                      Originally posted by Sebastian Kripfganz View Post
                      Hello, may I ask you another question? To address reverse causality, I lag the variables by one period in the OLS & FE models, but when I use L.gap_jobdiff3ex instead of gap_jobdiff3ex in GMM, the Hansen test p-value drops to 0.10 and the Difference-in-Hansen tests do not all pass. I guess this may be because, by lagging the variables, the deeper lags suffer from a weak-instruments problem. From a paper, I learned that GMM can tackle reverse causality without lagging, but the paper my conceptual framework is based on lags all variables by one period in sys-GMM, so I am quite concerned about this choice. Could you give me some tips on whether and how I should lag by one period in GMM? With the one-period lag, I also tried adding the second lag of my dependent variable to the model. Since the AR test fails to reject the null at AR(3), I revised the code as below, but the coefficient on L2 is negative.

                      In principle, the Arellano-Bond (AB) estimator and related dynamic panel models offer a powerful toolbox to tackle endogeneity problems caused by both reverse causality and unobserved heterogeneity.
                      We rely on the approach advocated by Arellano and Bond (1991) taking first differences in a first step to remove unobserved heterogeneity and then using second- and higher-order lags of the dependent variables as instruments in a standard GMM framework to deal with reverse causality.
                      Code:
                      gmmstyle(migrate, lag(2 2) eq(level)) ///
                      gmmstyle(migrate, lag(3 .) eq(diff) collapse) ///

                      Results with two lagged dependent variables:
                      Code:
                      -----------------------------------------------------------------------------------------------------
                                                          |              Corrected
                                                  migrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      ------------------------------------+----------------------------------------------------------------
                                                  migrate |
                                                      L1. |   1.159987   .0730317    15.88   0.000     1.016205    1.303768
                                                      L2. |  -.1898163   .0698226    -2.72   0.007    -.3272801   -.0523526
                                                          |
                                                    a2003 |  -.0000582   .0001527    -0.38   0.703    -.0003588    .0002423
                                                   co_age |  -.0003186   .0000309   -10.32   0.000    -.0003794   -.0002578
                                             dy_schooling |   .0002668   .0000481     5.55   0.000     .0001722    .0003614
                                                 marriage |   -.004595   .0006148    -7.47   0.000    -.0058053   -.0033846
                                               hukou_type |  -.0007282   .0004208    -1.73   0.085    -.0015566    .0001001
                                                   a2025b |  -.0001572   .0001256    -1.25   0.212    -.0004046    .0000901
                                                 InIncome |   .0003588   .0001017     3.53   0.000     .0001587     .000559
                                                          |
                                           gap_jobdiff3ex |
                                                      L1. |   .0000635   .0000702     0.91   0.366    -.0000747    .0002017
                                                          |
                      cL.gap_jobdiff3ex#cL.gap_jobdiff3ex |   1.67e-06   6.29e-07     2.65   0.009     4.28e-07    2.91e-06
                                                          |
                                                gap_ppden |
                                                      L1. |   5.55e-06   2.44e-06     2.27   0.024     7.45e-07    .0000104
                                                          |
                                             gap_unemploy |
                                                      L1. |  -.0308105   .1619523    -0.19   0.849     -.349655    .2880341
                                                          |
                                           gap_enterprise |
                                                      L1. |   .0004517   .0003214     1.41   0.161    -.0001811    .0010844
                                                          |
                                                  gap_med |
                                                      L1. |   5.050586   1.232691     4.10   0.000     2.623718    7.477454
                                                          |
                                              gap_highedu |
                                                      L1. |   .3269975   .0754531     4.33   0.000     .1784488    .4755461
                      Code:
                      ------------------------------------------------------------------------------
                      Arellano-Bond test for AR(1) in first differences: z =  -8.77  Pr > z =  0.000
                      Arellano-Bond test for AR(2) in first differences: z =   2.73  Pr > z =  0.006
                      Arellano-Bond test for AR(3) in first differences: z =  -0.04  Pr > z =  0.967
                      Arellano-Bond test for AR(4) in first differences: z =  -0.59  Pr > z =  0.558
                      ------------------------------------------------------------------------------
                      Sargan test of overid. restrictions: chi2(189)  =1776.06  Prob > chi2 =  0.000
                        (Not robust, but not weakened by many instruments.)
                      Hansen test of overid. restrictions: chi2(189)  = 212.81  Prob > chi2 =  0.113
                        (Robust, but weakened by many instruments.)
                      
                      Difference-in-Hansen tests of exogeneity of instrument subsets:
                        GMM instruments for levels
                          Hansen test excluding group:     chi2(91)   =  94.47  Prob > chi2 =  0.381
                          Difference (null H = exogenous): chi2(98)   = 118.34  Prob > chi2 =  0.079
                        gmm(migrate, eq(level) lag(2 2))
                          Hansen test excluding group:     chi2(173)  = 189.50  Prob > chi2 =  0.185
                          Difference (null H = exogenous): chi2(16)   =  23.32  Prob > chi2 =  0.106
                        gmm(migrate, collapse eq(diff) lag(3 .))
                          Hansen test excluding group:     chi2(173)  = 191.34  Prob > chi2 =  0.161
                          Difference (null H = exogenous): chi2(16)   =  21.47  Prob > chi2 =  0.161
                        gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, eq(level) lag(1 1))
                          Hansen test excluding group:     chi2(157)  = 177.20  Prob > chi2 =  0.129
                          Difference (null H = exogenous): chi2(32)   =  35.61  Prob > chi2 =  0.302
                        gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
                          Hansen test excluding group:     chi2(157)  = 169.57  Prob > chi2 =  0.233
                          Difference (null H = exogenous): chi2(32)   =  43.25  Prob > chi2 =  0.089
                        gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, eq(level) lag(0 0))
                          Hansen test excluding group:     chi2(139)  = 149.53  Prob > chi2 =  0.256
                          Difference (null H = exogenous): chi2(50)   =  63.28  Prob > chi2 =  0.098
                        gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, collapse eq(diff) lag(1 .))
                          Hansen test excluding group:     chi2(139)  = 173.36  Prob > chi2 =  0.026
                          Difference (null H = exogenous): chi2(50)   =  39.45  Prob > chi2 =  0.858
                        iv(L.gap_med L.gap_highedu, eq(level))
                          Hansen test excluding group:     chi2(187)  = 210.15  Prob > chi2 =  0.118
                          Difference (null H = exogenous): chi2(2)    =   2.66  Prob > chi2 =  0.264
                        iv(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11 yr12 yr13 yr14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
                          Hansen test excluding group:     chi2(165)  = 182.91  Prob > chi2 =  0.161
                          Difference (null H = exogenous): chi2(24)   =  29.90  Prob > chi2 =  0.188

                      Results with only 1 lagged dependent variable:
                      Code:
                      -----------------------------------------------------------------------------------------------------
                                                          |              Corrected
                                                  migrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      ------------------------------------+----------------------------------------------------------------
                                                  migrate |
                                                      L1. |   .9635485   .0053282   180.84   0.000     .9530589    .9740382
                                                          |
                                                    a2003 |  -.0001579   .0001957    -0.81   0.420    -.0005433    .0002274
                                                   co_age |   -.000373   .0000264   -14.11   0.000     -.000425    -.000321
                                             dy_schooling |   .0003459   .0000477     7.25   0.000      .000252    .0004398
                                                 marriage |  -.0047542   .0007491    -6.35   0.000    -.0062289   -.0032795
                                               hukou_type |  -.0008414   .0005379    -1.56   0.119    -.0019003    .0002174
                                                   a2025b |  -.0001694   .0001507    -1.12   0.262     -.000466    .0001272
                                                 InIncome |   .0004433   .0001295     3.42   0.001     .0001882    .0006983
                                                          |
                                           gap_jobdiff3ex |
                                                      L1. |   .0000999   .0000775     1.29   0.198    -.0000526    .0002524
                                                          |
                      cL.gap_jobdiff3ex#cL.gap_jobdiff3ex |   2.01e-06   7.03e-07     2.86   0.005     6.28e-07    3.40e-06
                                                          |
                                                gap_ppden |
                                                      L1. |   7.02e-06   2.77e-06     2.53   0.012     1.55e-06    .0000125
                                                          |
                                             gap_unemploy |
                                                      L1. |  -.1215351    .169508    -0.72   0.474     -.455244    .2121739
                                                          |
                                           gap_enterprise |
                                                      L1. |    .000753   .0003926     1.92   0.056      -.00002     .001526
                                                          |
                                                  gap_med |
                                                      L1. |   5.361344    1.37739     3.89   0.000     2.649688       8.073
                                                          |
                                              gap_highedu |
                                                      L1. |   .4120449   .0989178     4.17   0.000     .2173063    .6067835
                      Code:
                      ------------------------------------------------------------------------------
                      Arellano-Bond test for AR(1) in first differences: z = -59.78  Pr > z =  0.000
                      Arellano-Bond test for AR(2) in first differences: z =   1.08  Pr > z =  0.282
                      Arellano-Bond test for AR(3) in first differences: z =  -0.13  Pr > z =  0.894
                      Arellano-Bond test for AR(4) in first differences: z =  -1.06  Pr > z =  0.289
                      ------------------------------------------------------------------------------
                      Sargan test of overid. restrictions: chi2(191)  =2240.55  Prob > chi2 =  0.000
                        (Not robust, but not weakened by many instruments.)
                      Hansen test of overid. restrictions: chi2(191)  = 215.61  Prob > chi2 =  0.107
                        (Robust, but weakened by many instruments.)
                      
                      Difference-in-Hansen tests of exogeneity of instrument subsets:
                        GMM instruments for levels
                          Hansen test excluding group:     chi2(93)   =  97.64  Prob > chi2 =  0.351
                          Difference (null H = exogenous): chi2(98)   = 117.97  Prob > chi2 =  0.083
                        gmm(migrate, eq(level) lag(1 1))
                          Hansen test excluding group:     chi2(175)  = 194.32  Prob > chi2 =  0.151
                          Difference (null H = exogenous): chi2(16)   =  21.29  Prob > chi2 =  0.168
                        gmm(migrate, collapse eq(diff) lag(2 .))
                          Hansen test excluding group:     chi2(174)  = 200.42  Prob > chi2 =  0.083
                          Difference (null H = exogenous): chi2(17)   =  15.18  Prob > chi2 =  0.582
                        gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, eq(level) lag(1 1))
                          Hansen test excluding group:     chi2(159)  = 172.30  Prob > chi2 =  0.223
                          Difference (null H = exogenous): chi2(32)   =  43.31  Prob > chi2 =  0.088
                        gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
                          Hansen test excluding group:     chi2(159)  = 172.53  Prob > chi2 =  0.219
                          Difference (null H = exogenous): chi2(32)   =  43.08  Prob > chi2 =  0.091
                        gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, eq(level) lag(0 0))
                          Hansen test excluding group:     chi2(141)  = 172.45  Prob > chi2 =  0.037
                          Difference (null H = exogenous): chi2(50)   =  43.16  Prob > chi2 =  0.742
                        gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, collapse eq(diff) lag(1 .))
                          Hansen test excluding group:     chi2(141)  = 176.85  Prob > chi2 =  0.022
                          Difference (null H = exogenous): chi2(50)   =  38.76  Prob > chi2 =  0.876
                        iv(L.gap_med L.gap_highedu, eq(level))
                          Hansen test excluding group:     chi2(189)  = 214.05  Prob > chi2 =  0.102
                          Difference (null H = exogenous): chi2(2)    =   1.56  Prob > chi2 =  0.458
                        iv(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11 yr12 yr13 y
                      > r14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
                          Hansen test excluding group:     chi2(167)  = 187.84  Prob > chi2 =  0.129
                          Difference (null H = exogenous): chi2(24)   =  27.76  Prob > chi2 =  0.270

                      Leszczensky, L., & Wolbring, T. (2019). How to Deal With Reverse Causality Using Panel Data? Recommendations for Researchers Based on a Simulation Study. Sociological Methods & Research. https://doi.org/10.1177/0049124119882473
                      Last edited by Huaxin Wanglu; 10 Mar 2021, 15:23.



                      • #12
                        Lagging variables to avoid reverse causality is often an ill-advised approach. You would be deliberately misspecifying your model. The reverse causality problem (which is a source of endogeneity) can simply be dealt with by using lagged instruments.

                         There are instances when lagging makes sense, e.g. if your dependent variable is a flow variable and your independent variable is a stock variable measured at the end of a period. In such a model, you clearly want the stock at the end of the previous period (not the current period) to affect the current period's flow variable. Otherwise, lagging right-hand-side variables really only makes sense if the effects indeed occur with a delay.
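                         To make the distinction concrete, here is a minimal xtabond2 sketch; y, x, and the lag ranges are placeholders, not taken from this thread. Rather than replacing x with L.x in the regression, keep x in the model and let its lags serve only as instruments:
                         Code:
                         * ill-advised: lagging the regressor itself to sidestep reverse causality
                         xtabond2 y L.y L.x, gmm(L.y, lag(2 4) collapse) iv(L.x) twostep robust
                         
                         * preferred: keep x in the model; use its own lags as GMM-style instruments
                         xtabond2 y L.y x, gmm(L.y x, lag(2 4) collapse) twostep robust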



                        • #13
                          Originally posted by Sebastian Kripfganz View Post
                           Many thanks for the comment! You saved me a lot of time; it is indeed succinct and informative.



                          • #14
                            Originally posted by Sebastian Kripfganz View Post
                             Dear Prof. Sebastian Kripfganz, may I ask a new question? Roodman (2009) mentions that a Hansen test p-value as high as 0.25 should be viewed with concern, which seems to imply that safe values fall in the range of 0.1 to 0.25 (if I understand correctly). However, after more than a hundred comparisons, I find that when the overall p-value lies within this range, the C test (difference-in-Hansen) usually cannot be safely accepted, since for at least one instrument subset the p-value of either the excluding-group or the difference test is smaller than 0.1, and sometimes even below 0.05. I am hesitant about how to trade off between them. Would you consider a C-test p-value in [0.05, 0.1] acceptable? And should I be worried if the p-value of the overall Hansen test is larger than 0.5? After adding a few new independent variables, it becomes 0.621. The number of instruments is 229 and the number of observations is 339,855.

                            Code:
                            ------------------------------------------------------------------------------
                            Arellano-Bond test for AR(1) in first differences: z = -44.23  Pr > z =  0.000
                            Arellano-Bond test for AR(2) in first differences: z =  -1.23  Pr > z =  0.220
                            Arellano-Bond test for AR(3) in first differences: z =   1.34  Pr > z =  0.182
                            Arellano-Bond test for AR(4) in first differences: z =   0.56  Pr > z =  0.576
                            ------------------------------------------------------------------------------
                            Sargan test of overid. restrictions: chi2(191)  =1326.63  Prob > chi2 =  0.000
                              (Not robust, but not weakened by many instruments.)
                            Hansen test of overid. restrictions: chi2(191)  = 184.39  Prob > chi2 =  0.621
                              (Robust, but weakened by many instruments.)
                            
                            Difference-in-Hansen tests of exogeneity of instrument subsets:
                              GMM instruments for levels
                                Hansen test excluding group:     chi2(137)  = 124.81  Prob > chi2 =  0.764
                                Difference (null H = exogenous): chi2(54)   =  59.59  Prob > chi2 =  0.280
                              gmm(migrate, eq(level) lag(1 1))
                                Hansen test excluding group:     chi2(175)  = 171.65  Prob > chi2 =  0.557
                                Difference (null H = exogenous): chi2(16)   =  12.74  Prob > chi2 =  0.691
                              gmm(migrate, collapse eq(diff) lag(2 .))
                                Hansen test excluding group:     chi2(176)  = 168.64  Prob > chi2 =  0.641
                                Difference (null H = exogenous): chi2(15)   =  15.75  Prob > chi2 =  0.399
                              gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, eq(level) lag(1 1))
                                Hansen test excluding group:     chi2(159)  = 154.63  Prob > chi2 =  0.583
                                Difference (null H = exogenous): chi2(32)   =  29.76  Prob > chi2 =  0.580
                              gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
                                Hansen test excluding group:     chi2(159)  = 159.06  Prob > chi2 =  0.484
                                Difference (null H = exogenous): chi2(32)   =  25.33  Prob > chi2 =  0.792
                              gmm(gap_labprod gap_LQ19 gap_terti, collapse eq(level) lag(1 1))
                                Hansen test excluding group:     chi2(188)  = 182.40  Prob > chi2 =  0.602
                                Difference (null H = exogenous): chi2(3)    =   1.99  Prob > chi2 =  0.574
                              gmm(gap_labprod gap_LQ19 gap_terti, collapse eq(diff) lag(2 .))
                                Hansen test excluding group:     chi2(141)  = 150.18  Prob > chi2 =  0.283
                                Difference (null H = exogenous): chi2(50)   =  34.21  Prob > chi2 =  0.957
                              gmm(gap_ppden gap_enterprise gap_unemploy, collapse eq(level) lag(0 0))
                                Hansen test excluding group:     chi2(188)  = 179.76  Prob > chi2 =  0.654
                                Difference (null H = exogenous): chi2(3)    =   4.64  Prob > chi2 =  0.200
                              gmm(gap_ppden gap_enterprise gap_unemploy, collapse eq(diff) lag(1 .))
                                Hansen test excluding group:     chi2(141)  = 149.14  Prob > chi2 =  0.303
                                Difference (null H = exogenous): chi2(50)   =  35.25  Prob > chi2 =  0.943
                              iv(gap_highedu gap_med gap_theater, eq(level))
                                Hansen test excluding group:     chi2(188)  = 181.66  Prob > chi2 =  0.616
                                Difference (null H = exogenous): chi2(3)    =   2.73  Prob > chi2 =  0.435
                              iv(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11 yr12 yr13
                            > yr14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
                                Hansen test excluding group:     chi2(167)  = 164.20  Prob > chi2 =  0.547
                                Difference (null H = exogenous): chi2(24)   =  20.20  Prob > chi2 =  0.685
                            Last edited by Huaxin Wanglu; 15 Mar 2021, 12:13.



                            • #15
                               The p-value range from 0.1 to 0.25 is quite arbitrary. Personally, I would not focus much on this rule of thumb. A high p-value of the Hansen test could indeed be an indication of a too-many-instruments problem, but it could also simply be an indication that there is no evidence to reject the model. Jan Kiviet takes a different stance on these p-values in one of his recent papers: if you ensure from the beginning that the risk of running into a too-many-instruments problem is low, then you do not have to worry much about this rule of thumb.

                               There is no general answer as to whether a p-value between 0.05 and 0.1 for the difference-in-Hansen test is acceptable. If the tested instruments are crucial for the identification of your main coefficients of interest, then this might be worrisome. On the other hand, with such a large number of observations I would take much more comfort in such a p-value than I would with a small sample size, in particular if all other tests are fine.
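                               As a practical note on keeping the risk of a too-many-instruments problem low from the outset, curtailing the lag depth and collapsing the instrument matrix are the usual levers in xtabond2 (variable names below are placeholders): the full matrix grows roughly quadratically with T (here T = 22), while a curtailed, collapsed set stays at a few columns per variable regardless of T.
                               Code:
                               * full instrument matrix: one column per lag and period; count grows with T
                               xtabond2 y L.y x, gmm(L.y x, lag(2 .)) twostep robust
                               
                               * curtailed and collapsed: a handful of columns per variable
                               xtabond2 y L.y x, gmm(L.y x, lag(2 4) collapse) twostep robust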
