
  • Sebastian Kripfganz
    replied
That seems to be a matter of efficiency in the (implicit) first-stage regressions of the regressors on the instruments. These level instruments might be informative for some variables but less informative for others. Adding further informative instruments helps to improve the first-stage fit, while adding further uninformative (weak) instruments worsens it. Adding more (instrumental) variables is not always better, even in large samples.
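    For illustration, a minimal sketch (hypothetical xtset panel with outcome y and regressors x1 and x2, not from this thread) of how one can check this empirically, by comparing the same collapsed specification with and without an additional block of level instruments:
    Code:
    * Baseline: collapsed difference-equation instruments only.
    xtabond2 y L.y x1 x2, gmmstyle(y, lag(2 .) eq(diff) collapse) ///
        gmmstyle(x1 x2, lag(2 .) eq(diff) collapse) twostep robust
    estimates store base
    * Add a block of level-equation instruments (entering as lagged differences).
    * If they are informative, standard errors should shrink; if they are weak,
    * the implicit first-stage fit, and hence efficiency, can deteriorate.
    xtabond2 y L.y x1 x2, gmmstyle(y, lag(2 .) eq(diff) collapse) ///
        gmmstyle(x1 x2, lag(2 .) eq(diff) collapse) ///
        gmmstyle(y x1 x2, lag(1 1) eq(level) collapse) twostep robust
    estimates store levels
    * Compare coefficients and standard errors side by side.
    estimates table base levels, se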

  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz View Post
    The p-value range from 0.1 to 0.25 is quite arbitrary. Personally, I would not focus much on this rule of thumb. A high p-value of the Hansen test could indeed be an indication of a too-many-instruments problem, but it could also simply be an indication that there is no evidence to reject the model. Jan Kiviet takes a different stand on these p-values in one of his recent papers: If you ensure from the beginning that the risk of running into a too-many-instruments problem is low, then you would not have to worry much about this rule of thumb.

    There is no general answer as to whether a p-value between 0.05 and 0.1 for the difference-in-Hansen test is acceptable. If the tested instruments are crucial for the identification of your main coefficients of interest, then this might be worrisome. On the other hand, with such a large number of observations I would take much more comfort in such a p-value than with a small sample size, in particular if all other tests are fine.
    Hello, I am truly sorry for raising another question. It is not very critical to my model, but I have been quite confused for a couple of days. In the specification in #14, I also collapse the instruments in the level model, except for my core predictor, because when I collapse all the instruments in both the level and transformed models, my core predictor turns statistically insignificant. I understand that this is very likely because, in large samples, collapsing worsens statistical efficiency. However, when I switch from the combination (a b c, lag(2 .) eq(diff) collapse) (a b c, lag(1 1) eq(level) collapse) to (a b c, lag(2 8) eq(diff) collapse) (a b c, lag(1 1) eq(level)), one variable changes from statistically significant to insignificant and another variable becomes statistically significant. If the changes result from better statistical efficiency, then insignificant → significant seems reasonable to me, but significant → insignificant sounds weird...

    I personally think that the second version is better because it strikes a better trade-off between statistical efficiency and the too-many-instruments problem. I have also seen you point out somewhere that collapsing specific instruments, rather than all of them, should be justified with a good reason. But I am not very confident in my understanding, so I hope to learn your advice/opinions. Thanks a lot!

    If we use just the second and third lags as instruments, this leads to the pretty large total of 122 instruments. Using only the second-order lags as instruments leads to just 64 instruments and much larger standard errors. When we collapse all the instruments in the standard way, 76 instruments remain, with results that differ substantially from those obtained by simply skipping higher-order lags from the full set of available instruments. Collapsing also yields more insignificant regressors.
    My old code from #14:
    Code:
    xtabond2 migrate L.migrate a2003 c.co_age##c.co_age dy_schooling marriage hukou_type a2025b InIncome ///
        c.gap_jobdiff3ex##c.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu gap_theater gap_labprod gap_terti gap_LQ19 yr2-yr22, ///
        gmmstyle(migrate, lag(1 1) eq(level) collapse) /// predetermined
        gmmstyle(migrate, lag(2 .) eq(diff) collapse) ///
        gmmstyle(c.gap_jobdiff3ex##c.gap_jobdiff3ex, lag(1 1) eq(level)) ///
        gmmstyle(c.gap_jobdiff3ex##c.gap_jobdiff3ex, lag(2 .) eq(diff) collapse) ///
        gmmstyle(gap_labprod gap_LQ19 gap_terti, lag(1 1) eq(level) collapse) ///
        gmmstyle(gap_labprod gap_LQ19 gap_terti, lag(2 .) eq(diff) collapse) ///
        gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(0 0) eq(level) collapse) ///
        gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(1 .) eq(diff) collapse) ///
        ivstyle(gap_highedu gap_med gap_theater, eq(level)) ///
        ivstyle(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2-yr22, eq(level)) ///
        small twostep artests(4) cluster(dest_code)
    New code:
    Code:
    xtabond2 migrate L.migrate a2003 c.co_age##c.co_age dy_schooling marriage hukou_type a2025b InIncome ///
        c.gap_jobdiff3ex##c.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu gap_theater gap_labprod gap_terti gap_LQ19 yr2-yr22, ///
        gmmstyle(migrate, lag(1 1) eq(level)) /// predetermined
        gmmstyle(migrate, lag(2 8) eq(diff) collapse) ///
        gmmstyle(c.gap_jobdiff3ex##c.gap_jobdiff3ex, lag(1 1) eq(level)) /// endogenous
        gmmstyle(c.gap_jobdiff3ex##c.gap_jobdiff3ex, lag(2 8) eq(diff) collapse) ///
        gmmstyle(gap_labprod gap_LQ19 gap_terti, lag(1 1) eq(level)) /// endogenous
        gmmstyle(gap_labprod gap_LQ19 gap_terti, lag(2 8) eq(diff) collapse) ///
        gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(0 0) eq(level)) /// not strictly exogenous
        gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(1 3) eq(diff) collapse) ///
        ivstyle(gap_highedu gap_med gap_theater, eq(level)) /// exogenous
        ivstyle(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2-yr22, eq(level)) ///
        small twostep artests(4) cluster(dest_code)
    Last edited by Huaxin Wanglu; 18 Mar 2021, 21:12.

  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz View Post
    The p-value range from 0.1 to 0.25 is quite arbitrary. Personally, I would not focus much on this rule of thumb. A high p-value of the Hansen test could indeed be an indication of a too-many-instruments problem, but it could also simply be an indication that there is no evidence to reject the model. Jan Kiviet takes a different stand on these p-values in one of his recent papers: If you ensure from the beginning that the risk of running into a too-many-instruments problem is low, then you would not have to worry much about this rule of thumb.

    There is no general answer as to whether a p-value between 0.05 and 0.1 for the difference-in-Hansen test is acceptable. If the tested instruments are crucial for the identification of your main coefficients of interest, then this might be worrisome. On the other hand, with such a large number of observations I would take much more comfort in such a p-value than with a small sample size, in particular if all other tests are fine.
    Thanks again for replying to my issues. I also compared the number of my instruments and observations with other articles, and the comparison suggests it is quite fine, so I believe the probability of instrument proliferation is very low. And thanks for recommending the methodological paper; I will read it.

  • Sebastian Kripfganz
    replied
    The p-value range from 0.1 to 0.25 is quite arbitrary. Personally, I would not focus much on this rule of thumb. A high p-value of the Hansen test could indeed be an indication of a too-many-instruments problem, but it could also simply be an indication that there is no evidence to reject the model. Jan Kiviet takes a different stand on these p-values in one of his recent papers: If you ensure from the beginning that the risk of running into a too-many-instruments problem is low, then you would not have to worry much about this rule of thumb.

    There is no general answer as to whether a p-value between 0.05 and 0.1 for the difference-in-Hansen test is acceptable. If the tested instruments are crucial for the identification of your main coefficients of interest, then this might be worrisome. On the other hand, with such a large number of observations I would take much more comfort in such a p-value than with a small sample size, in particular if all other tests are fine.

  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz View Post
    Lagging variables to avoid reverse causality is often an ill-advised approach. You would be deliberately misspecifying your model. The reverse causality problem (which is a source of endogeneity) can simply be dealt with by using lagged instruments.

    There are instances when lagging makes sense, e.g. if your dependent variable is a flow variable and your independent variable is a stock variable measured at the end of a period. In your model, you clearly want the stock at the end of the previous period (not the current period) to affect the current period's flow variable. Otherwise, lagging right-hand-side variables really just makes sense if the effects indeed occur with a delay.
    Dear Prof. Sebastian Kripfganz, may I ask you a new question? Roodman (2009) mentions that a p-value of the Hansen test as high as 0.25 should be viewed with concern, which implies that a safe value falls into the range of 0.1 to 0.25 (if I understand correctly?). However, after comparing more than a hundred runs, I find that when the p-value is within this range, the C test (difference-in-Hansen) usually cannot be safely accepted, since at least one of the subsets, for either the excluding group or the including group, is smaller than 0.1 and sometimes even below 0.05. I am hesitant about how to make the trade-off between them. Would you consider a p-value of [0.05, 0.1] for the C tests acceptable? And should I be worried if the p-value of the overall Hansen test is larger than 0.5? After adding a few new independent variables, it becomes 0.621. The number of instruments is 229 and the number of observations is 339,855.

    Code:
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z = -44.23  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =  -1.23  Pr > z =  0.220
    Arellano-Bond test for AR(3) in first differences: z =   1.34  Pr > z =  0.182
    Arellano-Bond test for AR(4) in first differences: z =   0.56  Pr > z =  0.576
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(191)  =1326.63  Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(191)  = 184.39  Prob > chi2 =  0.621
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(137)  = 124.81  Prob > chi2 =  0.764
        Difference (null H = exogenous): chi2(54)   =  59.59  Prob > chi2 =  0.280
      gmm(migrate, eq(level) lag(1 1))
        Hansen test excluding group:     chi2(175)  = 171.65  Prob > chi2 =  0.557
        Difference (null H = exogenous): chi2(16)   =  12.74  Prob > chi2 =  0.691
      gmm(migrate, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(176)  = 168.64  Prob > chi2 =  0.641
        Difference (null H = exogenous): chi2(15)   =  15.75  Prob > chi2 =  0.399
      gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, eq(level) lag(1 1))
        Hansen test excluding group:     chi2(159)  = 154.63  Prob > chi2 =  0.583
        Difference (null H = exogenous): chi2(32)   =  29.76  Prob > chi2 =  0.580
      gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(159)  = 159.06  Prob > chi2 =  0.484
        Difference (null H = exogenous): chi2(32)   =  25.33  Prob > chi2 =  0.792
      gmm(gap_labprod gap_LQ19 gap_terti, collapse eq(level) lag(1 1))
        Hansen test excluding group:     chi2(188)  = 182.40  Prob > chi2 =  0.602
        Difference (null H = exogenous): chi2(3)    =   1.99  Prob > chi2 =  0.574
      gmm(gap_labprod gap_LQ19 gap_terti, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(141)  = 150.18  Prob > chi2 =  0.283
        Difference (null H = exogenous): chi2(50)   =  34.21  Prob > chi2 =  0.957
      gmm(gap_ppden gap_enterprise gap_unemploy, collapse eq(level) lag(0 0))
        Hansen test excluding group:     chi2(188)  = 179.76  Prob > chi2 =  0.654
        Difference (null H = exogenous): chi2(3)    =   4.64  Prob > chi2 =  0.200
      gmm(gap_ppden gap_enterprise gap_unemploy, collapse eq(diff) lag(1 .))
        Hansen test excluding group:     chi2(141)  = 149.14  Prob > chi2 =  0.303
        Difference (null H = exogenous): chi2(50)   =  35.25  Prob > chi2 =  0.943
      iv(gap_highedu gap_med gap_theater, eq(level))
        Hansen test excluding group:     chi2(188)  = 181.66  Prob > chi2 =  0.616
        Difference (null H = exogenous): chi2(3)    =   2.73  Prob > chi2 =  0.435
      iv(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11 yr12 yr13
    > yr14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
        Hansen test excluding group:     chi2(167)  = 164.20  Prob > chi2 =  0.547
        Difference (null H = exogenous): chi2(24)   =  20.20  Prob > chi2 =  0.685
    Last edited by Huaxin Wanglu; 15 Mar 2021, 12:13.

  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz View Post
    Lagging variables to avoid reverse causality is often an ill-advised approach. You would be deliberately misspecifying your model. The reverse causality problem (which is a source of endogeneity) can simply be dealt with by using lagged instruments.

    There are instances when lagging makes sense, e.g. if your dependent variable is a flow variable and your independent variable is a stock variable measured at the end of a period. In your model, you clearly want the stock at the end of the previous period (not the current period) to affect the current period's flow variable. Otherwise, lagging right-hand-side variables really just makes sense if the effects indeed occur with a delay.
    Many thanks for the comment! You have saved me a lot of time. It is indeed succinct and informative.

  • Sebastian Kripfganz
    replied
    Lagging variables to avoid reverse causality is often an ill-advised approach. You would be deliberately misspecifying your model. The reverse causality problem (which is a source of endogeneity) can simply be dealt with by using lagged instruments.

    There are instances when lagging makes sense, e.g. if your dependent variable is a flow variable and your independent variable is a stock variable measured at the end of a period. In your model, you clearly want the stock at the end of the previous period (not the current period) to affect the current period's flow variable. Otherwise, lagging right-hand-side variables really just makes sense if the effects indeed occur with a delay.
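    As a concrete sketch of this distinction (hypothetical y and x, not from this thread): the first call changes the model itself by lagging the regressor, while the second keeps the contemporaneous regressor and handles reverse causality through lagged instruments only:
    Code:
    * Lagging the regressor changes the model specification itself; only
    * advisable if the effect genuinely occurs with a delay.
    xtabond2 y L.y L.x, gmmstyle(y, lag(2 .) eq(diff) collapse) ///
        gmmstyle(L.x, lag(2 .) eq(diff) collapse) twostep robust

    * Keeping x dated at time t and instead instrumenting it with its
    * second and deeper lags in the first-differenced equation.
    xtabond2 y L.y x, gmmstyle(y, lag(2 .) eq(diff) collapse) ///
        gmmstyle(x, lag(2 .) eq(diff) collapse) twostep robust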

  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz View Post
    Your code and your specification test results in #7 look fine as far as I can tell by quickly looking at them. The binary nature of the dependent variable does not necessarily cause problems.
    Hello, may I ask you another question? To address reverse causality, I lag the variables by one period in the OLS and FE models, but when I use L.gap_jobdiff3ex instead of gap_jobdiff3ex in GMM, the p-value of the Hansen test drops to 0.10 and the difference-in-Hansen tests cannot fully pass. I guess this may be because, by lagging the variables, the deeper lags suffer from the weak-instruments problem. From a paper, I learned that GMM can tackle reverse causality without lagging. Because the paper my conceptual framework is based on lags all the variables by one period in system GMM, I am quite concerned about this choice. Could you give me some tips on whether I should lag by one period in GMM, and if so, how? While lagging by one period, I also tried adding the second lag of my dependent variable to the model. Since the AR test accepts the null at AR(3), I revised the code as below, but the coefficient of L2 is negative.

    In principle, the Arellano-Bond (AB) estimator and related dynamic panel models offer a powerful toolbox to tackle endogeneity problems caused by both reverse causality and unobserved heterogeneity.
    We rely on the approach advocated by Arellano and Bond (1991) taking first differences in a first step to remove unobserved heterogeneity and then using second- and higher-order lags of the dependent variables as instruments in a standard GMM framework to deal with reverse causality.
    Code:
    gmmstyle(migrate, lag(2 2) eq(level)) ///
    gmmstyle(migrate, lag(3 .) eq(diff) collapse) ///

    Results with two lagged dependent variables:
    Code:
    -----------------------------------------------------------------------------------------------------
                                        |              Corrected
                                migrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------------------------+----------------------------------------------------------------
                                migrate |
                                    L1. |   1.159987   .0730317    15.88   0.000     1.016205    1.303768
                                    L2. |  -.1898163   .0698226    -2.72   0.007    -.3272801   -.0523526
                                        |
                                  a2003 |  -.0000582   .0001527    -0.38   0.703    -.0003588    .0002423
                                 co_age |  -.0003186   .0000309   -10.32   0.000    -.0003794   -.0002578
                           dy_schooling |   .0002668   .0000481     5.55   0.000     .0001722    .0003614
                               marriage |   -.004595   .0006148    -7.47   0.000    -.0058053   -.0033846
                             hukou_type |  -.0007282   .0004208    -1.73   0.085    -.0015566    .0001001
                                 a2025b |  -.0001572   .0001256    -1.25   0.212    -.0004046    .0000901
                               InIncome |   .0003588   .0001017     3.53   0.000     .0001587     .000559
                                        |
                         gap_jobdiff3ex |
                                    L1. |   .0000635   .0000702     0.91   0.366    -.0000747    .0002017
                                        |
    cL.gap_jobdiff3ex#cL.gap_jobdiff3ex |   1.67e-06   6.29e-07     2.65   0.009     4.28e-07    2.91e-06
                                        |
                              gap_ppden |
                                    L1. |   5.55e-06   2.44e-06     2.27   0.024     7.45e-07    .0000104
                                        |
                           gap_unemploy |
                                    L1. |  -.0308105   .1619523    -0.19   0.849     -.349655    .2880341
                                        |
                         gap_enterprise |
                                    L1. |   .0004517   .0003214     1.41   0.161    -.0001811    .0010844
                                        |
                                gap_med |
                                    L1. |   5.050586   1.232691     4.10   0.000     2.623718    7.477454
                                        |
                            gap_highedu |
                                    L1. |   .3269975   .0754531     4.33   0.000     .1784488    .4755461
    Code:
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z =  -8.77  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =   2.73  Pr > z =  0.006
    Arellano-Bond test for AR(3) in first differences: z =  -0.04  Pr > z =  0.967
    Arellano-Bond test for AR(4) in first differences: z =  -0.59  Pr > z =  0.558
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(189)  =1776.06  Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(189)  = 212.81  Prob > chi2 =  0.113
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(91)   =  94.47  Prob > chi2 =  0.381
        Difference (null H = exogenous): chi2(98)   = 118.34  Prob > chi2 =  0.079
      gmm(migrate, eq(level) lag(2 2))
        Hansen test excluding group:     chi2(173)  = 189.50  Prob > chi2 =  0.185
        Difference (null H = exogenous): chi2(16)   =  23.32  Prob > chi2 =  0.106
      gmm(migrate, collapse eq(diff) lag(3 .))
        Hansen test excluding group:     chi2(173)  = 191.34  Prob > chi2 =  0.161
        Difference (null H = exogenous): chi2(16)   =  21.47  Prob > chi2 =  0.161
      gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, eq(level) lag(1 1))
        Hansen test excluding group:     chi2(157)  = 177.20  Prob > chi2 =  0.129
        Difference (null H = exogenous): chi2(32)   =  35.61  Prob > chi2 =  0.302
      gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(157)  = 169.57  Prob > chi2 =  0.233
        Difference (null H = exogenous): chi2(32)   =  43.25  Prob > chi2 =  0.089
      gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, eq(level) lag(0 0))
        Hansen test excluding group:     chi2(139)  = 149.53  Prob > chi2 =  0.256
        Difference (null H = exogenous): chi2(50)   =  63.28  Prob > chi2 =  0.098
      gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, collapse eq(diff) lag(1 .))
        Hansen test excluding group:     chi2(139)  = 173.36  Prob > chi2 =  0.026
        Difference (null H = exogenous): chi2(50)   =  39.45  Prob > chi2 =  0.858
      iv(L.gap_med L.gap_highedu, eq(level))
        Hansen test excluding group:     chi2(187)  = 210.15  Prob > chi2 =  0.118
        Difference (null H = exogenous): chi2(2)    =   2.66  Prob > chi2 =  0.264
      iv(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11 yr12 yr13 y
    > r14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
        Hansen test excluding group:     chi2(165)  = 182.91  Prob > chi2 =  0.161
        Difference (null H = exogenous): chi2(24)   =  29.90  Prob > chi2 =  0.188

    Results with only 1 lagged dependent variable:
    Code:
    -----------------------------------------------------------------------------------------------------
                                        |              Corrected
                                migrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------------------------+----------------------------------------------------------------
                                migrate |
                                    L1. |   .9635485   .0053282   180.84   0.000     .9530589    .9740382
                                        |
                                  a2003 |  -.0001579   .0001957    -0.81   0.420    -.0005433    .0002274
                                 co_age |   -.000373   .0000264   -14.11   0.000     -.000425    -.000321
                           dy_schooling |   .0003459   .0000477     7.25   0.000      .000252    .0004398
                               marriage |  -.0047542   .0007491    -6.35   0.000    -.0062289   -.0032795
                             hukou_type |  -.0008414   .0005379    -1.56   0.119    -.0019003    .0002174
                                 a2025b |  -.0001694   .0001507    -1.12   0.262     -.000466    .0001272
                               InIncome |   .0004433   .0001295     3.42   0.001     .0001882    .0006983
                                        |
                         gap_jobdiff3ex |
                                    L1. |   .0000999   .0000775     1.29   0.198    -.0000526    .0002524
                                        |
    cL.gap_jobdiff3ex#cL.gap_jobdiff3ex |   2.01e-06   7.03e-07     2.86   0.005     6.28e-07    3.40e-06
                                        |
                              gap_ppden |
                                    L1. |   7.02e-06   2.77e-06     2.53   0.012     1.55e-06    .0000125
                                        |
                           gap_unemploy |
                                    L1. |  -.1215351    .169508    -0.72   0.474     -.455244    .2121739
                                        |
                         gap_enterprise |
                                    L1. |    .000753   .0003926     1.92   0.056      -.00002     .001526
                                        |
                                gap_med |
                                    L1. |   5.361344    1.37739     3.89   0.000     2.649688       8.073
                                        |
                            gap_highedu |
                                    L1. |   .4120449   .0989178     4.17   0.000     .2173063    .6067835
    Code:
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z = -59.78  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =   1.08  Pr > z =  0.282
    Arellano-Bond test for AR(3) in first differences: z =  -0.13  Pr > z =  0.894
    Arellano-Bond test for AR(4) in first differences: z =  -1.06  Pr > z =  0.289
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(191)  =2240.55  Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(191)  = 215.61  Prob > chi2 =  0.107
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(93)   =  97.64  Prob > chi2 =  0.351
        Difference (null H = exogenous): chi2(98)   = 117.97  Prob > chi2 =  0.083
      gmm(migrate, eq(level) lag(1 1))
        Hansen test excluding group:     chi2(175)  = 194.32  Prob > chi2 =  0.151
        Difference (null H = exogenous): chi2(16)   =  21.29  Prob > chi2 =  0.168
      gmm(migrate, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(174)  = 200.42  Prob > chi2 =  0.083
        Difference (null H = exogenous): chi2(17)   =  15.18  Prob > chi2 =  0.582
      gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, eq(level) lag(1 1))
        Hansen test excluding group:     chi2(159)  = 172.30  Prob > chi2 =  0.223
        Difference (null H = exogenous): chi2(32)   =  43.31  Prob > chi2 =  0.088
      gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(159)  = 172.53  Prob > chi2 =  0.219
        Difference (null H = exogenous): chi2(32)   =  43.08  Prob > chi2 =  0.091
      gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, eq(level) lag(0 0))
        Hansen test excluding group:     chi2(141)  = 172.45  Prob > chi2 =  0.037
        Difference (null H = exogenous): chi2(50)   =  43.16  Prob > chi2 =  0.742
      gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, collapse eq(diff) lag(1 .))
        Hansen test excluding group:     chi2(141)  = 176.85  Prob > chi2 =  0.022
        Difference (null H = exogenous): chi2(50)   =  38.76  Prob > chi2 =  0.876
      iv(L.gap_med L.gap_highedu, eq(level))
        Hansen test excluding group:     chi2(189)  = 214.05  Prob > chi2 =  0.102
        Difference (null H = exogenous): chi2(2)    =   1.56  Prob > chi2 =  0.458
      iv(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11 yr12 yr13 y
    > r14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
        Hansen test excluding group:     chi2(167)  = 187.84  Prob > chi2 =  0.129
        Difference (null H = exogenous): chi2(24)   =  27.76  Prob > chi2 =  0.270

    Leszczensky, L., & Wolbring, T. (2019). How to Deal With Reverse Causality Using Panel Data? Recommendations for Researchers Based on a Simulation Study. Sociological Methods & Research. https://doi.org/10.1177/0049124119882473
    Last edited by Huaxin Wanglu; 10 Mar 2021, 15:23.

  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz View Post
    Your code and your specification test results in #7 look fine as far as I can tell by quickly looking at them. The binary nature of the dependent variable does not necessarily cause problems.
    Thanks a million. Your kind replies indeed help a lot!

  • Sebastian Kripfganz
    replied
    Your code and your specification test results in #7 look fine as far as I can tell by quickly looking at them. The binary nature of the dependent variable does not necessarily cause problems.

  • Huaxin Wanglu
    replied
    Originally posted by Huaxin Wanglu View Post

    I am re-reading your presentation slides tonight; could you please tell me what the difference between these two specifications is?

    Codes 1:
    Code:
    xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
        gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) two vce(r)
    Codes 2:
    Code:
    xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) two vce(r)
    As I understand it, by default gmm(n, lag(2 4)) uses L(2/4).n as instruments for the first-differenced equation and L.D.n for the levels equation.

    On paper I think they are the same, but when I run the first form in xtabond2, the results are totally different from the second form. With Codes 1 I obtain quite a good result, yet the coefficients with Codes 2 are mainly statistically insignificant. I don't know which one I should believe...

    Sorry, I am quite unfamiliar with GMM estimation; this is my first research project using it. Thanks again.
    Ah, I have figured out this difference by reading your old posts:
    https://www.statalist.org/forums/forum/general-stata-discussion/general/1395858-xtdpdgmm-new-stata-command-for-efficient-gmm-estimation-of-linear-dynamic-panel-models-with-nonlinear-moment-conditions/page2
    Also, I have realized that my xtabond2 code is not completely equivalent, since I did not collapse the differenced instruments for the level model (I prefer not to).
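    For reference, a fully collapsed xtabond2 analogue of Codes 1 might look like the sketch below, assuming (as noted above) that eq(level) gmm-style instruments enter as lagged differences; I have not verified exact equivalence:
    Code:
    * Sketch of an xtabond2 counterpart to Codes 1: every block is collapsed
    * and every equation is named explicitly, so nothing relies on defaults.
    xtabond2 n L.n w k, ///
        gmmstyle(n, lag(2 4) eq(diff) collapse) ///
        gmmstyle(w k, lag(1 3) eq(diff) collapse) ///
        gmmstyle(n, lag(1 1) eq(level) collapse) ///
        gmmstyle(w k, lag(0 0) eq(level) collapse) ///
        twostep robust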

    If possible, could you take a look at my code posted in #7?
    Last edited by Huaxin Wanglu; 08 Mar 2021, 18:21.

  • Huaxin Wanglu
    replied
    I am updating my code and results here. In this version, I include the lagged dependent variable.

    Code:
    xtabond2 migrate L.migrate a2003 co_age dy_schooling marriage hukou_type a2025b InIncome ///
        c.gap_jobdiff3ex##c.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu yr2-yr22, ///
        gmmstyle(migrate, lag(1 1) eq(level)) /// predetermined
        gmmstyle(migrate, lag(2 .) eq(diff) collapse) ///
        gmmstyle(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, lag(1 1) eq(level)) /// endogenous
        gmmstyle(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, lag(2 .) eq(diff) collapse) ///
        gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(0 0) eq(level)) /// predetermined
        gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(1 .) eq(diff) collapse) ///
        ivstyle(gap_med gap_highedu, eq(level)) /// exogenous
        ivstyle(i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2-yr22, eq(level)) ///
        small twostep artests(4) cluster(dest_code)
    Code:
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z = -57.35  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =  -0.97  Pr > z =  0.331
    Arellano-Bond test for AR(3) in first differences: z =  -0.03  Pr > z =  0.976
    Arellano-Bond test for AR(4) in first differences: z =   0.64  Pr > z =  0.521
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(190)  =2327.26  Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(190)  = 196.63  Prob > chi2 =  0.356
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(92)   = 103.26  Prob > chi2 =  0.198
        Difference (null H = exogenous): chi2(98)   =  93.37  Prob > chi2 =  0.613
      gmm(migrate, eq(level) lag(1 1))
        Hansen test excluding group:     chi2(174)  = 187.76  Prob > chi2 =  0.225
        Difference (null H = exogenous): chi2(16)   =   8.87  Prob > chi2 =  0.919
      gmm(migrate, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(174)  = 190.77  Prob > chi2 =  0.182
        Difference (null H = exogenous): chi2(16)   =   5.86  Prob > chi2 =  0.990
      gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, eq(level) lag(1 1))
        Hansen test excluding group:     chi2(158)  = 164.21  Prob > chi2 =  0.351
        Difference (null H = exogenous): chi2(32)   =  32.43  Prob > chi2 =  0.446
      gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(158)  = 178.05  Prob > chi2 =  0.131
        Difference (null H = exogenous): chi2(32)   =  18.58  Prob > chi2 =  0.972
      gmm(gap_ppden gap_enterprise gap_unemploy, eq(level) lag(0 0))
        Hansen test excluding group:     chi2(140)  = 155.43  Prob > chi2 =  0.176
        Difference (null H = exogenous): chi2(50)   =  41.20  Prob > chi2 =  0.808
      gmm(gap_ppden gap_enterprise gap_unemploy, collapse eq(diff) lag(1 .))
        Hansen test excluding group:     chi2(140)  = 159.45  Prob > chi2 =  0.125
        Difference (null H = exogenous): chi2(50)   =  37.18  Prob > chi2 =  0.910
      iv(gap_med gap_highedu, eq(level))
        Hansen test excluding group:     chi2(188)  = 195.96  Prob > chi2 =  0.330
        Difference (null H = exogenous): chi2(2)    =   0.67  Prob > chi2 =  0.715
      iv(0b.a2003 1.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11
    > yr12 yr13 yr14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
        Hansen test excluding group:     chi2(166)  = 183.18  Prob > chi2 =  0.171
        Difference (null H = exogenous): chi2(24)   =  13.45  Prob > chi2 =  0.958
    Last edited by Huaxin Wanglu; 08 Mar 2021, 18:23.

  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz View Post
    1. If your core predictor is endogenous, it is hard to justify that the squared term is exogenous.
    2. If you choose the second lag of an endogenous variable as an instrument for the first-differenced model, then any serial correlation of the error term will invalidate that instrument. This is irrespective of whether there is a lagged dependent variable or not. A lagged dependent variable in the model can help to remove the serial correlation from the error term.
    3. Similar to point 1, if you have an interaction term between an endogenous variable and an exogenous variable (e.g. a dummy variable), then as a default I would typically still assume that the interaction term is endogenous unless you can come up with a convincing argument why it is not. I would not put too much trust in the overidentification test results. In the first place, you need to have a good theoretical argument for the classification of your variables.
    4. I am sorry that the estimation with xtdpdgmm takes such a long time. Eventually, it should still work with such large data sets. Admittedly, it is much slower than xtabond2. The reason is that there is a trade-off between flexibility of the command and its computational efficiency. xtdpdgmm is intended to provide quite a good bit of additional flexibility over xtabond2. This comes at the cost of a few inefficient parts in the code. If you do not need the extra flexibility, you might be better off with xtabond2 when using such large data sets.
    I am re-reading your presentation slides tonight; could you please tell me what the difference between these two specifications is?

    Codes 1:
    Code:
    xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
        gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) two vce(r)
    Codes 2:
    Code:
    xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) two vce(r)
    As I understand it, by default gmm(n, lag(2 4)) uses L(2/4).n as instruments for the first-differenced equation and L.D.n for the levels equation.

    On paper I think they are the same, but when I run the first form in xtabond2, the results are totally different from the second form. With Codes 1 I obtain quite a good result, yet the coefficients with Codes 2 are mainly statistically insignificant. I don't know which one I should believe...

    Sorry, I am quite unfamiliar with GMM estimation; this is my first research project using it. Thanks again.
    Last edited by Huaxin Wanglu; 08 Mar 2021, 15:47.

  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz View Post
    1. If your core predictor is endogenous, it is hard to justify that the squared term is exogenous.
    2. If you choose the second lag of an endogenous variable as an instrument for the first-differenced model, then any serial correlation of the error term will invalidate that instrument. This is irrespective of whether there is a lagged dependent variable or not. A lagged dependent variable in the model can help to remove the serial correlation from the error term.
    3. Similar to point 1, if you have an interaction term between an endogenous variable and an exogenous variable (e.g. a dummy variable), then as a default I would typically still assume that the interaction term is endogenous unless you can come up with a convincing argument why it is not. I would not put too much trust in the overidentification test results. In the first place, you need to have a good theoretical argument for the classification of your variables.
    4. I am sorry that the estimation with xtdpdgmm takes such a long time. Eventually, it should still work with such large data sets. Admittedly, it is much slower than xtabond2. The reason is that there is a trade-off between flexibility of the command and its computational efficiency. xtdpdgmm is intended to provide quite a good bit of additional flexibility over xtabond2. This comes at the cost of a few inefficient parts in the code. If you do not need the extra flexibility, you might be better off with xtabond2 when using such large data sets.
    Sorry, two more questions:
    1. Actually, I don't want to include my endogenous variable in the differenced model, since it is per se a differenced variable. If I specify the lags of the level and the lags of the difference as instruments manually, will serial correlation still invalidate the instruments?
    2. My dependent variable is binary. I remember you mentioned somewhere that it may have problems with the mean-stationarity assumption, so I had better not include it if it is not theoretically necessary?

    Such as:
    Code:
    gmmstyle(gap_jobdiff19, lag(2 .) eq(level) collapse) ///
    gmmstyle(D.gap_jobdiff19, lag(1 .) eq(level) collapse) ///
    By including the lagged dependent variable, my specification passes AR(2), but I am afraid that, as it is binary, I cannot treat it as a common lagged dependent variable.
    Last edited by Huaxin Wanglu; 08 Mar 2021, 12:11.

  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz View Post
    1. If your core predictor is endogenous, it is hard to justify that the squared term is exogenous.
    2. If you choose the second lag of an endogenous variable as an instrument for the first-differenced model, then any serial correlation of the error term will invalidate that instrument. This is irrespective of whether there is a lagged dependent variable or not. A lagged dependent variable in the model can help to remove the serial correlation from the error term.
    3. Similar to point 1, if you have an interaction term between an endogenous variable and an exogenous variable (e.g. a dummy variable), then as a default I would typically still assume that the interaction term is endogenous unless you can come up with a convincing argument why it is not. I would not put too much trust in the overidentification test results. In the first place, you need to have a good theoretical argument for the classification of your variables.
    4. I am sorry that the estimation with xtdpdgmm takes such a long time. Eventually, it should still work with such large data sets. Admittedly, it is much slower than xtabond2. The reason is that there is a trade-off between flexibility of the command and its computational efficiency. xtdpdgmm is intended to provide quite a good bit of additional flexibility over xtabond2. This comes at the cost of a few inefficient parts in the code. If you do not need the extra flexibility, you might be better off with xtabond2 when using such large data sets.
    Thanks a lot for the comments. Regarding point 2, does it mean that if my specification only accepts the null hypothesis at AR(6), I have to use lags 5-21 instead of lags 2-21 as the instruments?
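    In code, I suppose the change would look something like the sketch below (hypothetical y and x; whether lag 5 or some other starting point is correct is exactly my question):
    Code:
    * Hypothetical sketch: starting the difference-equation instruments at a
    * deeper lag when the AR tests indicate higher-order serial correlation.
    xtabond2 y L.y x, ///
        gmmstyle(y, lag(5 .) eq(diff) collapse) ///
        gmmstyle(x, lag(5 .) eq(diff) collapse) ///
        twostep robust artests(6)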
