  • Specify interaction/square terms with xtabond2 & xtdpdgmm

    Dear all,

    First of all, I would like to confirm that I have searched and read many posts here, but I could not find an existing solution.

    I am now working with xtabond2 to conduct two-step system GMM estimation. I have read Roodman (2009) and Prof. Sebastian Kripfganz's presentation slides, but my case is a bit uncommon, so I still cannot resolve all the issues from these materials.

    To clarify, I do not have a lagged dependent variable on the right-hand side of the equation. The reason I run a GMM estimation is that, as a robustness check, I have to address endogeneity, but I cannot find suitable external instrumental variables.

    I have more than 600,000 observations in total, with a time span of 22 years. My core predictor is a macro-level variable (yearly differences △Xt, △Xt-1, △Xt-2, etc.) and the dependent variable is a micro-level variable (an individual choice). In my OLS and fixed-effects models I find a U-shaped (convex) relationship, so I want to add the square term of my core predictor to the GMM estimation. But when I specify it as a GMM-style instrument, the Hansen test is always significant (well below 0.25, around 0.01 most of the time). I have tried every position it could be placed in, and found that by treating it as exogenous and putting it among the IV-style instruments, I obtain statistically significant results and a decent Hansen test p-value (>0.40).

    1. My first confusion is this: I treat the core predictor as endogenous and put it among the GMM-style instruments with its second- and higher-order lags (lag 2 to lag 21). In that case, can I treat its square term as exogenous?

    2. The Arellano-Bond test rejects the null until AR(6). Is it still okay for me to include lags 1-5 as instruments? Since I do not have a lagged dependent variable in the model, I am unsure whether the Arellano-Bond test still applies to my case.

    3. From Prof. Sebastian Kripfganz's slides, I learned that dummy variables are usually treated as exogenous and put among the IV-style instruments with the level option. But what about interaction terms between endogenous/predetermined variables and dummies? If the Hansen test and the Difference-in-Hansen tests are all satisfied (well above 0.25), is it justifiable to treat the interaction terms as exogenous?
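    For concreteness, the GMM-style treatment of the square term that I tried looks like the fragment below (a sketch only, using the same factor-variable notation as my full command; not a complete specification):
    Code:
    gmmstyle(c.gap_jobdiff3ex##c.gap_jobdiff3ex, lag(2 .) orthogonal collapse) ///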

    Lastly, I have run my specification with the xtdpdgmm command before, but because my number of observations is quite large, I could not obtain results even after waiting for more than 30 minutes. Is there any way to speed up xtdpdgmm?
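    (For what it is worth, the run time of xtdpdgmm grows with the number of instruments, so bounding the lag ranges and collapsing, as in the hypothetical fragment below, may help considerably. Option names follow the xtdpdgmm help file and are untested here; controls and year dummies are omitted for brevity.)
    Code:
    xtdpdgmm migrate L.gap_jobdiff3ex, model(difference) collapse ///
        gmm(gap_jobdiff3ex, lag(2 4)) twostep vce(robust)
    * lag(2 4) bounds the instrument count regardless of T, which is
    * usually the main driver of run time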

    Here is my code:
    Code:
    xtabond2 migrate i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome ///
    c.L.gap_jobdiff3ex##c.L.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu i.yr2-yr22 , ///
    gmmstyle(gap_jobdiff3ex, lag(2 .) orthogonal collapse) ///
    gmmstyle(gap_ppden gap_enterprise gap_unemploy , lag(1 .) collapse) ///
    ivstyle(gap_highedu gap_med) ///
    ivstyle(c.L.gap_jobdiff3ex#c.L.gap_jobdiff3ex i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome i.yr2-yr22 , eq(level)) ///
    small twostep artests(6) cluster(dest_code)
    Note: i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome are time-invariant variables. I am aware that including them imposes a stronger assumption on the estimation.

    Here are the test results:
    Code:
    ------------------------------------------------------------------------------
    Group variable: numeric_un~e                    Number of obs      =    670476
    Time variable : time                            Number of groups   =     57429
    Number of instruments = 94                      Obs per group: min =         1
    F(30, 272)    =    109.21                                      avg =     11.67
    Prob > F      =     0.000                                      max =        17
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z =  -7.40  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =  -3.58  Pr > z =  0.000
    Arellano-Bond test for AR(3) in first differences: z =  -7.87  Pr > z =  0.000
    Arellano-Bond test for AR(4) in first differences: z =  -3.47  Pr > z =  0.001
    Arellano-Bond test for AR(5) in first differences: z =  -3.06  Pr > z =  0.002
    Arellano-Bond test for AR(6) in first differences: z =  -0.95  Pr > z =  0.342
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(63)   =89629.95 Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(63)   =  62.20  Prob > chi2 =  0.505
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(59)   =  59.02  Prob > chi2 =  0.475
        Difference (null H = exogenous): chi2(4)    =   3.18  Prob > chi2 =  0.528
      gmm(gap_jobdiff3ex, collapse orthogonal lag(2 .))
        Hansen test excluding group:     chi2(49)   =  52.77  Prob > chi2 =  0.331
        Difference (null H = exogenous): chi2(14)   =   9.43  Prob > chi2 =  0.802
      gmm(gap_ppden gap_enterprise gap_unemploy, collapse lag(1 .))
        Hansen test excluding group:     chi2(10)   =  12.31  Prob > chi2 =  0.265
        Difference (null H = exogenous): chi2(53)   =  49.89  Prob > chi2 =  0.596
      iv(gap_highedu gap_med)
        Hansen test excluding group:     chi2(61)   =  60.85  Prob > chi2 =  0.481
        Difference (null H = exogenous): chi2(2)    =   1.35  Prob > chi2 =  0.509
      iv(cL.gap_jobdiff3ex#cL.gap_jobdiff3ex 0b.a2003 1.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome 0b.yr2 1.yr2 0b.yr3 1.yr3 0b.yr4 1.yr4 0b.yr5 1.yr5 0b.yr6 1.yr6 0b.yr7 1.yr7 0b.yr8 1.yr8 0b.yr9 1.yr9 0b.yr10 1.yr10 0b.yr11 1.yr11 0b.yr12 1.yr12 0b.yr13 1.yr13 0b.yr14 1.yr14 0b.yr15 1.yr15 0b.yr16 1.yr16 0b.yr17 1.yr17 0b.yr18 1.yr18 0b.yr19 1.yr19 0b.yr20 1.yr20 0b.yr21 1.yr21 0b.yr22 1.yr22, eq(level))
        Hansen test excluding group:     chi2(39)   =  39.88  Prob > chi2 =  0.431
        Difference (null H = exogenous): chi2(24)   =  22.32  Prob > chi2 =  0.560
    Thanks for any comments!
    Last edited by Huaxin Wanglu; 05 Mar 2021, 18:51.

  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz:
    That seems to be a matter of efficiency in the (implicit) first-stage regressions of the regressors on the instruments. These level instruments might be informative for some variables but less informative for others. Adding further informative instruments helps to improve the first-stage fit, while adding further uninformative (weak) instruments worsens the first-stage fit. Adding more (instrumental) variables is not always better, even in large samples.
    Thank you for the reply. It helps deepen my understanding. I will report the second version in my paper.



  • Sebastian Kripfganz
    replied
    That seems to be a matter of efficiency in the (implicit) first-stage regressions of the regressors on the instruments. These level instruments might be informative for some variables but less informative for others. Adding further informative instruments helps to improve the first-stage fit, while adding further uninformative (weak) instruments worsens the first-stage fit. Adding more (instrumental) variables is not always better, even in large samples.



  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz:
    The p-value range from 0.1 to 0.25 is quite arbitrary. Personally, I would not focus much on this rule of thumb. A high p-value of the Hansen test could indeed be an indication of a too-many-instruments problem, but it could also simply be an indication that there is no evidence to reject the model. Jan Kiviet takes a different stand on these p-values in one of his recent papers. If you ensure from the beginning that the risk of running into a too-many-instruments problem is low, then you would not have to worry much about this rule of thumb.

    There is no general answer as to whether a p-value between 0.05 and 0.1 for the difference-in-Hansen test is acceptable. If the tested instruments are crucial for the identification of your main coefficients of interest, then this might be worrisome. On the other hand, with such a large number of observations I would take much more comfort in such a p-value than with a small sample size, in particular if all other tests are fine.
    Hello, sorry for raising a question again. It is not critical to my model, but I have been confused for a couple of days. In the specification in #14, I also collapse the instruments in the level model, except for my core predictor, because when I collapse all the instruments in both the level and the transformed model, my core predictor turns statistically insignificant. I understand this is very likely because, in large samples, collapsing worsens statistical efficiency. However, when I switch from the combination (a b c, lag(2 .) eq(diff) collapse) (a b c, lag(1 1) eq(level) collapse) to (a b c, lag(2 8) eq(diff) collapse) (a b c, lag(1 1) eq(level)), one variable changes from statistically significant to insignificant and another variable becomes statistically significant. If the changes result from better statistical efficiency, then insignificant → significant seems reasonable to me, but significant → insignificant sounds weird...

    I personally think that the second version is better because it strikes a better trade-off between statistical efficiency and the too-many-instruments problem. I also saw you point out elsewhere that collapsing specific instruments, rather than all of them, should be justified with a good reason. But I am not very confident in my understanding, so I hope to learn your advice. Thanks a lot!

    If we use just the second and third lags as instruments, this leads to a pretty large total of 122 instruments. Using only the second-order lags leads to just 64 instruments and much larger standard errors. When we collapse all the instruments in the standard way, 76 instruments remain, with results that differ substantially from those that simply skip higher-order lags from the full set of available instruments. Collapsing also yields more insignificant regressors.
    My old code from #14:
    Code:
    xtabond2 migrate L.migrate a2003 c.co_age##c.co_age dy_schooling marriage hukou_type a2025b InIncome ///
    c.gap_jobdiff3ex##c.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu gap_theater gap_labprod gap_terti gap_LQ19 yr2-yr22, ///
    gmmstyle(migrate, lag(1 1) eq(level) collapse) /// predetermined
    gmmstyle(migrate, lag(2 .) eq(diff) collapse) ///
    gmmstyle(c.gap_jobdiff3ex##c.gap_jobdiff3ex, lag(1 1) eq(level)) ///
    gmmstyle(c.gap_jobdiff3ex##c.gap_jobdiff3ex, lag(2 .) eq(diff) collapse) ///
    gmmstyle(gap_labprod gap_LQ19 gap_terti, lag(1 1) eq(level) collapse) ///
    gmmstyle(gap_labprod gap_LQ19 gap_terti, lag(2 .) eq(diff) collapse) ///
    gmmstyle(gap_ppden gap_enterprise gap_unemploy,lag(0 0) eq(level) collapse) ///
    gmmstyle(gap_ppden gap_enterprise gap_unemploy,lag(1 .) eq(diff) collapse) ///
    ivstyle(gap_highedu gap_med gap_theater, eq(level)) ///
    ivstyle(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2-yr22, eq(level)) ///
    small twostep artests(4) cluster(dest_code)
    New code:
    Code:
    xtabond2 migrate L.migrate a2003 c.co_age##c.co_age dy_schooling marriage hukou_type a2025b InIncome ///
    c.gap_jobdiff3ex##c.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu gap_theater gap_labprod gap_terti gap_LQ19 yr2-yr22, ///
    gmmstyle(migrate, lag(1 1) eq(level)) /// predetermined
    gmmstyle(migrate, lag(2 8) eq(diff) collapse) ///
    gmmstyle(c.gap_jobdiff3ex##c.gap_jobdiff3ex, lag(1 1) eq(level)) /// endogenous
    gmmstyle(c.gap_jobdiff3ex##c.gap_jobdiff3ex, lag(2 8) eq(diff) collapse) ///
    gmmstyle(gap_labprod gap_LQ19 gap_terti, lag(1 1) eq(level)) /// endogenous
    gmmstyle(gap_labprod gap_LQ19 gap_terti, lag(2 8) eq(diff) collapse) ///
    gmmstyle(gap_ppden gap_enterprise gap_unemploy,lag(0 0) eq(level)) /// not strictly exogenous
    gmmstyle(gap_ppden gap_enterprise gap_unemploy,lag(1 3) eq(diff) collapse) ///
    ivstyle(gap_highedu gap_med gap_theater, eq(level)) /// exogenous
    ivstyle(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2-yr22, eq(level)) ///
    small twostep artests(4) cluster(dest_code)
    Last edited by Huaxin Wanglu; 18 Mar 2021, 21:12.



  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz:
    The p-value range from 0.1 to 0.25 is quite arbitrary. Personally, I would not focus much on this rule of thumb. A high p-value of the Hansen test could indeed be an indication of a too-many-instruments problem, but it could also simply be an indication that there is no evidence to reject the model. Jan Kiviet takes a different stand on these p-values in one of his recent papers. If you ensure from the beginning that the risk of running into a too-many-instruments problem is low, then you would not have to worry much about this rule of thumb.

    There is no general answer as to whether a p-value between 0.05 and 0.1 for the difference-in-Hansen test is acceptable. If the tested instruments are crucial for the identification of your main coefficients of interest, then this might be worrisome. On the other hand, with such a large number of observations I would take much more comfort in such a p-value than with a small sample size, in particular if all other tests are fine.
    Thanks again for replying to my issues. I also compared my numbers of instruments and observations with those in other articles; the comparison suggests they are quite fine, so I believe the probability of instrument proliferation is very low. And thanks for recommending the methodological paper. I will read it.



  • Sebastian Kripfganz
    replied
    The p-value range from 0.1 to 0.25 is quite arbitrary. Personally, I would not focus much on this rule of thumb. A high p-value of the Hansen test could indeed be an indication of a too-many-instruments problem, but it could also simply be an indication that there is no evidence to reject the model. Jan Kiviet takes a different stand on these p-values in one of his recent papers. If you ensure from the beginning that the risk of running into a too-many-instruments problem is low, then you would not have to worry much about this rule of thumb.

    There is no general answer as to whether a p-value between 0.05 and 0.1 for the difference-in-Hansen test is acceptable. If the tested instruments are crucial for the identification of your main coefficients of interest, then this might be worrisome. On the other hand, with such a large number of observations I would take much more comfort in such a p-value than with a small sample size, in particular if all other tests are fine.



  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz:
    Lagging variables to avoid reverse causality is often an ill-advised approach. You would be deliberately misspecifying your model. The reverse causality problem (which is a source of endogeneity) can simply be dealt with by using lagged instruments.

    There are instances when lagging makes sense, e.g. if your dependent variable is a flow variable and your independent variable is a stock variable measured at the end of a period. In your model, you clearly want the stock at the end of the previous period (not the current period) to affect the current period's flow variable. Otherwise, lagging right-hand side variables really only makes sense if the effects indeed occur with a delay.
    Dear Prof. Sebastian Kripfganz, may I ask you a new question? Roodman (2009) mentions that a Hansen test p-value as high as 0.25 should be viewed with concern, which implies that a safe value falls into the range of 0.1 to 0.25 (if I understand correctly?). However, after comparing over a hundred runs, I find that when the p-value is within this range, the C test (Difference-in-Hansen) usually cannot be safely accepted, since at least one of the subsets, for either the excluding or the including group, is smaller than 0.1, and sometimes even below 0.05. I am hesitant about how to make the trade-off between them. Would you think a C-test p-value in [0.05, 0.1] is acceptable? And should I be worried if the p-value of the overall Hansen test is larger than 0.5? After adding a few new independent variables, it becomes 0.621. The number of instruments is 229 and the number of observations is 339,855.

    Code:
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z = -44.23  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =  -1.23  Pr > z =  0.220
    Arellano-Bond test for AR(3) in first differences: z =   1.34  Pr > z =  0.182
    Arellano-Bond test for AR(4) in first differences: z =   0.56  Pr > z =  0.576
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(191)  =1326.63  Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(191)  = 184.39  Prob > chi2 =  0.621
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(137)  = 124.81  Prob > chi2 =  0.764
        Difference (null H = exogenous): chi2(54)   =  59.59  Prob > chi2 =  0.280
      gmm(migrate, eq(level) lag(1 1))
        Hansen test excluding group:     chi2(175)  = 171.65  Prob > chi2 =  0.557
        Difference (null H = exogenous): chi2(16)   =  12.74  Prob > chi2 =  0.691
      gmm(migrate, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(176)  = 168.64  Prob > chi2 =  0.641
        Difference (null H = exogenous): chi2(15)   =  15.75  Prob > chi2 =  0.399
      gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, eq(level) lag(1 1))
        Hansen test excluding group:     chi2(159)  = 154.63  Prob > chi2 =  0.583
        Difference (null H = exogenous): chi2(32)   =  29.76  Prob > chi2 =  0.580
      gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(159)  = 159.06  Prob > chi2 =  0.484
        Difference (null H = exogenous): chi2(32)   =  25.33  Prob > chi2 =  0.792
      gmm(gap_labprod gap_LQ19 gap_terti, collapse eq(level) lag(1 1))
        Hansen test excluding group:     chi2(188)  = 182.40  Prob > chi2 =  0.602
        Difference (null H = exogenous): chi2(3)    =   1.99  Prob > chi2 =  0.574
      gmm(gap_labprod gap_LQ19 gap_terti, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(141)  = 150.18  Prob > chi2 =  0.283
        Difference (null H = exogenous): chi2(50)   =  34.21  Prob > chi2 =  0.957
      gmm(gap_ppden gap_enterprise gap_unemploy, collapse eq(level) lag(0 0))
        Hansen test excluding group:     chi2(188)  = 179.76  Prob > chi2 =  0.654
        Difference (null H = exogenous): chi2(3)    =   4.64  Prob > chi2 =  0.200
      gmm(gap_ppden gap_enterprise gap_unemploy, collapse eq(diff) lag(1 .))
        Hansen test excluding group:     chi2(141)  = 149.14  Prob > chi2 =  0.303
        Difference (null H = exogenous): chi2(50)   =  35.25  Prob > chi2 =  0.943
      iv(gap_highedu gap_med gap_theater, eq(level))
        Hansen test excluding group:     chi2(188)  = 181.66  Prob > chi2 =  0.616
        Difference (null H = exogenous): chi2(3)    =   2.73  Prob > chi2 =  0.435
      iv(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11 yr12 yr13
    > yr14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
        Hansen test excluding group:     chi2(167)  = 164.20  Prob > chi2 =  0.547
        Difference (null H = exogenous): chi2(24)   =  20.20  Prob > chi2 =  0.685
    Last edited by Huaxin Wanglu; 15 Mar 2021, 12:13.



  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz:
    Lagging variables to avoid reverse causality is often an ill-advised approach. You would be deliberately misspecifying your model. The reverse causality problem (which is a source of endogeneity) can simply be dealt with by using lagged instruments.

    There are instances when lagging makes sense, e.g. if your dependent variable is a flow variable and your independent variable is a stock variable measured at the end of a period. In your model, you clearly want the stock at the end of the previous period (not the current period) to affect the current period's flow variable. Otherwise, lagging right-hand side variables really only makes sense if the effects indeed occur with a delay.
    Many thanks for the comment! You saved me a lot of time. It is indeed succinct and informative.



  • Sebastian Kripfganz
    replied
    Lagging variables to avoid reverse causality is often an ill-advised approach. You would be deliberately misspecifying your model. The reverse causality problem (which is a source of endogeneity) can simply be dealt with by using lagged instruments.

    There are instances when lagging makes sense, e.g. if your dependent variable is a flow variable and your independent variable is a stock variable measured at the end of a period. In your model, you clearly want the stock at the end of the previous period (not the current period) to affect the current period's flow variable. Otherwise, lagging right-hand side variables really only makes sense if the effects indeed occur with a delay.
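    A sketch of the first approach in the notation of this thread (a hypothetical fragment: the regressor stays contemporaneous, and the instruments carry the lag):
    Code:
    * gap_jobdiff3ex enters the model contemporaneously; reverse causality
    * is addressed by instrumenting with its second and deeper lags:
    gmmstyle(gap_jobdiff3ex, lag(2 .) collapse) ///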



  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz:
    Your code and your specification test results in #7 look fine as far as I can tell by quickly looking at them. The binary nature of the dependent variable does not necessarily cause problems.
    Hello, may I ask you another question? To address reverse causality, I lag the variables by one period in OLS and FE, but when I use L.gap_jobdiff3ex instead of gap_jobdiff3ex in GMM, the p-value of the Hansen test drops to 0.10 and the Difference-in-Hansen tests cannot fully pass. I guess this may be because, by lagging the variables, the deeper lags suffer from a weak-instruments problem. From a paper, I learned that GMM can tackle reverse causality without lagging. Because the paper my conceptual framework is based on lags all the variables by one period in sys-GMM, I am quite concerned about this choice. Could you give me some tips on whether and how I should lag by one period in GMM? Having lagged by one period, I also tried adding the second lag of my dependent variable to the model. Since the AR test accepts the null at AR(3), I revised the code as below, but the coefficient of L2 is negative.

    In principle, the Arellano-Bond (AB) estimator and related dynamic panel models offer a powerful toolbox to tackle endogeneity problems caused by both reverse causality and unobserved heterogeneity.
    We rely on the approach advocated by Arellano and Bond (1991) taking first differences in a first step to remove unobserved heterogeneity and then using second- and higher-order lags of the dependent variables as instruments in a standard GMM framework to deal with reverse causality.
    Code:
    gmmstyle(migrate, lag(2 2) eq(level)) ///
    gmmstyle(migrate, lag(3 .) eq(diff) collapse) ///

    Results with two lagged dependent variables:
    Code:
    -----------------------------------------------------------------------------------------------------
                                        |              Corrected
                                migrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------------------------+----------------------------------------------------------------
                                migrate |
                                    L1. |   1.159987   .0730317    15.88   0.000     1.016205    1.303768
                                    L2. |  -.1898163   .0698226    -2.72   0.007    -.3272801   -.0523526
                                        |
                                  a2003 |  -.0000582   .0001527    -0.38   0.703    -.0003588    .0002423
                                 co_age |  -.0003186   .0000309   -10.32   0.000    -.0003794   -.0002578
                           dy_schooling |   .0002668   .0000481     5.55   0.000     .0001722    .0003614
                               marriage |   -.004595   .0006148    -7.47   0.000    -.0058053   -.0033846
                             hukou_type |  -.0007282   .0004208    -1.73   0.085    -.0015566    .0001001
                                 a2025b |  -.0001572   .0001256    -1.25   0.212    -.0004046    .0000901
                               InIncome |   .0003588   .0001017     3.53   0.000     .0001587     .000559
                                        |
                         gap_jobdiff3ex |
                                    L1. |   .0000635   .0000702     0.91   0.366    -.0000747    .0002017
                                        |
    cL.gap_jobdiff3ex#cL.gap_jobdiff3ex |   1.67e-06   6.29e-07     2.65   0.009     4.28e-07    2.91e-06
                                        |
                              gap_ppden |
                                    L1. |   5.55e-06   2.44e-06     2.27   0.024     7.45e-07    .0000104
                                        |
                           gap_unemploy |
                                    L1. |  -.0308105   .1619523    -0.19   0.849     -.349655    .2880341
                                        |
                         gap_enterprise |
                                    L1. |   .0004517   .0003214     1.41   0.161    -.0001811    .0010844
                                        |
                                gap_med |
                                    L1. |   5.050586   1.232691     4.10   0.000     2.623718    7.477454
                                        |
                            gap_highedu |
                                    L1. |   .3269975   .0754531     4.33   0.000     .1784488    .4755461
    Code:
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z =  -8.77  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =   2.73  Pr > z =  0.006
    Arellano-Bond test for AR(3) in first differences: z =  -0.04  Pr > z =  0.967
    Arellano-Bond test for AR(4) in first differences: z =  -0.59  Pr > z =  0.558
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(189)  =1776.06  Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(189)  = 212.81  Prob > chi2 =  0.113
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(91)   =  94.47  Prob > chi2 =  0.381
        Difference (null H = exogenous): chi2(98)   = 118.34  Prob > chi2 =  0.079
      gmm(migrate, eq(level) lag(2 2))
        Hansen test excluding group:     chi2(173)  = 189.50  Prob > chi2 =  0.185
        Difference (null H = exogenous): chi2(16)   =  23.32  Prob > chi2 =  0.106
      gmm(migrate, collapse eq(diff) lag(3 .))
        Hansen test excluding group:     chi2(173)  = 191.34  Prob > chi2 =  0.161
        Difference (null H = exogenous): chi2(16)   =  21.47  Prob > chi2 =  0.161
      gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, eq(level) lag(1 1))
        Hansen test excluding group:     chi2(157)  = 177.20  Prob > chi2 =  0.129
        Difference (null H = exogenous): chi2(32)   =  35.61  Prob > chi2 =  0.302
      gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(157)  = 169.57  Prob > chi2 =  0.233
        Difference (null H = exogenous): chi2(32)   =  43.25  Prob > chi2 =  0.089
      gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, eq(level) lag(0 0))
        Hansen test excluding group:     chi2(139)  = 149.53  Prob > chi2 =  0.256
        Difference (null H = exogenous): chi2(50)   =  63.28  Prob > chi2 =  0.098
      gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, collapse eq(diff) lag(1 .))
        Hansen test excluding group:     chi2(139)  = 173.36  Prob > chi2 =  0.026
        Difference (null H = exogenous): chi2(50)   =  39.45  Prob > chi2 =  0.858
      iv(L.gap_med L.gap_highedu, eq(level))
        Hansen test excluding group:     chi2(187)  = 210.15  Prob > chi2 =  0.118
        Difference (null H = exogenous): chi2(2)    =   2.66  Prob > chi2 =  0.264
      iv(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11 yr12 yr13 y
    > r14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
        Hansen test excluding group:     chi2(165)  = 182.91  Prob > chi2 =  0.161
        Difference (null H = exogenous): chi2(24)   =  29.90  Prob > chi2 =  0.188

    Results with only 1 lagged dependent variable:
    Code:
    -----------------------------------------------------------------------------------------------------
                                        |              Corrected
                                migrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ------------------------------------+----------------------------------------------------------------
                                migrate |
                                    L1. |   .9635485   .0053282   180.84   0.000     .9530589    .9740382
                                        |
                                  a2003 |  -.0001579   .0001957    -0.81   0.420    -.0005433    .0002274
                                 co_age |   -.000373   .0000264   -14.11   0.000     -.000425    -.000321
                           dy_schooling |   .0003459   .0000477     7.25   0.000      .000252    .0004398
                               marriage |  -.0047542   .0007491    -6.35   0.000    -.0062289   -.0032795
                             hukou_type |  -.0008414   .0005379    -1.56   0.119    -.0019003    .0002174
                                 a2025b |  -.0001694   .0001507    -1.12   0.262     -.000466    .0001272
                               InIncome |   .0004433   .0001295     3.42   0.001     .0001882    .0006983
                                        |
                         gap_jobdiff3ex |
                                    L1. |   .0000999   .0000775     1.29   0.198    -.0000526    .0002524
                                        |
    cL.gap_jobdiff3ex#cL.gap_jobdiff3ex |   2.01e-06   7.03e-07     2.86   0.005     6.28e-07    3.40e-06
                                        |
                              gap_ppden |
                                    L1. |   7.02e-06   2.77e-06     2.53   0.012     1.55e-06    .0000125
                                        |
                           gap_unemploy |
                                    L1. |  -.1215351    .169508    -0.72   0.474     -.455244    .2121739
                                        |
                         gap_enterprise |
                                    L1. |    .000753   .0003926     1.92   0.056      -.00002     .001526
                                        |
                                gap_med |
                                    L1. |   5.361344    1.37739     3.89   0.000     2.649688       8.073
                                        |
                            gap_highedu |
                                    L1. |   .4120449   .0989178     4.17   0.000     .2173063    .6067835
    Code:
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z = -59.78  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =   1.08  Pr > z =  0.282
    Arellano-Bond test for AR(3) in first differences: z =  -0.13  Pr > z =  0.894
    Arellano-Bond test for AR(4) in first differences: z =  -1.06  Pr > z =  0.289
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(191)  =2240.55  Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(191)  = 215.61  Prob > chi2 =  0.107
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(93)   =  97.64  Prob > chi2 =  0.351
        Difference (null H = exogenous): chi2(98)   = 117.97  Prob > chi2 =  0.083
      gmm(migrate, eq(level) lag(1 1))
        Hansen test excluding group:     chi2(175)  = 194.32  Prob > chi2 =  0.151
        Difference (null H = exogenous): chi2(16)   =  21.29  Prob > chi2 =  0.168
      gmm(migrate, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(174)  = 200.42  Prob > chi2 =  0.083
        Difference (null H = exogenous): chi2(17)   =  15.18  Prob > chi2 =  0.582
      gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, eq(level) lag(1 1))
        Hansen test excluding group:     chi2(159)  = 172.30  Prob > chi2 =  0.223
        Difference (null H = exogenous): chi2(32)   =  43.31  Prob > chi2 =  0.088
      gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(159)  = 172.53  Prob > chi2 =  0.219
        Difference (null H = exogenous): chi2(32)   =  43.08  Prob > chi2 =  0.091
      gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, eq(level) lag(0 0))
        Hansen test excluding group:     chi2(141)  = 172.45  Prob > chi2 =  0.037
        Difference (null H = exogenous): chi2(50)   =  43.16  Prob > chi2 =  0.742
      gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, collapse eq(diff) lag(1 .))
        Hansen test excluding group:     chi2(141)  = 176.85  Prob > chi2 =  0.022
        Difference (null H = exogenous): chi2(50)   =  38.76  Prob > chi2 =  0.876
      iv(L.gap_med L.gap_highedu, eq(level))
        Hansen test excluding group:     chi2(189)  = 214.05  Prob > chi2 =  0.102
        Difference (null H = exogenous): chi2(2)    =   1.56  Prob > chi2 =  0.458
      iv(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11 yr12 yr13 y
    > r14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
        Hansen test excluding group:     chi2(167)  = 187.84  Prob > chi2 =  0.129
        Difference (null H = exogenous): chi2(24)   =  27.76  Prob > chi2 =  0.270

    Leszczensky, L., & Wolbring, T. (2019). How to Deal With Reverse Causality Using Panel Data? Recommendations for Researchers Based on a Simulation Study. Sociological Methods & Research. https://doi.org/10.1177/0049124119882473
    Last edited by Huaxin Wanglu; 10 Mar 2021, 15:23.



  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz View Post
    Your code and your specification test results in #7 look fine as far as I can tell by quickly looking at them. The binary nature of the dependent variable does not necessarily cause problems.
    Thanks a million. Your kind replies indeed help a lot!



  • Sebastian Kripfganz
    replied
    Your code and your specification test results in #7 look fine as far as I can tell by quickly looking at them. The binary nature of the dependent variable does not necessarily cause problems.



  • Huaxin Wanglu
    replied
    Originally posted by Huaxin Wanglu View Post

    I am re-reading your presentation slides tonight, and I am wondering if you could tell me what the difference is between these two specifications?

    Codes 1:
    Code:
    xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
        gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) two vce(r)
    Codes 2:
    Code:
    xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) two vce(r)
    As I understand it, by default gmm(n, lag(2 4)) generates L(2/4).n as instruments for the first-differenced equation and L.D.n for the levels equation.

    On paper, I think they are the same, but when I run the first form in xtabond2, the results are totally different from the second form. With Codes 1 I obtain quite good results, yet with Codes 2 the coefficients are mostly statistically insignificant. I don't know which one I should believe...

    Sorry, I am quite unfamiliar with GMM estimation; this is my first research project using it. Thanks again.
    Ah, I have figured out the difference by reading your old posts:
    HTML Code:
    https://www.statalist.org/forums/forum/general-stata-discussion/general/1395858-xtdpdgmm-new-stata-command-for-efficient-gmm-estimation-of-linear-dynamic-panel-models-with-nonlinear-moment-conditions/page2
    I have also realized that my xtabond2 code is not completely equivalent, since I did not collapse the differenced instruments for the level model (I prefer not to).
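    For reference, collapsing those level-equation differenced instruments in xtabond2 would only require adding the collapse suboption to the corresponding gmmstyle() terms. A sketch based on the specification in #7 (illustrative only, not a recommendation):

    ```stata
    * sketch: collapsed counterparts of the level-equation instrument blocks in #7
    gmmstyle(migrate, lag(1 1) eq(level) collapse)
    gmmstyle(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, lag(1 1) eq(level) collapse)
    ```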

    If possible, could you take a look at my codes posted in #7?
    Last edited by Huaxin Wanglu; 08 Mar 2021, 18:21.



  • Huaxin Wanglu
    replied
    I am posting my updated code and results here. In this version, I include the lagged dependent variable.

    Code:
    xtabond2 migrate L.migrate a2003 co_age dy_schooling marriage hukou_type a2025b InIncome ///
    c.gap_jobdiff3ex##c.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu yr2-yr22, ///
    gmmstyle(migrate, lag(1 1) eq(level)) /// predetermined
    gmmstyle(migrate, lag(2 .) eq(diff) collapse) ///
    gmmstyle(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, lag(1 1) eq(level)) /// endogenous
    gmmstyle(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, lag(2 .) eq(diff) collapse) ///
    gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(0 0) eq(level)) /// predetermined
    gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(1 .) eq(diff) collapse) ///
    ivstyle(gap_med gap_highedu, eq(level)) /// exogenous
    ivstyle(i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2-yr22, eq(level)) ///
    small twostep artests(4) cluster(dest_code)
    Code:
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z = -57.35  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =  -0.97  Pr > z =  0.331
    Arellano-Bond test for AR(3) in first differences: z =  -0.03  Pr > z =  0.976
    Arellano-Bond test for AR(4) in first differences: z =   0.64  Pr > z =  0.521
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(190)  =2327.26  Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(190)  = 196.63  Prob > chi2 =  0.356
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(92)   = 103.26  Prob > chi2 =  0.198
        Difference (null H = exogenous): chi2(98)   =  93.37  Prob > chi2 =  0.613
      gmm(migrate, eq(level) lag(1 1))
        Hansen test excluding group:     chi2(174)  = 187.76  Prob > chi2 =  0.225
        Difference (null H = exogenous): chi2(16)   =   8.87  Prob > chi2 =  0.919
      gmm(migrate, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(174)  = 190.77  Prob > chi2 =  0.182
        Difference (null H = exogenous): chi2(16)   =   5.86  Prob > chi2 =  0.990
      gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, eq(level) lag(1 1))
        Hansen test excluding group:     chi2(158)  = 164.21  Prob > chi2 =  0.351
        Difference (null H = exogenous): chi2(32)   =  32.43  Prob > chi2 =  0.446
      gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
        Hansen test excluding group:     chi2(158)  = 178.05  Prob > chi2 =  0.131
        Difference (null H = exogenous): chi2(32)   =  18.58  Prob > chi2 =  0.972
      gmm(gap_ppden gap_enterprise gap_unemploy, eq(level) lag(0 0))
        Hansen test excluding group:     chi2(140)  = 155.43  Prob > chi2 =  0.176
        Difference (null H = exogenous): chi2(50)   =  41.20  Prob > chi2 =  0.808
      gmm(gap_ppden gap_enterprise gap_unemploy, collapse eq(diff) lag(1 .))
        Hansen test excluding group:     chi2(140)  = 159.45  Prob > chi2 =  0.125
        Difference (null H = exogenous): chi2(50)   =  37.18  Prob > chi2 =  0.910
      iv(gap_med gap_highedu, eq(level))
        Hansen test excluding group:     chi2(188)  = 195.96  Prob > chi2 =  0.330
        Difference (null H = exogenous): chi2(2)    =   0.67  Prob > chi2 =  0.715
      iv(0b.a2003 1.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11
    > yr12 yr13 yr14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
        Hansen test excluding group:     chi2(166)  = 183.18  Prob > chi2 =  0.171
        Difference (null H = exogenous): chi2(24)   =  13.45  Prob > chi2 =  0.958
    Last edited by Huaxin Wanglu; 08 Mar 2021, 18:23.



  • Huaxin Wanglu
    replied
    Originally posted by Sebastian Kripfganz View Post
    1. If your core predictor is endogenous, it is hard to justify that the squared term is exogenous.
    2. If you choose the second lag of an endogenous variable as an instrument for the first-differenced model, then any serial correlation of the error term will invalidate that instrument. This is irrespective of whether there is a lagged dependent variable or not. A lagged dependent variable in the model can help to remove the serial correlation from the error term.
    3. Similar to point 1, if you have an interaction term between an endogenous variable and an exogenous variable (e.g. a dummy variable), then as a default I would typically still assume that the interaction term is endogenous unless you can come up with a convincing argument why it is not. I would not put too much trust in the overidentification test results. In the first place, you need to have a good theoretical argument for the classification of your variables.
    4. I am sorry that the estimation with xtdpdgmm takes such a long time. Eventually, it should still work with such large data sets. Admittedly, it is much slower than xtabond2. The reason is that there is a trade-off between flexibility of the command and its computational efficiency. xtdpdgmm is intended to provide quite a good bit of additional flexibility over xtabond2. This comes at the cost of a few inefficient parts in the code. If you do not need the extra flexibility, you might be better off with xtabond2 when using such large data sets.
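    In moment-condition terms, point 2 above can be sketched as follows (generic notation, not tied to the thread's variables): for an endogenous regressor x_it instrumented by its second lag in the first-differenced equation, validity requires

    ```latex
    % Validity of x_{i,t-2} as an instrument for the first-differenced equation:
    \mathbb{E}\!\left[x_{i,t-2}\,\Delta u_{it}\right]
      = \mathbb{E}\!\left[x_{i,t-2}\,u_{it}\right]
      - \mathbb{E}\!\left[x_{i,t-2}\,u_{i,t-1}\right] = 0 .
    ```

    If u_it is serially correlated (say MA(1)), then u_{i,t-1} is correlated with u_{i,t-2}; and because x is endogenous, x_{i,t-2} is correlated with u_{i,t-2}, so the second expectation is generally nonzero and the instrument is invalid, with or without a lagged dependent variable in the model.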
    I am re-reading your presentation slides tonight, and I am wondering if you could tell me what the difference is between these two specifications?

    Codes 1:
    Code:
    xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
        gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) two vce(r)
    Codes 2:
    Code:
    xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) two vce(r)
    As I understand it, by default gmm(n, lag(2 4)) generates L(2/4).n as instruments for the first-differenced equation and L.D.n for the levels equation.

    On paper, I think they are the same, but when I run the first form in xtabond2, the results are totally different from the second form. With Codes 1 I obtain quite good results, yet with Codes 2 the coefficients are mostly statistically insignificant. I don't know which one I should believe...

    Sorry, I am quite unfamiliar with GMM estimation; this is my first research project using it. Thanks again.
    Last edited by Huaxin Wanglu; 08 Mar 2021, 15:47.

