Dear all,

First of all, I have searched and read many posts here but could not find an existing solution.

I am using xtabond2 for two-step system GMM estimation. I have read Roodman (2009) and Prof. Sebastian Kripfganz's presentation slides, but my case is somewhat unusual, and these materials do not resolve all of my questions.

To clarify,

**I do not have a lagged dependent variable** on the right-hand side of the equation. I am running GMM as a robustness check: I need to address endogeneity but cannot find suitable external instrumental variables.

I have more than 600,000 observations over a 22-year span. My core predictor is a macro-level variable (a yearly difference: △Xt, △Xt-1, △Xt-2, etc.) and the dependent variable is a micro-level variable (individual choice). In my OLS and fixed-effects models I find a U-shaped (convex) relationship, so I want to add the squared term of my core predictor to the GMM estimation. However, when I specify the squared term as a GMM-style instrument, the Hansen test always rejects (p-value well below 0.25, around 0.01 most of the time). I tried every position the squared term could be placed in and found that **by treating it as exogenous and putting it in the IV-style instruments, I obtain statistically significant results and a decent Hansen test p-value (>0.40).**

1. My first question: I treat the core predictor as endogenous and instrument it GMM-style with its second and deeper lags (lags 2-21). In that case, can I treat its squared term as exogenous?

2. The Arellano-Bond test rejects the null of no serial correlation at every order up to AR(5) and first fails to reject at AR(6). Is it still acceptable to include lags 1-5 as instruments? Because I have no lagged dependent variable in the model, I am unsure whether the Arellano-Bond test even applies to my case.

3. From Prof. Sebastian Kripfganz's slides, I learned that dummy variables are usually treated as exogenous and placed in the IV-style instruments with the level option. But what about interaction terms between endogenous or predetermined variables and dummies? If the Hansen and difference-in-Hansen tests are all passed (p-values comfortably above 0.25), is it justifiable to treat the interaction terms as exogenous?
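
For concreteness, the two variants behind questions 1 and 2 might be sketched as follows. This is only an illustration under stated assumptions: `gap_sq` is a hypothetical variable name introduced here, the data are assumed to be already declared with `xtset`, and the starting lag of 6 in Variant B is an assumption chosen only because AR(6) is the first order at which the test does not reject. Whether that reading is correct is exactly what question 2 asks.

```stata
* Sketch only. gap_sq is a hypothetical name; all other names come from
* my command. Assumes the panel has already been declared with xtset.
generate double gap_sq = gap_jobdiff3ex^2

* Variant A (question 1): squared term instrumented GMM-style instead of IV-style
xtabond2 migrate i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome ///
    L.gap_jobdiff3ex L.gap_sq gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu i.yr2-yr22, ///
    gmmstyle(gap_jobdiff3ex gap_sq, lag(2 .) orthogonal collapse) ///
    gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(1 .) collapse) ///
    ivstyle(gap_highedu gap_med) ///
    ivstyle(i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome i.yr2-yr22, eq(level)) ///
    small twostep artests(6) cluster(dest_code)

* Variant B (question 2): squared term kept IV-style, but the endogenous
* predictor instrumented only from lag 6 onward (assumed cutoff, see text)
xtabond2 migrate i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome ///
    c.L.gap_jobdiff3ex##c.L.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu i.yr2-yr22, ///
    gmmstyle(gap_jobdiff3ex, lag(6 .) orthogonal collapse) ///
    gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(1 .) collapse) ///
    ivstyle(gap_highedu gap_med) ///
    ivstyle(c.L.gap_jobdiff3ex#c.L.gap_jobdiff3ex i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome i.yr2-yr22, eq(level)) ///
    small twostep artests(6) cluster(dest_code)
```

Comparing the Hansen and difference-in-Hansen p-values across such variants would show how sensitive the results are to where the squared term is placed and to the first lag used.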

Lastly, I tried running my specification with the xtdpdgmm command, but because the number of observations is so large, I could not obtain results even after waiting more than 30 minutes. Is there any way to speed up xtdpdgmm?
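
One way to at least gauge how the run time scales is to time the command on a random subset of panels first. The sketch below is a rough illustration, not a fix: it does not make xtdpdgmm itself faster. `panel_id` is a hypothetical name for the panel identifier, the 5% share and the seed are arbitrary, and the xtdpdgmm call is a cut-down stand-in (with `vce(robust)` in place of clustering) rather than the full specification.

```stata
* Sketch only. panel_id is a hypothetical name for the panel identifier;
* time is the time variable. Assumes the data are already xtset.
preserve

* Keep a random 5% of panels (whole panels, so lags stay intact)
set seed 12345
bysort panel_id (time): generate double u = runiform() if _n == 1
by panel_id: replace u = u[1]
keep if u < 0.05

* Time one run of a simplified stand-in specification on the subsample
timer clear 1
timer on 1
xtdpdgmm migrate L.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise, ///
    model(diff) collapse ///
    gmm(gap_jobdiff3ex, lag(2 .)) ///
    gmm(gap_ppden gap_unemploy gap_enterprise, lag(1 .)) ///
    twostep vce(robust)
timer off 1
timer list 1

restore
```

Repeating this with a few different shares (say 5%, 10%, 20%) would indicate whether the full sample is likely to finish in a reasonable time.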

Here is my code:

Code:

    xtabond2 migrate i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome ///
        c.L.gap_jobdiff3ex##c.L.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu i.yr2-yr22 , ///
        gmmstyle(gap_jobdiff3ex, lag(2 .) orthogonal collapse) ///
        gmmstyle(gap_ppden gap_enterprise gap_unemploy , lag(1 .) collapse) ///
        ivstyle(gap_highedu gap_med) ///
        ivstyle(c.L.gap_jobdiff3ex#c.L.gap_jobdiff3ex i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome i.yr2-yr22 , eq(level)) ///
        small twostep artests(6) cluster(dest_code)

Note: i.a2003, co_age, dy_schooling, marriage, hukou_type, a2025b, and InIncome are time-invariant variables. I am aware that including them imposes a stronger assumption on the estimation.

Here are the test results:

Code:

    ------------------------------------------------------------------------------
    Group variable: numeric_un~e                    Number of obs      =    670476
    Time variable : time                            Number of groups   =     57429
    Number of instruments = 94                      Obs per group: min =         1
    F(30, 272)    =    109.21                                      avg =     11.67
    Prob > F      =     0.000                                      max =        17
    ------------------------------------------------------------------------------
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z =  -7.40  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =  -3.58  Pr > z =  0.000
    Arellano-Bond test for AR(3) in first differences: z =  -7.87  Pr > z =  0.000
    Arellano-Bond test for AR(4) in first differences: z =  -3.47  Pr > z =  0.001
    Arellano-Bond test for AR(5) in first differences: z =  -3.06  Pr > z =  0.002
    Arellano-Bond test for AR(6) in first differences: z =  -0.95  Pr > z =  0.342
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(63)   =89629.95 Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(63)   =  62.20  Prob > chi2 =  0.505
      (Robust, but weakened by many instruments.)

    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(59)   =  59.02  Prob > chi2 =  0.475
        Difference (null H = exogenous): chi2(4)    =   3.18  Prob > chi2 =  0.528
      gmm(gap_jobdiff3ex, collapse orthogonal lag(2 .))
        Hansen test excluding group:     chi2(49)   =  52.77  Prob > chi2 =  0.331
        Difference (null H = exogenous): chi2(14)   =   9.43  Prob > chi2 =  0.802
      gmm(gap_ppden gap_enterprise gap_unemploy, collapse lag(1 .))
        Hansen test excluding group:     chi2(10)   =  12.31  Prob > chi2 =  0.265
        Difference (null H = exogenous): chi2(53)   =  49.89  Prob > chi2 =  0.596
      iv(gap_highedu gap_med)
        Hansen test excluding group:     chi2(61)   =  60.85  Prob > chi2 =  0.481
        Difference (null H = exogenous): chi2(2)    =   1.35  Prob > chi2 =  0.509
      iv(cL.gap_jobdiff3ex#cL.gap_jobdiff3ex 0b.a2003 1.a2003 co_age dy_schooling
      marriage hukou_type a2025b InIncome 0b.yr2 1.yr2 0b.yr3 1.yr3 0b.yr4 1.yr4
      0b.yr5 1.yr5 0b.yr6 1.yr6 0b.yr7 1.yr7 0b.yr8 1.yr8 0b.yr9 1.yr9 0b.yr10
      1.yr10 0b.yr11 1.yr11 0b.yr12 1.yr12 0b.yr13 1.yr13 0b.yr14 1.yr14 0b.yr15
      1.yr15 0b.yr16 1.yr16 0b.yr17 1.yr17 0b.yr18 1.yr18 0b.yr19 1.yr19 0b.yr20
      1.yr20 0b.yr21 1.yr21 0b.yr22 1.yr22, eq(level))
        Hansen test excluding group:     chi2(39)   =  39.88  Prob > chi2 =  0.431
        Difference (null H = exogenous): chi2(24)   =  22.32  Prob > chi2 =  0.560

Thanks for any comments!

## Comment