
  • Specify interaction/square terms with xtabond2 & xtdpdgmm

    Dear all,

    First of all, I would like to confirm that I have searched and read many posts here, but I could not find an existing solution.

    I am working with xtabond2 to conduct two-step system-GMM estimation. I have read Roodman (2009) and Prof. Sebastian Kripfganz's presentation slides, but my case is a bit uncommon, so I still cannot resolve all the issues from these materials.

    To clarify, I do not have a lagged dependent variable on the right-hand side of the equation. I run GMM estimation because, as a robustness check, I have to address endogeneity but cannot find suitable external instrumental variables.

    I have more than 600,000 observations in total over a time span of 22 years. My core predictor is a macro-level variable (yearly differences △Xt, △Xt-1, △Xt-2, etc.) and the dependent variable is a micro-level variable (an individual choice). In my OLS and fixed-effects models I find a U-shaped relationship (convexity), so I want to add the squared term of my core predictor to the GMM estimation. But when I specify it as a GMM-style instrument, the Hansen test is always significant (well below 0.25, around 0.01 most of the time). I have tried every position it could be placed in, and found that by treating it as exogenous and putting it among the IV-style instruments, I obtain statistically significant results and a decent Hansen test p-value (>0.40).

    1. My first confusion: I treat the core predictor as endogenous and put it among the GMM-style instruments with its second and higher lags (lags 2-21). In this case, can I treat its squared term as exogenous?

    2. The Arellano-Bond test rejects the null up to AR(5) and only fails to reject at AR(6). Is it still okay for me to include lags 1-5 as instruments? Since I do not have a lagged dependent variable in the model, I am unsure whether the Arellano-Bond test still applies to my case.

    3. From Prof. Sebastian Kripfganz's slides, I learned that dummy variables are usually treated as exogenous and put among the IV-style instruments with the level option. But what about interaction terms between endogenous/predetermined variables and dummies? If the Hansen test and Difference-in-Hansen tests are all satisfied (well above 0.25), is it justifiable to treat the interaction terms as exogenous?

    Lastly, I ran my specification with the xtdpdgmm command before, but because my number of observations is quite large, I could not obtain results even after waiting for more than 30 minutes. Is there any way to speed up xtdpdgmm?
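    One workaround I am considering (a sketch only; id, time, and the 10% sampling share are placeholders, not my actual variable names) is to tune the specification on a random subsample of panel units first, and run the full sample only once the specification is settled:

```stata
* Prototype on a random subsample of panel units (placeholders: id, time).
* Each selected unit keeps its full time series, so the panel structure survives.
set seed 12345
by id (time), sort: gen byte pick = (runiform() < 0.10) if _n == 1
by id (time): replace pick = pick[1]

* Settle the specification here, then rerun once on the full sample.
xtdpdgmm y x1 x2 if pick, gmm(x1, lag(2 .) collapse) iv(x2) two vce(r)
```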

    Here is my code:
    Code:
    xtabond2 migrate i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome ///
    c.L.gap_jobdiff3ex##c.L.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu i.yr2-yr22 , ///
    gmmstyle(gap_jobdiff3ex, lag(2 .) orthogonal collapse) ///
    gmmstyle(gap_ppden gap_enterprise gap_unemploy , lag(1 .) collapse) ///
    ivstyle(gap_highedu gap_med) ///
    ivstyle(c.L.gap_jobdiff3ex#c.L.gap_jobdiff3ex i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome i.yr2-yr22 , eq(level)) ///
    small twostep artests(6) cluster(dest_code)
    Note: i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome are time-invariant variables. I am aware that including them imposes a stronger assumption on the estimation.

    Here are the test results:
    Code:
    ------------------------------------------------------------------------------
    Group variable: numeric_un~e                    Number of obs      =    670476
    Time variable : time                            Number of groups   =     57429
    Number of instruments = 94                      Obs per group: min =         1
    F(30, 272)    =    109.21                                      avg =     11.67
    Prob > F      =     0.000                                      max =        17
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
    Arellano-Bond test for AR(1) in first differences: z =  -7.40  Pr > z =  0.000
    Arellano-Bond test for AR(2) in first differences: z =  -3.58  Pr > z =  0.000
    Arellano-Bond test for AR(3) in first differences: z =  -7.87  Pr > z =  0.000
    Arellano-Bond test for AR(4) in first differences: z =  -3.47  Pr > z =  0.001
    Arellano-Bond test for AR(5) in first differences: z =  -3.06  Pr > z =  0.002
    Arellano-Bond test for AR(6) in first differences: z =  -0.95  Pr > z =  0.342
    ------------------------------------------------------------------------------
    Sargan test of overid. restrictions: chi2(63)   =89629.95 Prob > chi2 =  0.000
      (Not robust, but not weakened by many instruments.)
    Hansen test of overid. restrictions: chi2(63)   =  62.20  Prob > chi2 =  0.505
      (Robust, but weakened by many instruments.)
    
    Difference-in-Hansen tests of exogeneity of instrument subsets:
      GMM instruments for levels
        Hansen test excluding group:     chi2(59)   =  59.02  Prob > chi2 =  0.475
        Difference (null H = exogenous): chi2(4)    =   3.18  Prob > chi2 =  0.528
      gmm(gap_jobdiff3ex, collapse orthogonal lag(2 .))
        Hansen test excluding group:     chi2(49)   =  52.77  Prob > chi2 =  0.331
        Difference (null H = exogenous): chi2(14)   =   9.43  Prob > chi2 =  0.802
      gmm(gap_ppden gap_enterprise gap_unemploy, collapse lag(1 .))
        Hansen test excluding group:     chi2(10)   =  12.31  Prob > chi2 =  0.265
        Difference (null H = exogenous): chi2(53)   =  49.89  Prob > chi2 =  0.596
      iv(gap_highedu gap_med)
        Hansen test excluding group:     chi2(61)   =  60.85  Prob > chi2 =  0.481
        Difference (null H = exogenous): chi2(2)    =   1.35  Prob > chi2 =  0.509
      iv(cL.gap_jobdiff3ex#cL.gap_jobdiff3ex 0b.a2003 1.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome 0b.yr2 1.yr2 0b.yr3 1.yr3 0b.yr4 1.yr4 0b.yr5 1.yr5 0b.yr6 1.yr6 0b.yr7 1.yr7 0b.yr8 1.yr8 0b.yr9 1.yr9 0b.yr10 1.yr10 0b.yr11 1.yr11 0b.yr12 1.yr12 0b.yr13 1.yr13 0b.yr14 1.yr14 0b.yr15 1.yr15 0b.yr16 1.yr16 0b.yr17 1.yr17 0b.yr18 1.yr18 0b.yr19 1.yr19 0b.yr20 1.yr20 0b.yr21 1.yr21 0b.yr22 1.yr22, eq(level))
        Hansen test excluding group:     chi2(39)   =  39.88  Prob > chi2 =  0.431
        Difference (null H = exogenous): chi2(24)   =  22.32  Prob > chi2 =  0.560
    Thanks for any comments!
    Last edited by Huaxin Wanglu; 05 Mar 2021, 18:51.

  • #2
    It's sad no one can help...?



    • #3
      1. If your core predictor is endogenous, it is hard to justify that the squared term is exogenous.
      2. If you choose the second lag of an endogenous variable as an instrument for the first-differenced model, then any serial correlation of the error term will invalidate that instrument. This is irrespective of whether there is a lagged dependent variable or not. A lagged dependent variable in the model can help to remove the serial correlation from the error term.
      3. Similar to point 1, if you have an interaction term between an endogenous variable and an exogenous variable (e.g. a dummy variable), then as a default I would typically still assume that the interaction term is endogenous unless you can come up with a convincing argument why it is not. I would not put too much trust in the overidentification test results. In the first place, you need to have a good theoretical argument for the classification of your variables.
      4. I am sorry that the estimation with xtdpdgmm takes such a long time. Eventually, it should still work with such large data sets. Admittedly, it is much slower than xtabond2. The reason is that there is a trade-off between flexibility of the command and its computational efficiency. xtdpdgmm is intended to provide quite a good bit of additional flexibility over xtabond2. This comes at the cost of a few inefficient parts in the code. If you do not need the extra flexibility, you might be better off with xtabond2 when using such large data sets.
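      As a sketch of point 2 (placeholder names; the starting lag must be chosen from your own AR test results, so lag(6 .) here is purely illustrative): if AR(1) through AR(5) in first differences are rejected while AR(6) is not, lags 2-5 of an endogenous variable are invalid instruments for the first-differenced model, and the instrument set should start deeper:

```stata
* Illustrative only: shift the first GMM-style instrument lag from 2 to 6
* when the AR tests indicate serial correlation up to order 4 in the levels.
xtabond2 y x controls, ///
    gmmstyle(x, lag(6 .) collapse) ///   instead of lag(2 .)
    ivstyle(controls, eq(level)) ///
    twostep robust
```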
      https://twitter.com/Kripfganz



      • #4
        Originally posted by Sebastian Kripfganz View Post
        Thanks a lot for the comments. Regarding point 2: does it mean that if my specification only fails to reject the null at AR(6), I have to use lags 5-21 instead of lags 2-21 as the instruments?



        • #5
          Originally posted by Sebastian Kripfganz View Post
          Sorry, two more questions:
          1. Actually, I do not want to include my endogenous variable in the differenced model, since it is itself a differenced variable. If I manually specify lags of the level and lags of the difference as instruments, will serial correlation still invalidate the instruments?
          2. My dependent variable is binary. I remember you mentioned somewhere that it may have problems with the mean-stationarity assumption, so should I rather not include it if it is not theoretically necessary?

          Such as:
          Code:
          gmmstyle(gap_jobdiff19 , lag(2 .) eq(level) collapse) ///
          gmmstyle(D.gap_jobdiff19 , lag(1 .) eq(level) collapse) ///
          By including the lagged dependent variable, my specification passes AR(2), but I am afraid that, since it is binary, I cannot treat it as an ordinary lagged dependent variable.
          Last edited by Huaxin Wanglu; 08 Mar 2021, 12:11.



          • #6
            Originally posted by Sebastian Kripfganz View Post
            I am re-reading your presentation slides tonight. Could you please tell me the difference between these two specifications?

            Codes 1:
            Code:
            xtdpdgmm L(0/1).n w k, model(diff) collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) ///
                gmm(n, lag(1 1) diff model(level)) gmm(w k, lag(0 0) diff model(level)) two vce(r)
            Codes 2:
            Code:
            xtdpdgmm L(0/1).n w k, collapse gmm(n, lag(2 4)) gmm(w k, lag(1 3)) two vce(r)
            As I understand it, by default gmm(n, lag(2 4)) yields L(2/4).n as instruments for the first-differenced equation and L.D.n for the levels equation.

            I think they should be the same, but when I run the first form in xtabond2, the results are totally different from the second form. With Codes 1 I obtain quite good results, yet the coefficients with Codes 2 are mostly statistically insignificant. I do not know which one I should believe...
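            If my reading of the defaults is right, the one-line gmm(n, lag(2 4)) in the xtabond2 translation should split explicitly as below (a sketch of my interpretation only, not verified):

```stata
* My understanding of xtabond2's implicit split for gmm(n, lag(2 4)):
gmmstyle(n, lag(2 4) eq(diff))   ///  L(2/4).n instruments the differenced equation
gmmstyle(n, lag(1 1) eq(level))  //   L.D.n instruments the levels equation
```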

            Sorry, I am quite unfamiliar with GMM estimation; this is my first research project using it. Thanks again.
            Last edited by Huaxin Wanglu; 08 Mar 2021, 15:47.



            • #7
              I am updating my code and results here. In this version, I include the lagged dependent variable.

              Code:
              xtabond2 migrate L.migrate a2003 co_age dy_schooling marriage hukou_type a2025b InIncome ///
              c.gap_jobdiff3ex##c.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu yr2-yr22, ///
              gmmstyle(migrate, lag(1 1) eq(level)) /// predetermined
              gmmstyle(migrate, lag(2 .) eq(diff) collapse) ///
              gmmstyle(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, lag(1 1) eq(level)) /// endogenous
              gmmstyle(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, lag(2 .) eq(diff) collapse) ///
              gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(0 0) eq(level)) /// predetermined
              gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(1 .) eq(diff) collapse) ///
              ivstyle(gap_med gap_highedu, eq(level)) /// exogenous
              ivstyle(i.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2-yr22, eq(level)) ///
              small twostep artests(4) cluster(dest_code)
              Code:
              ------------------------------------------------------------------------------
              Arellano-Bond test for AR(1) in first differences: z = -57.35  Pr > z =  0.000
              Arellano-Bond test for AR(2) in first differences: z =  -0.97  Pr > z =  0.331
              Arellano-Bond test for AR(3) in first differences: z =  -0.03  Pr > z =  0.976
              Arellano-Bond test for AR(4) in first differences: z =   0.64  Pr > z =  0.521
              ------------------------------------------------------------------------------
              Sargan test of overid. restrictions: chi2(190)  =2327.26  Prob > chi2 =  0.000
                (Not robust, but not weakened by many instruments.)
              Hansen test of overid. restrictions: chi2(190)  = 196.63  Prob > chi2 =  0.356
                (Robust, but weakened by many instruments.)
              
              Difference-in-Hansen tests of exogeneity of instrument subsets:
                GMM instruments for levels
                  Hansen test excluding group:     chi2(92)   = 103.26  Prob > chi2 =  0.198
                  Difference (null H = exogenous): chi2(98)   =  93.37  Prob > chi2 =  0.613
                gmm(migrate, eq(level) lag(1 1))
                  Hansen test excluding group:     chi2(174)  = 187.76  Prob > chi2 =  0.225
                  Difference (null H = exogenous): chi2(16)   =   8.87  Prob > chi2 =  0.919
                gmm(migrate, collapse eq(diff) lag(2 .))
                  Hansen test excluding group:     chi2(174)  = 190.77  Prob > chi2 =  0.182
                  Difference (null H = exogenous): chi2(16)   =   5.86  Prob > chi2 =  0.990
                gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, eq(level) lag(1 1))
                  Hansen test excluding group:     chi2(158)  = 164.21  Prob > chi2 =  0.351
                  Difference (null H = exogenous): chi2(32)   =  32.43  Prob > chi2 =  0.446
                gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
                  Hansen test excluding group:     chi2(158)  = 178.05  Prob > chi2 =  0.131
                  Difference (null H = exogenous): chi2(32)   =  18.58  Prob > chi2 =  0.972
                gmm(gap_ppden gap_enterprise gap_unemploy, eq(level) lag(0 0))
                  Hansen test excluding group:     chi2(140)  = 155.43  Prob > chi2 =  0.176
                  Difference (null H = exogenous): chi2(50)   =  41.20  Prob > chi2 =  0.808
                gmm(gap_ppden gap_enterprise gap_unemploy, collapse eq(diff) lag(1 .))
                  Hansen test excluding group:     chi2(140)  = 159.45  Prob > chi2 =  0.125
                  Difference (null H = exogenous): chi2(50)   =  37.18  Prob > chi2 =  0.910
                iv(gap_med gap_highedu, eq(level))
                  Hansen test excluding group:     chi2(188)  = 195.96  Prob > chi2 =  0.330
                  Difference (null H = exogenous): chi2(2)    =   0.67  Prob > chi2 =  0.715
                iv(0b.a2003 1.a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11 yr12 yr13 yr14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
                  Hansen test excluding group:     chi2(166)  = 183.18  Prob > chi2 =  0.171
                  Difference (null H = exogenous): chi2(24)   =  13.45  Prob > chi2 =  0.958
              Last edited by Huaxin Wanglu; 08 Mar 2021, 18:23.



              • #8
                Originally posted by Huaxin Wanglu View Post

                Ah, I have figured out this difference by reading your old posts
                HTML Code:
                https://www.statalist.org/forums/forum/general-stata-discussion/general/1395858-xtdpdgmm-new-stata-command-for-efficient-gmm-estimation-of-linear-dynamic-panel-models-with-nonlinear-moment-conditions/page2
                And I have also realized that my xtabond2 code is not completely equivalent, since I did not collapse the differenced instruments for the level model (I prefer not to).

                If possible, could you take a look at my code posted in #7?
                Last edited by Huaxin Wanglu; 08 Mar 2021, 18:21.



                • #9
                  Your code and your specification test results in #7 look fine as far as I can tell by quickly looking at them. The binary nature of the dependent variable does not necessarily cause problems.
                  https://twitter.com/Kripfganz



                  • #10
                    Originally posted by Sebastian Kripfganz View Post
                    Thanks a million. Your kind replies indeed help a lot!



                    • #11
                      Originally posted by Sebastian Kripfganz View Post
                      Hello, may I ask you another question? To address reverse causality, I lag the variables by one period in the OLS & FE models, but when I use L.gap_jobdiff3ex instead of gap_jobdiff3ex in GMM, the Hansen test p-value drops to 0.10 and the Difference-in-Hansen tests do not all pass. I guess this may be because, by lagging the variables, the deeper lags suffer from a weak-instruments problem. From a paper, I learned that GMM can tackle reverse causality without lagging, but the paper my conceptual framework is based on lags all variables by one period in sys-GMM, so I am quite concerned about this choice. Could you give me some tips on whether and how I should lag by one period in GMM? With the one-period lag, I also tried adding the second lag of my dependent variable to the model. Since the AR test fails to reject the null at AR(3), I revised the code as below, but the coefficient on L2 is negative.

                      In principle, the Arellano-Bond (AB) estimator and related dynamic panel models offer a powerful toolbox to tackle endogeneity problems caused by both reverse causality and unobserved heterogeneity.
                      We rely on the approach advocated by Arellano and Bond (1991) taking first differences in a first step to remove unobserved heterogeneity and then using second- and higher-order lags of the dependent variables as instruments in a standard GMM framework to deal with reverse causality.
                      Code:
                      gmmstyle(migrate, lag(2 2) eq(level)) ///
                      gmmstyle(migrate, lag(3 .) eq(diff) collapse) ///

                      Results with two lagged dependent variables:
                      Code:
                      -----------------------------------------------------------------------------------------------------
                                                          |              Corrected
                                                  migrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      ------------------------------------+----------------------------------------------------------------
                                                  migrate |
                                                      L1. |   1.159987   .0730317    15.88   0.000     1.016205    1.303768
                                                      L2. |  -.1898163   .0698226    -2.72   0.007    -.3272801   -.0523526
                                                          |
                                                    a2003 |  -.0000582   .0001527    -0.38   0.703    -.0003588    .0002423
                                                   co_age |  -.0003186   .0000309   -10.32   0.000    -.0003794   -.0002578
                                             dy_schooling |   .0002668   .0000481     5.55   0.000     .0001722    .0003614
                                                 marriage |   -.004595   .0006148    -7.47   0.000    -.0058053   -.0033846
                                               hukou_type |  -.0007282   .0004208    -1.73   0.085    -.0015566    .0001001
                                                   a2025b |  -.0001572   .0001256    -1.25   0.212    -.0004046    .0000901
                                                 InIncome |   .0003588   .0001017     3.53   0.000     .0001587     .000559
                                                          |
                                           gap_jobdiff3ex |
                                                      L1. |   .0000635   .0000702     0.91   0.366    -.0000747    .0002017
                                                          |
                      cL.gap_jobdiff3ex#cL.gap_jobdiff3ex |   1.67e-06   6.29e-07     2.65   0.009     4.28e-07    2.91e-06
                                                          |
                                                gap_ppden |
                                                      L1. |   5.55e-06   2.44e-06     2.27   0.024     7.45e-07    .0000104
                                                          |
                                             gap_unemploy |
                                                      L1. |  -.0308105   .1619523    -0.19   0.849     -.349655    .2880341
                                                          |
                                           gap_enterprise |
                                                      L1. |   .0004517   .0003214     1.41   0.161    -.0001811    .0010844
                                                          |
                                                  gap_med |
                                                      L1. |   5.050586   1.232691     4.10   0.000     2.623718    7.477454
                                                          |
                                              gap_highedu |
                                                      L1. |   .3269975   .0754531     4.33   0.000     .1784488    .4755461
                      Code:
                      ------------------------------------------------------------------------------
                      Arellano-Bond test for AR(1) in first differences: z =  -8.77  Pr > z =  0.000
                      Arellano-Bond test for AR(2) in first differences: z =   2.73  Pr > z =  0.006
                      Arellano-Bond test for AR(3) in first differences: z =  -0.04  Pr > z =  0.967
                      Arellano-Bond test for AR(4) in first differences: z =  -0.59  Pr > z =  0.558
                      ------------------------------------------------------------------------------
                      Sargan test of overid. restrictions: chi2(189)  =1776.06  Prob > chi2 =  0.000
                        (Not robust, but not weakened by many instruments.)
                      Hansen test of overid. restrictions: chi2(189)  = 212.81  Prob > chi2 =  0.113
                        (Robust, but weakened by many instruments.)
                      
                      Difference-in-Hansen tests of exogeneity of instrument subsets:
                        GMM instruments for levels
                          Hansen test excluding group:     chi2(91)   =  94.47  Prob > chi2 =  0.381
                          Difference (null H = exogenous): chi2(98)   = 118.34  Prob > chi2 =  0.079
                        gmm(migrate, eq(level) lag(2 2))
                          Hansen test excluding group:     chi2(173)  = 189.50  Prob > chi2 =  0.185
                          Difference (null H = exogenous): chi2(16)   =  23.32  Prob > chi2 =  0.106
                        gmm(migrate, collapse eq(diff) lag(3 .))
                          Hansen test excluding group:     chi2(173)  = 191.34  Prob > chi2 =  0.161
                          Difference (null H = exogenous): chi2(16)   =  21.47  Prob > chi2 =  0.161
                        gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, eq(level) lag(1 1))
                          Hansen test excluding group:     chi2(157)  = 177.20  Prob > chi2 =  0.129
                          Difference (null H = exogenous): chi2(32)   =  35.61  Prob > chi2 =  0.302
                        gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
                          Hansen test excluding group:     chi2(157)  = 169.57  Prob > chi2 =  0.233
                          Difference (null H = exogenous): chi2(32)   =  43.25  Prob > chi2 =  0.089
                        gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, eq(level) lag(0 0))
                          Hansen test excluding group:     chi2(139)  = 149.53  Prob > chi2 =  0.256
                          Difference (null H = exogenous): chi2(50)   =  63.28  Prob > chi2 =  0.098
                        gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, collapse eq(diff) lag(1 .))
                          Hansen test excluding group:     chi2(139)  = 173.36  Prob > chi2 =  0.026
                          Difference (null H = exogenous): chi2(50)   =  39.45  Prob > chi2 =  0.858
                        iv(L.gap_med L.gap_highedu, eq(level))
                          Hansen test excluding group:     chi2(187)  = 210.15  Prob > chi2 =  0.118
                          Difference (null H = exogenous): chi2(2)    =   2.66  Prob > chi2 =  0.264
                        iv(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11 yr12 yr13 yr14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
                          Hansen test excluding group:     chi2(165)  = 182.91  Prob > chi2 =  0.161
                          Difference (null H = exogenous): chi2(24)   =  29.90  Prob > chi2 =  0.188

                      Results with only 1 lagged dependent variable:
                      Code:
                      -----------------------------------------------------------------------------------------------------
                                                          |              Corrected
                                                  migrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      ------------------------------------+----------------------------------------------------------------
                                                  migrate |
                                                      L1. |   .9635485   .0053282   180.84   0.000     .9530589    .9740382
                                                          |
                                                    a2003 |  -.0001579   .0001957    -0.81   0.420    -.0005433    .0002274
                                                   co_age |   -.000373   .0000264   -14.11   0.000     -.000425    -.000321
                                             dy_schooling |   .0003459   .0000477     7.25   0.000      .000252    .0004398
                                                 marriage |  -.0047542   .0007491    -6.35   0.000    -.0062289   -.0032795
                                               hukou_type |  -.0008414   .0005379    -1.56   0.119    -.0019003    .0002174
                                                   a2025b |  -.0001694   .0001507    -1.12   0.262     -.000466    .0001272
                                                 InIncome |   .0004433   .0001295     3.42   0.001     .0001882    .0006983
                                                          |
                                           gap_jobdiff3ex |
                                                      L1. |   .0000999   .0000775     1.29   0.198    -.0000526    .0002524
                                                          |
                      cL.gap_jobdiff3ex#cL.gap_jobdiff3ex |   2.01e-06   7.03e-07     2.86   0.005     6.28e-07    3.40e-06
                                                          |
                                                gap_ppden |
                                                      L1. |   7.02e-06   2.77e-06     2.53   0.012     1.55e-06    .0000125
                                                          |
                                             gap_unemploy |
                                                      L1. |  -.1215351    .169508    -0.72   0.474     -.455244    .2121739
                                                          |
                                           gap_enterprise |
                                                      L1. |    .000753   .0003926     1.92   0.056      -.00002     .001526
                                                          |
                                                  gap_med |
                                                      L1. |   5.361344    1.37739     3.89   0.000     2.649688       8.073
                                                          |
                                              gap_highedu |
                                                      L1. |   .4120449   .0989178     4.17   0.000     .2173063    .6067835
                      Code:
                      ------------------------------------------------------------------------------
                      Arellano-Bond test for AR(1) in first differences: z = -59.78  Pr > z =  0.000
                      Arellano-Bond test for AR(2) in first differences: z =   1.08  Pr > z =  0.282
                      Arellano-Bond test for AR(3) in first differences: z =  -0.13  Pr > z =  0.894
                      Arellano-Bond test for AR(4) in first differences: z =  -1.06  Pr > z =  0.289
                      ------------------------------------------------------------------------------
                      Sargan test of overid. restrictions: chi2(191)  =2240.55  Prob > chi2 =  0.000
                        (Not robust, but not weakened by many instruments.)
                      Hansen test of overid. restrictions: chi2(191)  = 215.61  Prob > chi2 =  0.107
                        (Robust, but weakened by many instruments.)
                      
                      Difference-in-Hansen tests of exogeneity of instrument subsets:
                        GMM instruments for levels
                          Hansen test excluding group:     chi2(93)   =  97.64  Prob > chi2 =  0.351
                          Difference (null H = exogenous): chi2(98)   = 117.97  Prob > chi2 =  0.083
                        gmm(migrate, eq(level) lag(1 1))
                          Hansen test excluding group:     chi2(175)  = 194.32  Prob > chi2 =  0.151
                          Difference (null H = exogenous): chi2(16)   =  21.29  Prob > chi2 =  0.168
                        gmm(migrate, collapse eq(diff) lag(2 .))
                          Hansen test excluding group:     chi2(174)  = 200.42  Prob > chi2 =  0.083
                          Difference (null H = exogenous): chi2(17)   =  15.18  Prob > chi2 =  0.582
                        gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, eq(level) lag(1 1))
                          Hansen test excluding group:     chi2(159)  = 172.30  Prob > chi2 =  0.223
                          Difference (null H = exogenous): chi2(32)   =  43.31  Prob > chi2 =  0.088
                        gmm(L.gap_jobdiff3ex cL.gap_jobdiff3ex#cL.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
                          Hansen test excluding group:     chi2(159)  = 172.53  Prob > chi2 =  0.219
                          Difference (null H = exogenous): chi2(32)   =  43.08  Prob > chi2 =  0.091
                        gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, eq(level) lag(0 0))
                          Hansen test excluding group:     chi2(141)  = 172.45  Prob > chi2 =  0.037
                          Difference (null H = exogenous): chi2(50)   =  43.16  Prob > chi2 =  0.742
                        gmm(L.gap_ppden L.gap_enterprise L.gap_unemploy, collapse eq(diff) lag(1 .))
                          Hansen test excluding group:     chi2(141)  = 176.85  Prob > chi2 =  0.022
                          Difference (null H = exogenous): chi2(50)   =  38.76  Prob > chi2 =  0.876
                        iv(L.gap_med L.gap_highedu, eq(level))
                          Hansen test excluding group:     chi2(189)  = 214.05  Prob > chi2 =  0.102
                          Difference (null H = exogenous): chi2(2)    =   1.56  Prob > chi2 =  0.458
                        iv(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11 yr12 yr13 y
                      > r14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
                          Hansen test excluding group:     chi2(167)  = 187.84  Prob > chi2 =  0.129
                          Difference (null H = exogenous): chi2(24)   =  27.76  Prob > chi2 =  0.270

                      Leszczensky, L., & Wolbring, T. (2019). How to Deal With Reverse Causality Using Panel Data? Recommendations for Researchers Based on a Simulation Study. Sociological Methods & Research. https://doi.org/10.1177/0049124119882473
                      Last edited by Huaxin Wanglu; 10 Mar 2021, 15:23.



                      • #12
                        Lagging variables to avoid reverse causality is often an ill-advised approach. You would be deliberately misspecifying your model. The reverse causality problem (which is a source of endogeneity) can simply be dealt with by using lagged instruments.

                         There are instances when lagging makes sense, e.g. if your dependent variable is a flow variable and your independent variable is a stock variable measured at the end of a period. In such a model, you clearly want the stock at the end of the previous period (not the current period) to affect the current period's flow variable. Otherwise, lagging right-hand-side variables really only makes sense if the effects indeed occur with a delay.
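                         To make the distinction concrete, here is a minimal xtabond2 sketch; y, x, and the lag ranges are placeholders, not taken from this thread. Rather than replacing x with L.x in the regression, keep x in the model and let its lags serve only as instruments:
                         Code:
                         * ill-advised: lagging the regressor itself to sidestep reverse causality
                         xtabond2 y L.y L.x, gmm(L.y, lag(2 4) collapse) iv(L.x) twostep robust
                         
                         * preferred: keep x in the model; use its own lags as GMM-style instruments
                         xtabond2 y L.y x, gmm(L.y x, lag(2 4) collapse) twostep robust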



                        • #13
                          Originally posted by Sebastian Kripfganz View Post
                           Many thanks for the comment! You saved me a lot of time; it is indeed succinct and informative.



                          • #14
                            Originally posted by Sebastian Kripfganz View Post
                             Dear Prof. Sebastian Kripfganz, may I ask a new question? Roodman (2009) mentions that a Hansen test p-value as high as 0.25 should be viewed with concern, which seems to imply that safe values fall in the range of 0.1 to 0.25 (if I understand correctly). However, after more than a hundred comparisons, I find that when the overall p-value lies within this range, the C test (difference-in-Hansen) usually cannot be safely accepted, since for at least one instrument subset the p-value of either the excluding-group or the difference test is smaller than 0.1, and sometimes even below 0.05. I am hesitant about how to trade off between them. Would you consider a C-test p-value in [0.05, 0.1] acceptable? And should I be worried if the p-value of the overall Hansen test is larger than 0.5? After adding a few new independent variables, it becomes 0.621. The number of instruments is 229 and the number of observations is 339,855.

                            Code:
                            ------------------------------------------------------------------------------
                            Arellano-Bond test for AR(1) in first differences: z = -44.23  Pr > z =  0.000
                            Arellano-Bond test for AR(2) in first differences: z =  -1.23  Pr > z =  0.220
                            Arellano-Bond test for AR(3) in first differences: z =   1.34  Pr > z =  0.182
                            Arellano-Bond test for AR(4) in first differences: z =   0.56  Pr > z =  0.576
                            ------------------------------------------------------------------------------
                            Sargan test of overid. restrictions: chi2(191)  =1326.63  Prob > chi2 =  0.000
                              (Not robust, but not weakened by many instruments.)
                            Hansen test of overid. restrictions: chi2(191)  = 184.39  Prob > chi2 =  0.621
                              (Robust, but weakened by many instruments.)
                            
                            Difference-in-Hansen tests of exogeneity of instrument subsets:
                              GMM instruments for levels
                                Hansen test excluding group:     chi2(137)  = 124.81  Prob > chi2 =  0.764
                                Difference (null H = exogenous): chi2(54)   =  59.59  Prob > chi2 =  0.280
                              gmm(migrate, eq(level) lag(1 1))
                                Hansen test excluding group:     chi2(175)  = 171.65  Prob > chi2 =  0.557
                                Difference (null H = exogenous): chi2(16)   =  12.74  Prob > chi2 =  0.691
                              gmm(migrate, collapse eq(diff) lag(2 .))
                                Hansen test excluding group:     chi2(176)  = 168.64  Prob > chi2 =  0.641
                                Difference (null H = exogenous): chi2(15)   =  15.75  Prob > chi2 =  0.399
                              gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, eq(level) lag(1 1))
                                Hansen test excluding group:     chi2(159)  = 154.63  Prob > chi2 =  0.583
                                Difference (null H = exogenous): chi2(32)   =  29.76  Prob > chi2 =  0.580
                              gmm(gap_jobdiff3ex c.gap_jobdiff3ex#c.gap_jobdiff3ex, collapse eq(diff) lag(2 .))
                                Hansen test excluding group:     chi2(159)  = 159.06  Prob > chi2 =  0.484
                                Difference (null H = exogenous): chi2(32)   =  25.33  Prob > chi2 =  0.792
                              gmm(gap_labprod gap_LQ19 gap_terti, collapse eq(level) lag(1 1))
                                Hansen test excluding group:     chi2(188)  = 182.40  Prob > chi2 =  0.602
                                Difference (null H = exogenous): chi2(3)    =   1.99  Prob > chi2 =  0.574
                              gmm(gap_labprod gap_LQ19 gap_terti, collapse eq(diff) lag(2 .))
                                Hansen test excluding group:     chi2(141)  = 150.18  Prob > chi2 =  0.283
                                Difference (null H = exogenous): chi2(50)   =  34.21  Prob > chi2 =  0.957
                              gmm(gap_ppden gap_enterprise gap_unemploy, collapse eq(level) lag(0 0))
                                Hansen test excluding group:     chi2(188)  = 179.76  Prob > chi2 =  0.654
                                Difference (null H = exogenous): chi2(3)    =   4.64  Prob > chi2 =  0.200
                              gmm(gap_ppden gap_enterprise gap_unemploy, collapse eq(diff) lag(1 .))
                                Hansen test excluding group:     chi2(141)  = 149.14  Prob > chi2 =  0.303
                                Difference (null H = exogenous): chi2(50)   =  35.25  Prob > chi2 =  0.943
                              iv(gap_highedu gap_med gap_theater, eq(level))
                                Hansen test excluding group:     chi2(188)  = 181.66  Prob > chi2 =  0.616
                                Difference (null H = exogenous): chi2(3)    =   2.73  Prob > chi2 =  0.435
                              iv(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2 yr3 yr4 yr5 yr6 yr7 yr8 yr9 yr10 yr11 yr12 yr13
                            > yr14 yr15 yr16 yr17 yr18 yr19 yr20 yr21 yr22, eq(level))
                                Hansen test excluding group:     chi2(167)  = 164.20  Prob > chi2 =  0.547
                                Difference (null H = exogenous): chi2(24)   =  20.20  Prob > chi2 =  0.685
                            Last edited by Huaxin Wanglu; 15 Mar 2021, 12:13.



                            • #15
                               The p-value range from 0.1 to 0.25 is quite arbitrary. Personally, I would not focus much on this rule of thumb. A high p-value of the Hansen test could indeed be an indication of a too-many-instruments problem, but it could also simply be an indication that there is no evidence to reject the model. Jan Kiviet takes a different stance on these p-values in one of his recent papers: if you ensure from the beginning that the risk of running into a too-many-instruments problem is low, then you do not have to worry much about this rule of thumb.

                               There is no general answer as to whether a p-value between 0.05 and 0.1 for the difference-in-Hansen test is acceptable. If the tested instruments are crucial for the identification of your main coefficients of interest, then this might be worrisome. On the other hand, with such a large number of observations I would take much more comfort in such a p-value than I would with a small sample size, in particular if all other tests are fine.
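                               As a practical note on keeping the risk of a too-many-instruments problem low from the outset, curtailing the lag depth and collapsing the instrument matrix are the usual levers in xtabond2 (variable names below are placeholders): the full matrix grows roughly quadratically with T (here T = 22), while a curtailed, collapsed set stays at a few columns per variable regardless of T.
                               Code:
                               * full instrument matrix: one column per lag and period; count grows with T
                               xtabond2 y L.y x, gmm(L.y x, lag(2 .)) twostep robust
                               
                               * curtailed and collapsed: a handful of columns per variable
                               xtabond2 y L.y x, gmm(L.y x, lag(2 4) collapse) twostep robust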
