  • #16
    Originally posted by Sebastian Kripfganz View Post
The p-value range from 0.1 to 0.25 is quite arbitrary. Personally, I would not focus much on this rule of thumb. A high p-value of the Hansen test could indeed be an indication of a too-many-instruments problem, but it could also simply be an indication that there is no evidence to reject the model. Jan Kiviet takes a different stand on these p-values in one of his recent papers. If you ensure from the beginning that the risk of running into a too-many-instruments problem is low, then you would not have to worry much about this rule of thumb.

There is no general answer as to whether a p-value between 0.05 and 0.1 for the difference-in-Hansen test is acceptable. If the tested instruments are crucial for the identification of your main coefficients of interest, then this might be worrisome. On the other hand, with such a large number of observations I would take much more comfort in such a p-value than with a small sample size, particularly if all other tests are fine.
Thanks again for replying to my questions. I also compared the number of instruments and observations in my model with those in other articles; the comparison suggests it is fine, so I believe the probability of instrument proliferation is very low. And thanks for recommending the methodological paper; I will read it.



    • #17
      Originally posted by Sebastian Kripfganz View Post
The p-value range from 0.1 to 0.25 is quite arbitrary. Personally, I would not focus much on this rule of thumb. A high p-value of the Hansen test could indeed be an indication of a too-many-instruments problem, but it could also simply be an indication that there is no evidence to reject the model. Jan Kiviet takes a different stand on these p-values in one of his recent papers. If you ensure from the beginning that the risk of running into a too-many-instruments problem is low, then you would not have to worry much about this rule of thumb.

There is no general answer as to whether a p-value between 0.05 and 0.1 for the difference-in-Hansen test is acceptable. If the tested instruments are crucial for the identification of your main coefficients of interest, then this might be worrisome. On the other hand, with such a large number of observations I would take much more comfort in such a p-value than with a small sample size, particularly if all other tests are fine.
Hello, I am truly sorry to raise another question. It is not critical to my model, but it has confused me for a couple of days. In the specification in #14, I also collapse the instruments in the level model, except for my core predictor, because when I collapse all the instruments in both the level and transformed models, my core predictor becomes statistically insignificant. I understand this is very likely because, in large samples, collapsing worsens statistical efficiency. However, when I switch from the combination (a b c, lag(2 .) eq(diff) collapse) (a b c, lag(1 1) eq(level) collapse) to (a b c, lag(2 8) eq(diff) collapse) (a b c, lag(1 1) eq(level)), one variable changes from statistically significant to insignificant, and another variable becomes statistically significant. If the changes result from better statistical efficiency, then insignificant → significant seems reasonable, but significant → insignificant seems odd to me...

I personally think the second version is better because it strikes a better trade-off between statistical efficiency and the too-many-instruments problem. I also saw you point out elsewhere that collapsing only specific instruments, rather than all of them, should be justified with a good reason. But I am not very confident in my understanding, so I hope to hear your advice. Thanks a lot!

If we use just the second and third lags as instruments, this leads to the fairly large total of 122 instruments. Using only the second-order lags as instruments leads to just 64 instruments and much larger standard errors. When we collapse all the instruments in the standard way, 76 instruments remain, with results that differ substantially from those obtained by simply skipping higher-order lags from the full set of available instruments. Collapsing also yields more insignificant regressors.
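For intuition on how such instrument counts arise, here is a small counting sketch (hypothetical, not from the thread) for a single predetermined variable instrumented in the first-differenced equation, assuming a balanced panel with T periods:

```python
# Hypothetical count of GMM-style instrument columns for one predetermined
# variable in the first-differenced equation of a balanced panel with T
# periods, using level lags `lo` through `hi` as instruments.

def uncollapsed_count(T, lo=2, hi=None):
    """One instrument column per (period, lag) pair: grows quadratically in T."""
    hi = hi if hi is not None else T - 1
    total = 0
    for t in range(lo + 1, T + 1):    # differenced obs usable from period lo+1
        deepest = min(hi, t - 1)      # deepest lag available at period t
        total += deepest - lo + 1     # lags lo..deepest each add one column
    return total

def collapsed_count(T, lo=2, hi=None):
    """One instrument column per lag distance: grows only linearly in T."""
    hi = hi if hi is not None else T - 1
    return hi - lo + 1

# With T = 10: uncollapsed lag(2 .) gives 36 columns, collapsed gives 8.
print(uncollapsed_count(10), collapsed_count(10))
# A lag(2 8)-style cap trims the deepest lags instead: 35 vs. 7 columns.
print(uncollapsed_count(10, 2, 8), collapsed_count(10, 2, 8))
```

Without collapsing, the count grows quadratically in T (one column per period-lag pair), whereas collapsing keeps one column per lag distance; capping the lag depth, as in lag(2 8), reduces the count far less aggressively than collapsing does.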
My old code from #14:
      Code:
xtabond2 migrate L.migrate a2003 c.co_age##c.co_age dy_schooling marriage hukou_type a2025b InIncome ///
    c.gap_jobdiff3ex##c.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu gap_theater gap_labprod gap_terti gap_LQ19 yr2-yr22, ///
    gmmstyle(migrate, lag(1 1) eq(level) collapse) /// predetermined
    gmmstyle(migrate, lag(2 .) eq(diff) collapse) ///
    gmmstyle(c.gap_jobdiff3ex##c.gap_jobdiff3ex, lag(1 1) eq(level)) ///
    gmmstyle(c.gap_jobdiff3ex##c.gap_jobdiff3ex, lag(2 .) eq(diff) collapse) ///
    gmmstyle(gap_labprod gap_LQ19 gap_terti, lag(1 1) eq(level) collapse) ///
    gmmstyle(gap_labprod gap_LQ19 gap_terti, lag(2 .) eq(diff) collapse) ///
    gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(0 0) eq(level) collapse) ///
    gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(1 .) eq(diff) collapse) ///
    ivstyle(gap_highedu gap_med gap_theater, eq(level)) ///
    ivstyle(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2-yr22, eq(level)) ///
    small twostep artests(4) cluster(dest_code)
New code:
      Code:
xtabond2 migrate L.migrate a2003 c.co_age##c.co_age dy_schooling marriage hukou_type a2025b InIncome ///
    c.gap_jobdiff3ex##c.gap_jobdiff3ex gap_ppden gap_unemploy gap_enterprise gap_med gap_highedu gap_theater gap_labprod gap_terti gap_LQ19 yr2-yr22, ///
    gmmstyle(migrate, lag(1 1) eq(level)) /// predetermined
    gmmstyle(migrate, lag(2 8) eq(diff) collapse) ///
    gmmstyle(c.gap_jobdiff3ex##c.gap_jobdiff3ex, lag(1 1) eq(level)) /// endogenous
    gmmstyle(c.gap_jobdiff3ex##c.gap_jobdiff3ex, lag(2 8) eq(diff) collapse) ///
    gmmstyle(gap_labprod gap_LQ19 gap_terti, lag(1 1) eq(level)) /// endogenous
    gmmstyle(gap_labprod gap_LQ19 gap_terti, lag(2 8) eq(diff) collapse) ///
    gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(0 0) eq(level)) /// not strictly exogenous
    gmmstyle(gap_ppden gap_enterprise gap_unemploy, lag(1 3) eq(diff) collapse) ///
    ivstyle(gap_highedu gap_med gap_theater, eq(level)) /// exogenous
    ivstyle(a2003 co_age dy_schooling marriage hukou_type a2025b InIncome yr2-yr22, eq(level)) ///
    small twostep artests(4) cluster(dest_code)
      Last edited by Huaxin Wanglu; 18 Mar 2021, 21:12.



      • #18
That seems to be a matter of efficiency in the (implicit) first-stage regressions of the regressors on the instruments. These level instruments might be informative for some variables but less informative for others. Adding further informative instruments helps to improve the first-stage fit, while adding further uninformative (weak) instruments worsens the first-stage fit. Adding more (instrumental) variables is not always better, even in large samples.
        https://twitter.com/Kripfganz



        • #19
          Originally posted by Sebastian Kripfganz View Post
          That seems to be a matter about efficiency in the (implicit) first-stage regressions of the regressors on the instruments. These level instruments might be informative for some variables but less informative for others. Adding further informative instruments helps to improve the first-stage fit, while adding further uninformative (weak) instruments worsens the first-stage fit. Adding more (instrumental) variables is not always better, even in large samples.
Thank you for the reply. It helps deepen my understanding. I will report the second version in my paper.
