Nonlinear in endogenous variables and two-stage IV

Yanrong Jia

Join Date: Nov 2023

Posts: 3
#1

22 Nov 2023, 10:04

Dear Stata Community,

I have a question regarding two-stage IV regressions with an interaction term. For example, I want to examine the following relationship:

Y = b1*X1 + b2*X2 + b3*X1*X2 + e

My research question focuses on the coefficient on the interaction term (i.e., b3).

Let’s say X1 is endogenous. I intend to address this concern by using an exogenous shock and 2-stage IV regressions. The first-stage regression models are the following:

X1 = b1*treated + b2*post + b3*treated*post + b4*treated*post*X2 + b5*X2 + e
X1*X2 = b1*treated + b2*post + b3*treated*post + b4*treated*post*X2 + b5*X2 + e

In the second stage, I will use both the predicted X1 and predicted X1*X2 in the regression. My question is whether I should include treated*X2 and post*X2 in the first stage. I have seen some papers that omit treated*X2 and post*X2 in the first stage, but I am not sure about the rationale behind this exclusion. Therefore, I am unsure about the correct approach in this scenario.

Thank you in advance for your response!

Best,

Yanrong
Tags: instrumental variables, interaction terms, ivreg2
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2160
#2

23 Nov 2023, 06:18

Yanrong: Looks like your data has a time dimension. Is it panel data?
Yanrong Jia

Join Date: Nov 2023

Posts: 3
#3

23 Nov 2023, 08:49

Originally posted by Jeff Wooldridge:

Yanrong: Looks like your data has a time dimension. Is it panel data?

Hi, Professor Wooldridge. Yes, it is panel data.

Last edited by Yanrong Jia; 23 Nov 2023, 09:02.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2160
#4

23 Nov 2023, 21:00

I think I can help, but I need to know more about the data and estimation. Is it large N? Are you using fixed effects, including time fixed effects? Generally, this would be recommended. It appears you're not because you're including "treated" and "post" in the first stage, and these would both be eliminated if you're using unit and time period fixed effects. If I understand your identification strategy, it's that treated*post is the IV for X1, correct?

Concerning your question, you don't have to include treated*X2 and post*X2 in the first stage because you have enough to just identify the equation, and neither of these appears in the "structural" equation. But it could be more efficient to include treated*X2 and post*X2 (in both first stages, of course). If N is somewhat large, it makes sense to do this.

One piece of advice: Don't implement 2SLS "by hand." One reason is that it's easy to make a mistake, and your first stages mistakenly omit X2. Therefore, just specify X1 and X1*X2 as endogenous variables and list the same set of IVs for both. Anything exogenous in the structural equation will automatically (and correctly) appear in the first stages.
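To make the counting argument concrete, here is one way to write out the system under fixed effects. The notation is an assumption added for illustration and is not from the original posts: D_i stands for treated, P_t for post, c_i and a_{ji} for unit effects, and \theta_t and \lambda_{jt} for time effects. The main effects of treated and post do not appear because the unit and time effects absorb them.

```latex
% Structural equation: two endogenous regressors, x_1 and x_1 x_2
\begin{align*}
y_{it} &= \beta_1 x_{1it} + \beta_2 x_{2it} + \beta_3 x_{1it} x_{2it}
          + c_i + \theta_t + u_{it} \\[4pt]
% First stages: same right-hand side for both endogenous regressors.
% x_2 enters because it is exogenous in the structural equation.
x_{1it} &= \pi_{10} x_{2it} + \pi_{11} D_i P_t + \pi_{12} D_i P_t x_{2it}
          + \pi_{13} D_i x_{2it} + \pi_{14} P_t x_{2it}
          + a_{1i} + \lambda_{1t} + v_{1it} \\
x_{1it} x_{2it} &= \pi_{20} x_{2it} + \pi_{21} D_i P_t + \pi_{22} D_i P_t x_{2it}
          + \pi_{23} D_i x_{2it} + \pi_{24} P_t x_{2it}
          + a_{2i} + \lambda_{2t} + v_{2it}
\end{align*}
```

With only D_i P_t and D_i P_t x_2 as excluded instruments, there are two instruments for two endogenous regressors, so the equation is just identified. Adding D_i x_2 and P_t x_2 gives four instruments and two overidentifying restrictions, which is where the possible efficiency gain comes from.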

With fixed effects, I would do this:

Code:

xtset id year
xtivreg y x2 i.year (c.x1 c.x1#c.x2 = c.post#c.treat c.post#c.x2 c.treat#c.x2 c.post#c.treat#c.x2), fe vce(cluster id)
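For readers who want to try the specification before touching their own data, the following is a minimal simulated sketch, not part of the original reply. Every variable name, sample size, and coefficient value in it is hypothetical. The interaction terms are generated by hand so that the variable lists are plain variables, and the sketch assumes a Stata release in which xtivreg accepts vce(cluster id), as the command in the post does. It fits the just-identified and the overidentified instrument sets side by side.

```stata
* Illustrative simulation only: all names and parameter values are hypothetical.
clear all
set seed 12345
set obs 500                           // 500 panel units
generate id    = _n
generate treat = mod(id, 2)           // half of the units are treated
generate a     = rnormal()            // unit fixed effect
expand 6                              // six years per unit
bysort id: generate year = 1999 + _n  // years 2000 to 2005
generate post = (year >= 2003)
generate x2 = rnormal()               // exogenous moderator
generate u  = rnormal()               // structural error

* x1 is endogenous because it loads on u; the shock post*treat shifts it
generate x1 = post*treat + 0.5*post*treat*x2 + 0.4*post*x2 ///
    + 0.4*treat*x2 + 0.3*x2 + a + 0.8*u + rnormal()
generate y  = 1 + 0.5*x1 + 0.2*x2 + 0.7*x1*x2 + a + u

* Build the interactions by hand so the variable lists are unambiguous
generate x1x2 = x1*x2
generate pt   = post*treat
generate ptx2 = post*treat*x2
generate px2  = post*x2
generate tx2  = treat*x2

xtset id year

* Just identified: two endogenous regressors, two excluded instruments
xtivreg y x2 i.year (x1 x1x2 = pt ptx2), fe vce(cluster id)
estimates store justid

* Overidentified: add post*x2 and treat*x2 as extra excluded instruments
xtivreg y x2 i.year (x1 x1x2 = pt ptx2 px2 tx2), fe vce(cluster id)
estimates store overid

* Compare point estimates and standard errors across the two instrument sets
estimates table justid overid, keep(x1 x1x2 x2) b se
```

In both calls x2 and the year dummies are listed once as exogenous regressors, and the command carries them into the first stages itself, which is the point of not running the two stages by hand.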
Yanrong Jia

Join Date: Nov 2023

Posts: 3
#5

26 Nov 2023, 17:45

Yes, your understanding of my identification strategy is exactly right. Thank you so much for the detailed explanation.