Including a quadratic term in an IV regression with one instrument

Rune Schmidt Qvist

Join Date: May 2022

Posts: 8
#1

30 Oct 2023, 08:21

Hi,

I'm running an IV regression with a dummy instrument and a continuous endogenous variable. I'd like to test for non-linearity by including the quadratic term of my endogenous variable (x2 in the code below). I'm aware of Wooldridge (2000) and Wooldridge (2015) as well as several threads on this forum, and my understanding from these is that the following three approaches are all valid:

Code:

* First stage; xhat must hold the fitted values (xb), not the residuals
regress x z other_controls, vce(cluster clustervar)
predict xhat, xb
gen xhat2 = xhat^2

* 1:
ivregress 2sls y (x x2 = z xhat2) other_controls, vce(cluster clustervar)

* 2:
ivregress 2sls y (x x2 = xhat xhat2) other_controls, vce(cluster clustervar)

* 3 (this won't work for me, as my instrument is a dummy, so z2 = z):
gen z2 = z^2
ivregress 2sls y (x x2 = z z2) other_controls, vce(cluster clustervar)

If all of these methods are valid, why do I get different results from 1 than from 2? Am I missing something?

Best,
Rune
Tags: IV, ivregress, regression
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2190
#2

30 Oct 2023, 12:43

To say they're all "valid" is different from saying they're the "same." As you pointed out, you can't even do (3). And if you didn't have other controls that help to predict x, you can't do (1) or (2), either. More precisely, you can do them but if, in the population, the first stage for x depends only on z then you get perfect collinearity in the limit.
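
To see where the collinearity comes from, consider the limiting case with no controls in the first stage. The coefficients a and b below are hypothetical first-stage parameters, not symbols from the original posts. Because z is a dummy, z^2 = z, so:

```latex
\hat{x} = a + b z, \qquad
\hat{x}^2 = a^2 + 2ab\,z + b^2 z^2 = a^2 + (2ab + b^2)\,z .
```

Both xhat and xhat2 are then exact linear functions of z, which leaves only one linearly independent instrument for the two endogenous variables x and x2 in (1) and in (2).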

Assuming x depends on more than just z in the first stage, I would tend to prefer (2). These are the optimal IVs if (i) the structural equation is homoskedastic; (ii) E(x|z,controls) is linear; (iii) Var(x|z,controls) is homoskedastic. None of these are needed for consistency, but they are reasonable "working" assumptions.

If the results are a lot different, I'd look at the first stage and see what else correlates with x other than z. If it's weak, the IV strategy is suspect.
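
A minimal sketch of such a first-stage check, using the placeholder names from this thread (x, z, other_controls, clustervar). It assumes other_controls stands for a list of variables, so that testparm can test them jointly:

```stata
* First stage: how well do z and the controls predict x?
regress x z other_controls, vce(cluster clustervar)

* Strength of the dummy instrument on its own
test z

* Joint test of the controls (other_controls is a placeholder for a varlist)
testparm other_controls
```

If the joint test of the controls is weak, xhat varies little beyond what z explains, and the instruments in (1) and (2) will be close to collinear.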

I think it's very worthwhile to try the control function approach because it doesn't care what else affects x in the first stage. You can make it flexible.

Code:

reg x z other_controls
predict vhat, resid
reg y x x2 vhat c.vhat#c.vhat other_controls, vce(r)
test vhat c.vhat#c.vhat

If you use the CF estimates, you need to adjust the standard errors for the vhat estimation.
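
One common way to make that adjustment is to bootstrap both stages together, so that vhat is re-estimated in every replication. The following is only a sketch and does not come from the posts above: the program name cf_quad, the number of replications, and the seed are arbitrary choices. It resamples whole clusters on a single clustering variable, so it does not handle two-way clustering.

```stata
capture program drop cf_quad
program define cf_quad, eclass
    * Step 1: first stage and residuals, re-estimated in each bootstrap sample
    capture drop vhat
    regress x z other_controls
    predict double vhat, resid
    * Step 2: control function regression; its coefficients are left in e(b)
    regress y x x2 vhat c.vhat#c.vhat other_controls
end

* Resample clusters and rerun both steps (500 replications is an arbitrary choice)
bootstrap _b, reps(500) seed(12345) cluster(clustervar): cf_quad
```

If other_controls includes fixed effects defined at the level of the resampled clusters, the idcluster() option of bootstrap would likely be needed as well.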
Rune Schmidt Qvist

Join Date: May 2022

Posts: 8
#3

01 Nov 2023, 04:25

Dear Professor Wooldridge,

Thank you for your reply.

I think it's likely that the IV strategy is suspect, as my only controls are fixed effects. So I should probably use the control function approach. The point estimates of the CF regression are also more in line with the expected functional form.

Unfortunately, I don't know how to adjust the standard errors correctly. My original post doesn't reflect this, but I'm actually using two-way clustered standard errors. I can't figure out how to manually adjust the standard errors to reflect that.

Lastly, what's the goal of the test vhat c.vhat#c.vhat command? Is it to test whether the instrument is strong in this functional form?

Sincerely, Rune