
  • Including a quadratic term in an IV regression with one instrument

    Hi,

    I'm running an IV regression with a dummy instrument and a continuous endogenous variable. I'd like to test for non-linearity of my data by including the quadratic term of my endogenous variable. I'm aware of Wooldridge (2000) and Wooldridge (2015) as well as several threads on this forum, and my understanding according to these is that the following three approaches are all valid:

    Code:
    regress x z other_controls, vce(cluster clustervar)
    
    * fitted values from the first stage
    predict xhat, xb
    
    gen xhat2 = xhat^2
    gen x2 = x^2
    
    *1:
    ivregress 2sls y (x x2 = z xhat2) other_controls, vce(cluster clustervar)
    
    *2:
    ivregress 2sls y (x x2 = xhat xhat2) other_controls, vce(cluster clustervar)
    
    *3 (this won't work for me, as my instrument is a dummy, so z2 = z):
    gen z2 = z^2
    ivregress 2sls y (x x2 = z z2) other_controls, vce(cluster clustervar)
    As I wrote above, it's my understanding that all of these methods are valid. If that's the case, why do I get different results from (1) and (2)? Am I missing something?

    Best,
    Rune

  • #2
    To say they're all "valid" is different from saying they're the "same." As you pointed out, you can't even do (3). And if you didn't have other controls that help to predict x, you can't do (1) or (2), either. More precisely, you can do them but if, in the population, the first stage for x depends only on z then you get perfect collinearity in the limit.
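
    A short worked derivation may make the collinearity point concrete. It assumes the population first stage is linear in a binary z and depends on nothing else; the symbols \pi_0 and \pi_1 are introduced here only for illustration.

    ```latex
    \begin{align*}
    \hat{x}   &= \pi_0 + \pi_1 z, \qquad z \in \{0, 1\} \\
    \hat{x}^2 &= \pi_0^2 + 2\pi_0\pi_1 z + \pi_1^2 z^2 \\
              &= \pi_0^2 + (2\pi_0\pi_1 + \pi_1^2)\, z \qquad \text{because } z^2 = z
    \end{align*}
    ```

    Under that assumption xhat2 is an exact linear function of the constant and z, so the instrument sets {z, xhat2} and {xhat, xhat2} each carry only one linearly independent excluded instrument for the two endogenous regressors x and x2. In a finite sample the estimated coefficients on the controls are not exactly zero, which is why the collinearity only appears in the limit.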

    Assuming x depends on more than just z in the first stage, I would tend to prefer (2). The instruments in (2) are the optimal IVs if (i) the structural equation is homoskedastic; (ii) E(x|z,controls) is linear; and (iii) Var(x|z,controls) is homoskedastic. None of these are needed for consistency, but they are reasonable "working" assumptions.

    If the results are a lot different, I'd look at the first stage and see what else correlates with x other than z. If that correlation is weak, the IV strategy is suspect.
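
    One way to look at the first stage is sketched below. It reuses the placeholder names from the thread (x, z, other_controls, clustervar) and assumes the data are already in memory; it is a rough diagnostic, not a formal weak-instrument test.

    ```stata
    * First stage: x on the instrument and the controls
    regress x z other_controls, vce(cluster clustervar)

    * Strength of the excluded instrument z
    test z

    * Joint significance of everything else that predicts x
    testparm other_controls
    ```

    If the controls are jointly weak predictors of x, then xhat2 adds little variation beyond z, and methods (1) and (2) can be expected to behave poorly.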

    I think it's very worthwhile to try the control function approach because it doesn't care what else affects x in the first stage. You can make it flexible.

    Code:
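    * First stage; vhat is the first-stage residual
    reg x z other_controls
    predict vhat, resid
    * Second stage with a quadratic control function in vhat
    reg y x x2 vhat c.vhat#c.vhat other_controls, vce(r)
    * Joint significance test of the control-function terms
    test vhat c.vhat#c.vhat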
    If you use the CF estimates, you need to adjust the standard errors for the vhat estimation.
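
    One common way to account for the estimated vhat is to bootstrap both stages together. The sketch below is only that: the program name cf_quad, the number of replications, and the seed are hypothetical choices, x2 is assumed to exist already, and the resampling is over a single cluster variable, so it does not address two-way clustering.

    ```stata
    capture program drop cf_quad
    program define cf_quad, rclass
        * First stage, re-estimated in every bootstrap sample
        regress x z other_controls
        tempvar vhat vhat2
        predict double `vhat', resid
        generate double `vhat2' = `vhat'^2

        * Second stage with the quadratic control function
        regress y x x2 `vhat' `vhat2' other_controls
        return scalar b_x  = _b[x]
        return scalar b_x2 = _b[x2]
    end

    * Resample whole clusters so within-cluster dependence is preserved
    bootstrap b_x=(r(b_x)) b_x2=(r(b_x2)), reps(500) cluster(clustervar) seed(12345): cf_quad
    ```

    If other_controls contains fixed effects defined at the cluster level, the idcluster() option of bootstrap may also be needed so that resampled clusters get distinct identifiers.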



    • #3
      In reply to #2, originally posted by Jeff Wooldridge:
      Dear Professor Wooldridge,

      Thank you for your reply.

      I think it's likely that the IV strategy is suspect, as my only controls are fixed effects. So I should probably use the control function. The point estimates of the CF regression are also more in line with the expected functional form.

      Unfortunately, I don't know how to adjust the standard errors correctly. My original post doesn't reflect this, but I'm actually using two-way clustered standard errors. I can't figure out how to manually adjust the standard errors to reflect that.

      Lastly, what's the goal of the test vhat c.vhat#c.vhat? To test whether the instrument is strong in this functional form?

      Sincerely, Rune
