  • Coefficient changes sign and blows up after 2-stage residual inclusion/control function

    Hi,

    I have a cardinal dependent variable Y, an endogenous binary independent variable X, and numerous control variables C.
    If I run a standard OLS model, I get a significant, positive coefficient of 0.02 for X.
    Now I am implementing a 2-stage residual inclusion model where my first stage is a probit model X = f(C, Z), with Z being my instrument.
    Since I want to use the residuals from this stage, I estimated the model as follows:

    glm X C Z, fam(bin) link(probit)
    predict Xhat, response

    and then used the residuals in my original OLS model:

    reg Y X C Z Xhat

    Now I get a significant, negative coefficient of -4.25 for X and a coefficient of +4.49 for the residual Xhat.

    Can someone explain this drastic change of the coefficient?

    Thanks a lot for your help.

  • #2
    You'll increase your chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex.

    If you have an endogenous binary variable but are treating the outcome as continuous, 2SLS is generally consistent. If you want to, you could use ivregress, ivreg2, cmp, gsem, or eregress to model this more directly. I'm not sure about control function approaches, but in many such estimators you need something in the equation for the endogenous variable that doesn't appear in the outcome equation. Your second stage includes Z, so I suspect it is identified only by the non-linearity in the glm, which is not a great way to do identification.

    I looked at the identification issue with this simulated example:
    Code:
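    clear
    set obs 100
    * control c and instrument z, both standard normal
    g c=rnormal()
    g z=rnormal()
    * binary endogenous regressor from a latent-index (probit-type) model
    g x=(rnormal() + c + z)>0
    * note: z also enters the outcome directly in this example
    g y=x + c + z + rnormal()
    glm x c z, fam(bin) link(probit)
    * response residual: observed x minus fitted probability
    predict Xhat, response
    * second stage with the residual included, then plain OLS for comparison
    reg y x c z Xhat
    reg y x c z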
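
    For comparison, here is a minimal sketch of the same kind of simulation with an exclusion restriction added. Everything in it is an assumption for illustration, not taken from your data: the unobserved confounder u, the residual name res, the seed, and the sample size. Here z drives x but is left out of y, so the second stage can be run with and without z, next to ivregress 2sls as a benchmark. I am not claiming the response residual is the right control function for a linear outcome; the point is only to see how the coefficient on x behaves when z is included versus excluded.

```stata
clear
set seed 12345                       // arbitrary seed, only for reproducibility
set obs 5000                         // assumed sample size
generate c = rnormal()               // control
generate z = rnormal()               // instrument: affects x, excluded from y
generate u = rnormal()               // hypothetical unobserved confounder
generate byte x = (c + z + u + rnormal()) > 0
generate y = x + c + u + rnormal()   // true effect of x is 1; no z here

* First stage: probit via glm, then response residuals (x minus fitted probability)
glm x c z, family(binomial) link(probit)
predict double res, response

* Second stage WITH the instrument, as in the original post
regress y x c z res
estat vif                            // inspect how strongly x, z and res overlap

* Second stage WITHOUT the instrument (exclusion restriction imposed)
regress y x c res

* 2SLS benchmark with x instrumented by z
ivregress 2sls y c (x = z), vce(robust)
```

    If the coefficient on x is stable without z but erratic with it, and estat vif reports very large values for x and res, that would point to the weak identification described above. The second-stage standard errors also ignore that res was estimated, so they should not be taken at face value; bootstrapping both stages is a common remedy.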
