  • Instrumental variables with binary endogenous regressor

    Hi Stata listers,

    I am estimating the following model using Instrumental Variables:
    Y = B0 + B1D + B2X + U, where D is an endogenous dummy variable.

    In order to avoid the forbidden regression, I'm following Wooldridge (2002):
    1. Estimate D = A0 + A1Z + A2X + V using a probit model, and calculate the fitted value, Dhat.
    2. Estimate the main equation by IV using Dhat as instrument.

    My question is: should I include in step 2 the instrument Z or should I only use Dhat as instrument (together with X as instrument of itself)?

    Many thanks for your help.

    Maria
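
    A minimal Stata sketch of the two steps described above, assuming hypothetical variable names y, d, z, x1, and x2 (none of them come from the post). It uses the probit fitted probability as the single excluded instrument for d, with the exogenous controls acting as their own instruments.

    ```stata
    * Step 1: probit of the endogenous dummy on the instrument and the controls
    probit d z x1 x2

    * Fitted probabilities, to be used as an instrument (not as a regressor)
    predict double dhat, pr

    * Step 2: IV estimation with dhat as the excluded instrument for d;
    * x1 and x2 are exogenous and serve as their own instruments
    ivregress 2sls y x1 x2 (d = dhat), vce(robust)
    ```

    Because dhat enters only as an instrument, the second step is ordinary 2SLS: its first stage is a linear regression of d on dhat and the controls, which is what separates this procedure from the forbidden regression.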

  • Michael Zuze
    replied
    Thanks, Joao Santos Silva. I have tried it as shown below. Initially I wanted to use biprobit, but the model did not converge, probably because I specified reverse causality between poverty and informal employment, so each variable is the dependent variable in one equation and an explanatory variable in the other.

    global y1 hhinformal // Informal sector employment, binary
    global y2 poor // Poverty status, binary
    global x1 hdmale i.hhage i.hheduc hhmarried hdsize tot_informal urban // Shared predictors
    global z1 child_under6 m_hseduc // Instruments for equation 1 (hhinformal)
    global z2 large_firm // Instrument for equation 2 (poor)

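    // 2SLS for hhinformal: poor is endogenous, instrumented by large_firm ("first" also reports the first stage)
    ivregress 2sls $y1 ($y2 = $z2) $x1 $z1, first

    // 2SLS for poor: hhinformal is endogenous, instrumented by child_under6 and m_hseduc
    ivregress 2sls $y2 ($y1 = $z1) $x1 $z2, first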
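
    One possible way to follow up on the two commands above, sketched under the assumption that the globals are defined exactly as shown: re-estimate each equation with heteroskedasticity-robust standard errors (a linear probability model is heteroskedastic by construction) and request Stata's standard postestimation checks. This is an illustration, not a recommendation specific to these data.

    ```stata
    // Equation for hhinformal: poor is endogenous, instrumented by large_firm
    ivregress 2sls $y1 $x1 $z1 ($y2 = $z2), vce(robust)
    estat firststage     // first-stage statistics for the excluded instrument
    estat endogenous     // test of whether poor can be treated as exogenous

    // Equation for poor: hhinformal is endogenous, instrumented by child_under6 and m_hseduc
    ivregress 2sls $y2 $x1 $z2 ($y1 = $z1), vce(robust)
    estat firststage
    estat endogenous
    estat overid         // possible here: two instruments for one endogenous regressor
    ```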


  • Joao Santos Silva
    replied
    Dear Michael Zuze,

    Maybe I am missing something, but I would say that the standard in this case is to simply use 2SLS and ignore the fact that the dependent variables are binary. Of course, you need to interpret the results with the necessary caution.

    Best wishes,

    Joao


  • Michael Zuze
    replied
    Joao Santos Silva, would you please assist?

    I have a similar case and am trying to avoid the forbidden regression. I have two simultaneous equations, one for poverty and the other for informal employment, specified as follows:
    Poor = B0 + B1 Informal Employment + B2X + B3Z1 + U
    Informal Employment = B0 + B1 Poor + B2X + B3Z2 + U,
    where the dependent and endogenous variables are binary in both equations, the vector X contains the same exogenous variables in both, and Z1 and Z2 are instruments.

    I was following Maddala (1983), who suggested estimating probit by ML in the first and second stages; however, after reading Angrist, I learned that this is not valid and leads to the forbidden regression. Instead, I should use the LPM. Kindly assist me in working this out for my two simultaneous equations.

    Thanks


  • Michael Zuze
    replied
    Originally posted by Jeff Wooldridge
    Devon: There are no guarantees with heteroskedasticity. It could still be more efficient. Use robust standard errors in both cases. With binary Z you’re counting on variation in X to strengthen the IV.
    Could you please assist with a situation where both the dependent and the endogenous variable (Y and D) are binary in a simultaneous equation model with reverse causality?

    Y = B0 + B1D + B2X + U
    D = B0 + B1Y + B2X + U
    I was reading that Maddala (1983) suggests both stages can be estimated by probit ML. Is that right?


  • Thaer Alhalabi
    replied
    Hello Jeff, I have a question related to this thread. Is it a problem if ivreg2 shows that my instruments are strong, but in the first-step probit only one instrument (out of three) is significant? Can I still use the estimated probability as an instrument, or does this mean that the instruments are questionable?
    Thanks


  • Mohammed Omran
    replied
    You may want to check the following from Angrist & Krueger (2001):

    We conclude our review of pitfalls with a discussion of functional form issues for both the first and second stages in two-stage least squares estimation. Researchers are sometimes tempted to use probit or logit to generate first-stage predicted values in applications with a dummy endogenous regressor. But this is not necessary and may even do some harm. In two-stage least squares, consistency of the second-stage estimates does not turn on getting the first-stage functional form right (Kelejian, 1971). So using a linear regression for the first-stage estimates generates consistent second-stage estimates even with a dummy endogenous variable. Moreover, using a nonlinear first stage to generate fitted values that are plugged directly into the second-stage equation does not generate consistent estimates unless the nonlinear model happens to be exactly right, a result which makes the dangers of misspecification high.

    Angrist, J. D., & Krueger, A. B. (2001). Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. Journal of Economic Perspectives, 15(4), 69–85. https://doi.org/10.1257/jep.15.4.69
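
    The distinction drawn in the quoted passage can be sketched in Stata as follows, using hypothetical variable names y, d, z, and x (assumptions for illustration, not taken from the thread). The first block is the plug-in approach the authors warn against; the second is conventional 2SLS, whose first stage is linear.

    ```stata
    * Plug-in approach the passage warns against (the "forbidden regression"):
    * nonlinear first-stage fitted values are substituted for d in the second stage
    probit d z x
    predict double dhat_probit, pr
    regress y dhat_probit x      // consistent only if the probit model is exactly right

    * Conventional 2SLS: ivregress runs a linear first stage of d on z and x
    * internally and reports second-stage standard errors that account for it
    ivregress 2sls y x (d = z), vce(robust)
    ```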


  • Devon Smith
    replied
    Got it! Thanks, Jeff! Appreciate your help.

    Best, Devon.


  • Jeff Wooldridge
    replied
    Devon: There are no guarantees with heteroskedasticity. It could still be more efficient. Use robust standard errors in both cases. With binary Z you’re counting on variation in X to strengthen the IV.


  • Devon Smith
    replied
    Jeff:

    Just had a follow-up question:

    If the homoskedasticity assumption in the structural equation does not hold, does using fitted values as instrument lead to more efficient estimates than using the actual instrument, Z, itself? I have a situation where my instrument also happens to be binary.


  • Devon Smith
    replied
    ..
    Last edited by Devon Smith; 06 Jul 2022, 19:19.


  • Devon Smith
    replied
    Hi Jeff: Thanks for the link! This is exactly what I was looking for.


  • Jeff Wooldridge
    replied
    Hi Devon: My former student, Ruonan Xu at Rutgers, has written on exactly this problem. She shows that using the probit fitted values can strengthen the IVs and explores the effective F statistic for determining weak IVs. You can use -weakivtest- after obtaining the probit fitted values. You can add Z as extra instruments but it might weaken the group as a whole.

    https://www.sciencedirect.com/scienc...ZWpyxaZcaFZhvA
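
    A hedged sketch of the workflow suggested above, assuming hypothetical variable names y, d, z, and x. -weakivtest- is a community-contributed command; the installation lines assume that it and its dependency -avar- are available from SSC.

    ```stata
    * One-time installation of the community-contributed commands (assumed to be on SSC)
    ssc install avar
    ssc install weakivtest

    * Probit fitted values to be used as the instrument
    probit d z x
    predict double dhat, pr

    * dhat as the only excluded instrument, followed by the effective F statistic
    ivregress 2sls y x (d = dhat), vce(robust)
    weakivtest

    * Variant: add z as an extra instrument and compare the effective F
    ivregress 2sls y x (d = dhat z), vce(robust)
    weakivtest
    ```

    Comparing the two effective F statistics is one way to see whether adding Z weakens the instrument set as a whole.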


  • Devon Smith
    replied
    Hi Fei:

    Thanks for your reply. I am using an interaction with the endogenous variable in my model: y=ax+ bx*y where x is binary and endogenous. I have an instrument for x, z. I am trying to do a first stage of the form x=c.z and obtain xhat. Then I plan on using xhat and xhat*y as instruments. I am using the ivregress 2sls command because I need the svy prefix and ivreg2 is not supported with svy. Moreover, I can't run estat first after ivregress 2sls either, since that is also not supported with svy. That's why I was wondering how to get the first-stage F "manually."
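
    One way to obtain a first-stage F statistic by hand, sketched under assumptions: the survey design has already been declared with svyset, and y, x, z, and the control w are hypothetical names. The idea is to run the first stage as its own survey-weighted regression and test the excluded instrument. For simplicity the sketch covers a single endogenous regressor without the interaction term.

    ```stata
    * Probit fitted values to be used as the instrument
    svy: probit x z w
    predict double xhat, pr

    * First stage of the 2SLS step: endogenous regressor on the excluded instrument and the control
    svy: regress x xhat w

    * Wald test of the excluded instrument; with a single instrument the F reported here is the first-stage F
    test xhat

    * The IV estimate itself
    svy: ivregress 2sls y w (x = xhat)
    ```

    With an interaction term there are two endogenous regressors, so the same regression and test would be repeated for each one, and the separate F statistics are then only a rough guide to instrument strength.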
