Instrumental variables with binary endogenous regressor

Maria Franco started a topic Instrumental variables with binary endogenous regressor

26 Jun 2017, 06:06

Hi Stata listers,

I am estimating the following model using Instrumental Variables:
Y = B₀ + B₁D + B₂X + U, where D is an endogenous dummy variable.

In order to avoid the forbidden regression, I'm following Wooldridge (2002):
1. Estimate D = A₀ + A₁Z + A₂X + V using a probit model, and calculate the fitted value, D_hat.
2. Estimate the main equation by IV using D_hat as instrument.
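
The two steps above can be sketched in Stata roughly as follows. This is only a minimal illustration on simulated data: the variable names (y, d, z, x, d_hat), the seed, and the data-generating coefficients are all assumptions made for the example, not part of the original question.

```stata
* Minimal sketch on simulated data; every name and number here is hypothetical
clear
set seed 12345
set obs 5000
generate x = rnormal()
generate z = rnormal()
generate u = rnormal()
generate v = 0.5*u + rnormal()              // corr(u, v) != 0 makes d endogenous
generate d = (0.3 + 0.8*z + 0.5*x + v > 0)  // binary endogenous regressor
generate y = 1 + 2*d + 0.5*x + u

* Step 1: probit of d on the instrument and the exogenous regressor
probit d z x
predict d_hat, pr

* Step 2: IV with d_hat as the excluded instrument; x serves as its own instrument
ivregress 2sls y x (d = d_hat), vce(robust)
```

In this sketch Z is not listed separately in step 2; it enters only through d_hat. The fitted value is used as an instrument for D, not substituted for D in the outcome equation.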

My question is: in step 2, should I also include the instrument Z, or should I use only D_hat as the instrument (with X serving as its own instrument)?

Many thanks for your help.

Maria
Michael Zuze replied

10 Jun 2024, 03:05
Thanks, Joao Santos Silva. I have tried it as shown below. Initially I wanted to use biprobit, but the estimation did not converge, probably because I specified reverse causality between poor and informal employment, so each one is the dependent variable in one equation and an explanatory variable in the other.

global y1 hhinformal // Informal sector employment, binary
global y2 poor // Poverty status, binary
global x1 hdmale i.hhage i.hheduc hhmarried hdsize tot_informal urban // Shared predictors
global z1 child_under6 m_hseduc // Instruments for equation 1 (hhinformal)
global z2 large_firm // Instrument for equation 2 (poor)

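// Equation for hhinformal: poor is the endogenous regressor, instrumented by $z2
// (the first option displays the first-stage regression for poor)
ivregress 2sls $y1 ($y2 = $z2) $x1 $z1, first

// Equation for poor: hhinformal is the endogenous regressor, instrumented by $z1
// (the first option displays the first-stage regression for hhinformal)
ivregress 2sls $y2 ($y1 = $z1) $x1 $z2, first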
Joao Santos Silva replied

10 Jun 2024, 00:11
Dear Michael Zuze,

Maybe I am missing something, but I would say that the standard in this case is to simply use 2SLS and ignore the fact that the dependent variables are binary. Of course, you need to interpret the results with the necessary caution.

Best wishes,

Joao
Michael Zuze replied

09 Jun 2024, 07:34
Joao Santos Silva, would you please assist?

I have a similar case and am trying to avoid the forbidden regression. I have two simultaneous equations, one for poverty and the other for informal employment, specified as follows:
Poor = B₀ + B₁ Informal employment + B₂X + B₃Z1 + U
Informal employment = A₀ + A₁ Poor + A₂X + A₃Z2 + V, where the dependent variable and the endogenous regressor are binary in both equations, the vector X contains the same exogenous variables in both, and Z1 and Z2 are instruments.

I was following Maddala (1983), who suggested estimating probit ML in the first and second stages; however, after reading Angrist, I understand that this is not valid and leads to the forbidden regression, and that I should use the LPM instead. Kindly assist me in working this out for my two simultaneous equations.

Thanks
Michael Zuze replied

09 Jun 2024, 06:03
Originally posted by Jeff Wooldridge

Devon: There are no guarantees with heteroskedasticity. It could still be more efficient. Use robust standard errors in both cases. With binary Z you’re counting on variation in X to strengthen the IV.

Could you please assist with a situation where both the dependent variable and the endogenous regressor (Y and D) are binary in a simultaneous equations model with reverse causality?

Y = B₀ + B₁D + B₂X + U
D = A₀ + A₁Y + A₂X + V
I was reading Maddala (1983), who suggests that both stages can be estimated by probit ML.
Thaer Alhalabi replied

27 Dec 2023, 10:10
Hello Jeff, I have a question related to this thread. Is it a problem if ivreg2 shows that my instruments are strong, but in the first-step probit only one of the three instruments is significant? Can I still use the estimated probability as an instrument, or does this mean that the instruments are questionable?
Thanks
Mohammed Omran replied

30 Oct 2022, 17:14
You may want to check the following from Angrist & Krueger (2001):

We conclude our review of pitfalls with a discussion of functional form issues for both the first and second stages in two-stage least squares estimation. Researchers are sometimes tempted to use probit or logit to generate first-stage predicted values in applications with a dummy endogenous regressor. But this is not necessary and may even do some harm. In two-stage least squares, consistency of the second-stage estimates does not turn on getting the first-stage functional form right (Kelejian, 1971). So using a linear regression for the first-stage estimates generates consistent second-stage estimates even with a dummy endogenous variable. Moreover, using a nonlinear first stage to generate fitted values that are plugged directly into the second-stage equation does not generate consistent estimates unless the nonlinear model happens to be exactly right, a result which makes the dangers of misspecification high.

Angrist, J. D., & Krueger, A. B. (2001). Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. Journal of Economic Perspectives, 15(4), 69–85. https://doi.org/10.1257/jep.15.4.69
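
The contrast drawn in the quoted passage can be sketched in Stata as below. This is an illustration under stated assumptions, not a replication of anything in the paper: the data are simulated, all variable names are hypothetical, and the first-stage error is deliberately made non-normal so that the probit is misspecified.

```stata
* Simulated data; names and coefficients are hypothetical. True coefficient on d is 2.
clear
set seed 2001
set obs 5000
generate x = rnormal()
generate z = rnormal()
generate u = rnormal()
* Skewed (chi-squared) first-stage error, so a probit for d is not exactly right
generate d = (0.8*z + 0.5*x + 0.5*u + (rchi2(1) - 1) > 0)
generate y = 1 + 2*d + 0.5*x + u

* Plug-in approach criticised in the quote: probit fitted values replace d
probit d z x
predict p_hat, pr
regress y p_hat x

* 2SLS with a linear first stage, as the quote recommends
ivregress 2sls y x (d = z), vce(robust) first
```

How far apart the two estimates of the coefficient on d land depends on the simulated design, and the standard errors reported by the plug-in regression ignore the estimated first stage. Note that the procedure in the opening post is different from the plug-in approach: there the probit fitted value is used as an instrument for D rather than substituted for it.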
Devon Smith replied

13 Jul 2022, 20:31
Got it! Thanks, Jeff! Appreciate your help.

Best, Devon.
Jeff Wooldridge replied

09 Jul 2022, 19:28
Devon: There are no guarantees with heteroskedasticity. It could still be more efficient. Use robust standard errors in both cases. With binary Z you’re counting on variation in X to strengthen the IV.
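
A rough way to look at this comparison in Stata is sketched below, using robust standard errors for both estimators as suggested. Everything in it is an assumption made for illustration: the simulated design, the binary instrument z, the form of heteroskedasticity, and the names iv_z and iv_dhat.

```stata
* Simulated data with a binary instrument and a heteroskedastic structural error
clear
set seed 2022
set obs 5000
generate x = rnormal()
generate z = (runiform() < 0.5)               // binary instrument
generate u = rnormal()
generate v = 0.5*u + rnormal()
generate d = (-0.2 + 0.7*z + 0.8*x + v > 0)   // binary endogenous regressor
generate y = 1 + 2*d + 0.5*x + u*exp(0.3*x)   // error variance depends on x

* (a) z itself as the excluded instrument
ivregress 2sls y x (d = z), vce(robust)
estimates store iv_z

* (b) probit fitted probability as the excluded instrument
probit d z x
predict d_hat, pr
ivregress 2sls y x (d = d_hat), vce(robust)
estimates store iv_dhat

* Coefficients and robust standard errors side by side
estimates table iv_z iv_dhat, b se
```

With a binary z, the fitted probability differs from z only through its nonlinear dependence on x, which is the variation in X the reply refers to. Which of the two standard errors for d comes out smaller is not guaranteed and will vary with the design.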
Devon Smith replied

06 Jul 2022, 20:53
Jeff:

Just had a follow-up question:

If the homoskedasticity assumption in the structural equation does not hold, does using the fitted values as the instrument still lead to more efficient estimates than using the actual instrument, Z, itself? I have a situation where my instrument also happens to be binary.
Devon Smith replied

03 Jul 2022, 13:14
Hi Jeff: Thanks for the link! This is exactly what I was looking for.
Jeff Wooldridge replied

03 Jul 2022, 01:43
Hi Devon: My former student, Ruonan Xu at Rutgers, has written on exactly this problem. She shows that using the probit fitted values can strengthen the IVs and explores the effective F statistic for determining weak IVs. You can use -weakivtest- after obtaining the probit fitted values. You can add Z as extra instruments but it might weaken the group as a whole.

https://www.sciencedirect.com/scienc...ZWpyxaZcaFZhvA
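
A minimal sketch of that workflow in Stata follows. It assumes the user-written -weakivtest- command and its dependency -avar- have been installed from SSC, and it uses simulated data with hypothetical variable names (z1 and z2 stand in for the instruments Z).

```stata
* weakivtest is user-written; this sketch assumes it was installed once with:
*   ssc install weakivtest
*   ssc install avar
clear
set seed 2023
set obs 5000
generate x  = rnormal()
generate z1 = rnormal()
generate z2 = rnormal()
generate u  = rnormal()
generate v  = 0.5*u + rnormal()
generate d  = (0.6*z1 + 0.4*z2 + 0.5*x + v > 0)
generate y  = 1 + 2*d + 0.5*x + u

* Probit fitted values as the single excluded instrument
probit d z1 z2 x
predict d_hat, pr
ivregress 2sls y x (d = d_hat), vce(robust)
weakivtest

* Adding z1 and z2 as extra instruments alongside d_hat
ivregress 2sls y x (d = d_hat z1 z2), vce(robust)
weakivtest
```

Comparing the effective F statistic across the two runs gives a rough sense of whether adding Z strengthens or weakens the instrument set as a whole in a given application.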
Devon Smith replied

02 Jul 2022, 22:13
Hi Fei:

Thanks for your reply. I am using an interaction with the endogenous variable in my model: y=ax+ bx*y where x is binary and endogenous. I have an instrument for x, z. I am trying to run a first stage of the form x=c.z and obtain xhat. Then I plan to use xhat and xhat*y as instruments. I am using the ivregress 2sls command because I need the svy prefix, and ivreg2 is not supported with svy. I also can't run estat first after ivregress 2sls, since that is not supported with svy either. That's why I was wondering how to get the first-stage F "manually."
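
One possible way to get a first-stage F by hand is to run each first-stage regression with the svy prefix and then test the excluded instruments jointly. The sketch below is only an illustration under several assumptions: the data and the survey design (psu, pw) are simulated, w is a hypothetical name for the variable interacted with x, and the instruments are taken to be z and z*w.

```stata
* Simulated survey data; all names and design settings are hypothetical
clear
set seed 2207
set obs 4000
generate psu = ceil(_n/40)            // 100 artificial sampling units
generate pw  = 1 + runiform()         // artificial sampling weights
svyset psu [pweight = pw]

generate w = rnormal()                // variable interacted with x
generate z = (runiform() < 0.5)       // instrument for x
generate u = rnormal()
generate x = (0.8*z + 0.5*u + rnormal() > 0.4)
generate xw = x*w                     // endogenous interaction
generate zw = z*w                     // instrument for the interaction

* First stage for x: excluded instruments plus the included exogenous variable
svy: regress x z zw w
test z zw                             // design-adjusted Wald F for the instruments

* First stage for the interaction x*w
svy: regress xw z zw w
test z zw
```

Each first stage should include every exogenous regressor from the main equation. With two endogenous regressors, the separate F statistics are only a rough guide to instrument strength.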