Instrumental variables with binary endogenous regressor

Maria Franco

Join Date: Jul 2014

Posts: 6
#1

26 Jun 2017, 06:06

Hi Stata listers,

I am estimating the following model using Instrumental Variables:
Y = B₀ + B₁D + B₂X + U, where D is an endogenous dummy variable.

In order to avoid the forbidden regression, I'm following Wooldridge (2002):
1. Estimate D = A₀ + A₁Z + A₂X + V using a probit model, and calculate the fitted value, D_hat.
2. Estimate the main equation by IV using D_hat as instrument.
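The two steps above can be sketched in Stata roughly as follows. This is only an illustration: it assumes variables named Y, D, Z, and X exist in memory, and it uses D_hat as the only excluded instrument, which is one of the options being asked about.

```stata
* Step 1: probit for the binary endogenous regressor, then fitted probabilities
probit D Z X
predict D_hat, pr

* Step 2: IV estimation of the main equation, with D_hat as the instrument for D
* (X serves as its own instrument)
ivregress 2sls Y X (D = D_hat), vce(robust)
```

Note that D_hat is used as an instrument here, not plugged in as a regressor in place of D; plugging it in would be the forbidden regression.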

My question is: should I include in step 2 the instrument Z or should I only use D_hat as instrument (together with X as instrument of itself)?

Many thanks for your help.

Maria
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

26 Jun 2017, 08:17

I gather the biprobit model would fit your needs.

Best regards,

Marcos
Peter Perera

Join Date: Jun 2017

Posts: 14
#3

15 Jul 2017, 12:08

Hi Maria,
See pages 142-144 of Mostly Harmless Econometrics by Angrist and Pischke. I believe you need to include only D_hat and X.
Eric de Souza

Join Date: Mar 2014

Posts: 587
#4

15 Jul 2017, 12:36

What you need is that Dhat not be correlated with the residual in the original model: Y = B₀ + B₁D + B₂X + U.
A probit model in the first stage will not give you that because it is nonlinear.
Just run a 2SLS regression using Z as an instrument for D: ivregress 2sls Y (D = Z) X
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#5

15 Jul 2017, 14:49

Dear Maria,

Just to add to the variety of (late) advice you are getting, I would suggest using both Z and D_hat as instruments. That is what we did in (see the top of page 291):

Windmeijer, F. and Santos Silva, J.M.C. (1997), Estimation of Count Data Models with Endogenous Regressors; An Application to Demand for Health Care, Journal of Applied Econometrics, 12(3), pp. 281-294.

Best wishes,

Joao
Eric de Souza

Join Date: Mar 2014

Posts: 587
#6

16 Jul 2017, 10:12

@João: Why would using an LPM in the first stage be wrong? And since the Dhat from a probit model is nonlinear in X and Z, is there not a risk that it will still be linearly correlated with U as defined in the equation in #1 above?
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#7

16 Jul 2017, 11:13

Dear Eric de Souza,

An LPM in the first stage is not wrong, and that is actually what should be done.

The question is what instrument to use in the first stage. We can just use Z, but that is unlikely to be a good instrument if Z is continuous and D binary. An alternative is to replace Z with E(D|Z,X), which is likely a better instrument; this can be approximated using a logit or probit. What we did in the 1997 paper was actually to use both Z and an estimate of E(D|Z,X) obtained with a logit.

So, the command I would suggest is something like the following (which has an LPM in the first stage):

ivregress 2sls Y (D = Z D_hat) X

Does it make sense?

Best wishes,

Joao
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2175
#8

16 Jul 2017, 15:46

Provided E(U|X,Z) = 0, using either Z, Dhat, or both produces consistent estimators. Joao's proposal has the advantage of being asymptotically no worse than either of the other two approaches if Var(U|X,Z) is constant. It also produces an overidentifying restriction, although I'm not sure how valuable that is as a specification test in this context.

A potential downside is that using (Z,Dhat) is more likely to lead to a weak instruments problem. If the probit model for D is correct then Z is redundant as an IV. Therefore, the rule for having a strong enough instrument -- roughly, the first-stage F statistic is above 10 -- would be harder to achieve.
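As a rough sketch of how the three instrument sets could be compared in practice, assuming variables named Y, D, Z, and X and a probit first step (the variable name D_hat is just illustrative):

```stata
* Probit fitted probabilities for D
probit D Z X
predict D_hat, pr

* (a) Z only
ivregress 2sls Y X (D = Z), vce(robust)
estat firststage

* (b) D_hat only
ivregress 2sls Y X (D = D_hat), vce(robust)
estat firststage

* (c) Z and D_hat together: one overidentifying restriction
ivregress 2sls Y X (D = Z D_hat), vce(robust)
estat firststage
estat overid
```

Comparing the first-stage statistics across (a) to (c) gives a sense of whether adding Z to D_hat strengthens or dilutes the instrument set in a given sample.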
Mat Log

Join Date: Jan 2018

Posts: 10
#9

20 Sep 2019, 12:39

Dear all,

Do we get the proper standard errors when running the following command that you suggested:

ivregress 2sls Y (D = Z D_hat) X

Thanks a lot,

Mat
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2175
#10

20 Sep 2019, 14:57

Mat: Yes, the standard errors are valid provided the error satisfies the zero conditional mean restriction in my earlier post (which is essentially needed for consistency, anyway). I show this in Chapter 6 of my MIT Press book. You should make them robust to heteroskedasticity, as usual.
Devon Smith

Join Date: Sep 2021

Posts: 26
#11

02 Jul 2022, 15:35

Hi Jeff and Joao:

In the above example, which first-stage F statistic should be reported: the one from D = A₀ + A₁Z + A₂X + V, or the one from the regression of D on Z and D_hat?
Fei Wang

Join Date: Oct 2021

Posts: 726
#12

02 Jul 2022, 20:31

Originally posted by Devon Smith

Hi Jeff and Joao:

In the above example, which first-stage F statistic should be reported: the one from D = A₀ + A₁Z + A₂X + V, or the one from the regression of D on Z and D_hat?

Devon, if you only use Z as the instrument, then the first-stage regression is as below and the F stat is only related to Z.

Code:

regress D Z X

If you use both Z and Dhat as instruments, then the first-stage regression would be as below and the F stat is related to the joint significance of Z and Dhat.

Code:

regress D Z Dhat X

ADD: But you shouldn't "manually" run the first stage. Use -ivregress- and correctly specify the set of instruments there. After -ivregress-, run -estat firststage- (for homoskedasticity) or -weakivtest- (for heteroskedasticity, from SSC) to obtain the correct F stat.
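A minimal sketch of that workflow, assuming Dhat has already been generated from a probit of D on Z and X. -weakivtest- is user-written and, as far as I know, also needs the -avar- package, so both are installed first:

```stata
* Homoskedastic case: built-in first-stage diagnostics
ivregress 2sls Y X (D = Z Dhat)
estat firststage

* Heteroskedastic case: effective F statistic from the user-written weakivtest
ssc install avar
ssc install weakivtest
ivregress 2sls Y X (D = Z Dhat), vce(robust)
weakivtest
```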

Last edited by Fei Wang; 02 Jul 2022, 20:45.
Devon Smith

Join Date: Sep 2021

Posts: 26
#13

02 Jul 2022, 22:13

Hi Fei:

Thanks for your reply. My model includes an interaction with the endogenous variable: y = ax + bx*y, where x is binary and endogenous, and I have an instrument, z, for x. I am trying to run a first stage of the form x = c.z and obtain xhat; I then plan to use xhat and xhat*y as instruments. I am using the ivregress 2sls command because I need the svy prefix, and ivreg2 is not supported with svy. Moreover, I can't run estat firststage after ivregress 2sls, since that is not supported with svy either. That's why I was wondering how to get the first-stage F "manually."
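For concreteness, a manual first stage under svy might look like the sketch below. It assumes the data have already been declared with svyset and that x and z are the variable names; the statistic reported by test is a design-based adjusted Wald F for the excluded instrument, which is not the same thing as the effective F from -weakivtest-.

```stata
* Assumes svyset has already been run on the data
* Linear first stage of the endogenous dummy on the instrument
svy: regress x z

* Design-based Wald test of the excluded instrument
test z
```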
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2175
#14

03 Jul 2022, 01:43

Hi Devon: My former student, Ruonan Xu at Rutgers, has written on exactly this problem. She shows that using the probit fitted values can strengthen the IVs and explores the effective F statistic for determining weak IVs. You can use -weakivtest- after obtaining the probit fitted values. You can add Z as an extra instrument, but it might weaken the group as a whole.

https://www.sciencedirect.com/scienc...ZWpyxaZcaFZhvA
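In the notation of the earlier posts (Y, D, Z, X), the suggestion could be sketched as follows. This is a simplified version without the interaction term or the svy prefix, and it assumes -weakivtest- (with its -avar- dependency) is already installed from SSC:

```stata
* Probit fitted values as the instrument for D
probit D Z X
predict D_hat, pr
ivregress 2sls Y X (D = D_hat), vce(robust)
weakivtest

* Optional: add Z as an extra instrument and check whether the effective F drops
ivregress 2sls Y X (D = D_hat Z), vce(robust)
weakivtest
```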
Devon Smith

Join Date: Sep 2021

Posts: 26
#15

03 Jul 2022, 13:14

Hi Jeff: Thanks for the link! This is exactly what I was looking for.