2SLS with a binary endogenous variable

Alex Carr

Join Date: May 2017

Posts: 38
#1

29 Aug 2018, 13:42

Hello, I'm interested in examining the effect of an endogenous dummy variable, D, on the dependent variable Y; that is, D has a potential self-selection issue.

If I run:

Y_i = α_i + βD_i + Xλ + e_i

the OLS estimate of β is likely biased and inconsistent, so what I'd like to do is run a two-stage model whose first stage is a probit with the criterion function:

D^*_i = Zσ + e_i   (1)

where Z is a vector of exogenous variables that includes at least one instrumental variable.

Then I plug the fitted value of D and the inverse Mills ratio into the structural model:

Y_i = α_i + β'Dhat_i + Xλ + τ INVERSEMILL_i + e_i   (2)

If τ is significant, there is a self-selection issue, and I report the coefficient β'.
If τ is not significant, there is no self-selection issue, and using OLS and reporting the coefficient β is fine.

However, I've been told that this 2SLS with a binary endogenous variable is called a forbidden regression and yields biased and inconsistent estimates of β'. Is this model wrong? If so, what is the solution for an endogenous binary variable?

Thank you very much.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

29 Aug 2018, 14:19

Hi Alex,

Check out the help for treatreg, which in Stata 15 is called etregress.

This pre-programmed command does what you want to do.
Alex Carr

Join Date: May 2017

Posts: 38
#3

29 Aug 2018, 15:24

Originally posted by Joro Kolev:

Hi Alex,

Check out the help for treatreg, which in Stata 15 is called etregress.

This pre-programmed command does what you want to do.

Thanks, but my concern is the consistency and bias of this estimator, not how to run it.
Mark Schaffer

Join Date: Mar 2014

Posts: 324
#4

30 Aug 2018, 07:21

Jeff Wooldridge's panel data book discusses this. It also comes up on Statalist from time to time.

Short version: basic IV is fine (consistent under standard assumptions) using your instruments and other variables as is, but Jeff outlines a procedure that is more efficient (via constructing transformed instruments and then doing basic IV). If you look around the Statalist archives you should find the discussion.
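
To make the "basic IV is fine" point concrete, here is a small simulation sketch. Everything in it is an assumption made for illustration: the data-generating process, the true β of 2, the single instrument z, the absence of other covariates X, and the function names. It is plain Python rather than Stata, and it only shows that the simple IV estimator recovers β when the dummy is endogenous, while OLS does not.

```python
import random

def simulate(n=20000, beta=2.0, seed=12345):
    """Draw (y, d, z) where the binary treatment d is correlated with the error in y."""
    rng = random.Random(seed)
    ys, ds, zs = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                   # excluded instrument (assumed valid)
        u = rng.gauss(0.0, 1.0)                   # outcome error
        v = 0.8 * u + 0.6 * rng.gauss(0.0, 1.0)   # selection error, corr(u, v) = 0.8
        d = 1.0 if z + v > 0 else 0.0             # endogenous dummy
        ys.append(1.0 + beta * d + u)
        ds.append(d)
        zs.append(z)
    return ys, ds, zs

def cov(a, b):
    """Sample covariance (divisor n)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def ols_slope(y, d):
    """OLS slope of y on d (with a constant)."""
    return cov(d, y) / cov(d, d)

def iv_slope(y, d, z):
    """Just-identified IV slope: z instruments d, ignoring that d is binary."""
    return cov(z, y) / cov(z, d)

ys, ds, zs = simulate()
b_ols = ols_slope(ys, ds)
b_iv = iv_slope(ys, ds, zs)
print(f"OLS: {b_ols:.3f}  IV: {b_iv:.3f}  (true beta = 2.0)")
```

In this setup OLS is pushed upward because units with a high error u are more likely to select into D, whereas the IV estimate should land close to the true value. With one instrument and no covariates the IV formula coincides with 2SLS.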
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#5

30 Aug 2018, 10:19

Alex,

1. If you want to know whether you have an endogeneity issue or can just use OLS, run treatreg, then OLS, and then do a Hausman test. This will tell you whether you have an endogeneity issue, and it is all preprogrammed, so it will be hard to get wrong.

2. If you want to address endogeneity but for any reason do not want to use treatreg, then, as Mark said, there are two simple strategies that work:
a) Standard 2SLS (ivregress), where you disregard the fact that your endogenous variable is binary.
b) A first-stage probit, where you do not plug in the predicted values but instead use the predicted values from the probit as instruments in ivregress.

3. What you are describing sounds like a control function procedure, which Wooldridge describes somewhere, but you are doing two things wrong/differently from Wooldridge's description, and therefore the properties of your procedure are not known.
a) In a control function approach you do not plug in the predicted values from the first stage. You keep the endogenous D_i as it is, so in your eq. (2) you should not have Dhat, but D.
b) In your eq. (2) you should not have the inverse Mills ratio, but rather something called the generalised residual/error, which has two terms, both functions of the inverse Mills ratio.
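
Points 2(b) and 3 can be sketched in a few lines. This is only an illustration under stated assumptions: a simulated data set with one instrument z and no other covariates, jointly normal errors, a constant treatment effect of 2, and hypothetical helper names (`fit_probit`, `generalized_residual`). It is plain Python, not Stata. With fitted probit index Zσ̂, the generalised residual used here is D_i·φ(Zσ̂)/Φ(Zσ̂) − (1 − D_i)·φ(Zσ̂)/(1 − Φ(Zσ̂)).

```python
import math
import random

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def generalized_residual(d, xb):
    """Probit generalised residual: inverse Mills term for d = 1, its mirror image for d = 0."""
    return phi(xb) / Phi(xb) if d == 1 else -phi(xb) / (1.0 - Phi(xb))

def fit_probit(d, z, iters=20):
    """Fisher scoring for P(d = 1 | z) = Phi(a + b*z); returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        s0 = s1 = i00 = i01 = i11 = 0.0
        for di, zi in zip(d, z):
            xb = a + b * zi
            g = generalized_residual(di, xb)      # score with respect to the index
            p = Phi(xb)
            w = phi(xb) ** 2 / (p * (1.0 - p))    # expected-information weight
            s0 += g
            s1 += g * zi
            i00 += w
            i01 += w * zi
            i11 += w * zi * zi
        det = i00 * i11 - i01 * i01
        a += (i11 * s0 - i01 * s1) / det          # scoring step: inverse information times score
        b += (i00 * s1 - i01 * s0) / det
    return a, b

def cov(a, b):
    """Sample covariance (divisor n)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

# Assumed data-generating process: D = 1[z + v > 0], Y = 1 + 2*D + u, corr(u, v) = 0.8.
rng = random.Random(2018)
n, beta = 10000, 2.0
y, d, z = [], [], []
for _ in range(n):
    zi = rng.gauss(0.0, 1.0)
    u = rng.gauss(0.0, 1.0)
    v = 0.8 * u + 0.6 * rng.gauss(0.0, 1.0)
    di = 1.0 if zi + v > 0 else 0.0
    y.append(1.0 + beta * di + u)
    d.append(di)
    z.append(zi)

a_hat, b_hat = fit_probit(d, z)
p_hat = [Phi(a_hat + b_hat * zi) for zi in z]
gr = [generalized_residual(di, a_hat + b_hat * zi) for di, zi in zip(d, z)]

# 2(b): keep D in the equation and use the probit fitted probability as its instrument.
b_iv = cov(p_hat, y) / cov(p_hat, d)

# 3: control function, i.e. OLS of Y on a constant, D and the generalised residual.
denom = cov(d, d) * cov(gr, gr) - cov(d, gr) ** 2
b_cf = (cov(d, y) * cov(gr, gr) - cov(gr, y) * cov(d, gr)) / denom
tau = (cov(gr, y) * cov(d, d) - cov(d, y) * cov(d, gr)) / denom

print(f"probit: a = {a_hat:.3f}, b = {b_hat:.3f}")
print(f"IV with p_hat as instrument: {b_iv:.3f}")
print(f"control function: beta = {b_cf:.3f}, tau = {tau:.3f}")
```

Both estimates should come out near the assumed true value of 2 in this simulation. Two caveats: the control function route leans on the joint normality built into the simulated errors, and because the generalised residual is itself estimated, the naive second-step standard errors would generally need a correction (for example a bootstrap over both steps).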
Alex Carr

Join Date: May 2017

Posts: 38
#6

30 Aug 2018, 10:54

Thank you Joro, I have two questions regarding your generous reply.

Originally posted by Joro Kolev:

b) First stage probit, but then you do not plug in the predicted values, but use the predicted values from the Probit as instruments in ivregress.

I'm not sure what you mean by using the predicted values from the probit as instruments. Do you mean that in the second stage I keep the endogenous D_i and use Dhat_i as its instrument?

Originally posted by Joro Kolev:

b) In your eq.(2) you should not have Inverse Mills ratio, but rather something called Generalised Residual/Error, which has two terms both functions of the Inverse Mills ratio

Is the generalised residual/error also called a selectivity variable? And how do I calculate it? Thank you.
Alex Carr

Join Date: May 2017

Posts: 38
#7

30 Aug 2018, 11:26

Thank you guys for the reply. I want to be more explicit for my case.

I want to examine the effect of a state program (dummy variable D) on the unemployment rate (Y). My unit of observation is city i.

For OLS, I'd run:
Y_i = α_i + βD_i + Xλ + e_i

However, the program is generally located in cities that tend to have a lower unemployment rate, so D is not random and we have selectivity bias.

Now, I can of course take the traditional IV approach by finding an instrument.

As another approach, I also want to:

(1) first estimate a probit model to predict the likelihood that city i is selected to have the program D:

D^*_i = Zσ + e_i   (1)

(2) then run OLS with a selectivity variable included in the second stage:

Y_i = α_i + β'D_i + Xλ + τ Selectivity_i + e_i   (2)

My questions are:

(1) Is β' consistent and unbiased?

(2) Is this approach called the Heckman correction approach or the control function approach?

(3) I know the Heckman approach tends to require that the variables for D=0 are not observable. In my case, the variables for D=0 are observable. Can I still use this approach?

(4) Lastly, is the selectivity variable different from the inverse Mills ratio? Or is it just another name for the generalized residual from a control function approach?

Thank you.