Instrumental Variable Probit using Panel Data

Vera Mihailovna

Join Date: Oct 2020

Posts: 17
#1


26 Nov 2020, 14:00

I am trying to implement an IV regression in which the second stage is a probit. I use the correlated random effects (CRE) approach suggested by @Jeff Wooldridge.
This is the code that I am using:

Code:

* time averages for the correlated random effects terms
by pid, sort : egen x1_bar = mean(x1)
by pid, sort : egen x2_bar = mean(x2)
by pid, sort : egen iv_bar = mean(iv)

capture program drop my2sls
program my2sls
    * first stage
    reg x1 x2 iv x1_bar x2_bar iv_bar i.time
    predict x1_hat, xb
    * second stage: probit with the fitted value x1_hat in place of x1
    probit y x1_hat x2 x1_bar x2_bar i.date, cluster(pid)
    drop x1_hat
end

bootstrap, reps(100): my2sls

Is everything correct?
Any other suggestions?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2167
#2

26 Nov 2020, 19:30

You need to use a control function approach combined with CRE. Get the residuals from the first-stage regression, not the fitted values. Then add the residuals to the second stage along with x1 and the other variables, instead of x1_hat.

Is x1 roughly continuous?
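As a rough sketch of the control function approach described above (the notation is mine, not from the thread): let z_{it} collect the exogenous variables x2 and iv, let \bar{w}_i collect the time averages used in the thread (x1_bar, x2_bar, iv_bar), let \theta_t and \eta_t be time effects, and let \Phi be the standard normal CDF. The first stage is estimated by OLS, and its residual \hat{v}_{it} replaces v_{it} in a pooled probit:

```latex
\begin{aligned}
x_{1it} &= z_{it}\delta + \bar{w}_i\xi + \theta_t + v_{it}, \\
\Pr(y_{it}=1 \mid x_{1it}, z_{it}, \bar{w}_i, v_{it})
  &= \Phi\big(\beta_1 x_{1it} + x_{2it}\beta_2 + \bar{w}_i\psi + \rho\, v_{it} + \eta_t\big).
\end{aligned}
```

In this setup, \rho = 0 corresponds to x1 being exogenous with respect to the idiosyncratic shocks.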
Vera Mihailovna

Join Date: Oct 2020

Posts: 17
#3

27 Nov 2020, 05:30

Dear Professor Wooldridge,
Thank you very much for your reply! Yes, it is continuous, but my IV has a maximum value of 4.

Code:

by pid, sort : egen x1_bar = mean(x1)
by pid, sort : egen x2_bar = mean(x2)
by pid, sort : egen iv_bar = mean(iv)

capture program drop my2sls
program my2sls
    * first stage: keep the residual instead of the fitted value
    reg x1 x2 iv x1_bar x2_bar iv_bar i.time
    predict x1_resid, resid
    * second stage: control function probit with x1 and the residual
    probit y x1 x1_resid x2 x1_bar x2_bar i.date, cluster(pid)
    drop x1_resid
end

bootstrap, reps(100): my2sls

Is it correct now? Or do I also need to include the time average of x1_resid?

Follow-up question: how can I use the Stock-Yogo critical values to test the strength of the instrument?

Can I simply run ivreg2 and use only its weak-instrument diagnostics, ignoring the rest of the output?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2167
#4

27 Nov 2020, 15:16

Very good intuition! In Lin and Wooldridge (2019), we show that you should include the time average of the instruments in both stages. You can find the unpublished version on Wei's website. In your setting, this is numerically identical to including the time average of x1_resid, but without the problem of a generated regressor. So use

Code:

probit y x1 x1_resid x2 x1_bar x2_bar iv_bar i.date, cluster(pid)

You can actually use the cluster-robust t statistic on x1_resid to test the null that x1 is exogenous with respect to the idiosyncratic shocks.
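A minimal sketch of that test, assuming the variable names from post #3 (pid, date, y, x1, x2, iv, with x1_bar, x2_bar and iv_bar already created by pid) and using i.date for the time dummies in both stages. No bootstrap is needed for this particular test: under the null that the coefficient on x1_resid is zero, estimating the first stage does not change the limiting distribution of the statistic, so the usual cluster-robust standard error is valid.

```stata
* Assumes x1_bar, x2_bar, iv_bar already exist (created by pid as in post #3)
* and that date is the time-period variable used in both stages.
capture drop x1_resid

* first stage (regressors as in post #3); keep the residual
regress x1 x2 iv x1_bar x2_bar iv_bar i.date
predict x1_resid, resid

* second stage: pooled probit with standard errors clustered by pid
probit y x1 x1_resid x2 x1_bar x2_bar iv_bar i.date, vce(cluster pid)

* H0: coefficient on x1_resid is zero (x1 exogenous w.r.t. idiosyncratic shocks)
test x1_resid
```

The Wald chi-squared from test is the square of the cluster-robust z statistic on x1_resid reported in the probit table.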

We don't have good theory for determining strong vs weak IVs in nonlinear contexts. What you propose is what I suggest: act as if the model of interest is also linear and use the diagnostics for linear models. You might look at the user-written command weakivtest in addition to ivreg2.
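A sketch of the linear diagnostics, under the same naming assumptions and with the regressors copied from the probit line above. ivreg2 (which needs ranktest) and weakivtest are user-written and installed from SSC. With cluster-robust errors, ivreg2 reports the Kleibergen-Paap rk Wald F; keep in mind that the Stock-Yogo critical values it prints were derived for i.i.d. errors. weakivtest reports the Montiel Olea-Pflueger effective F, which allows for heteroskedasticity and clustering. I assume here that your installed ivreg2 accepts factor-variable notation such as i.date; if it does not, create the dummies with tabulate date, generate() instead.

```stata
* One-time installation of the user-written commands from SSC:
*   ssc install ivreg2
*   ssc install ranktest
*   ssc install weakivtest

* Linear "as if" version of the model: x1 endogenous, iv the excluded
* instrument, same exogenous regressors as the probit
ivreg2 y x2 x1_bar x2_bar iv_bar i.date (x1 = iv), cluster(pid) first

* Effective F statistic, computed after ivregress
ivregress 2sls y x2 x1_bar x2_bar iv_bar i.date (x1 = iv), vce(cluster pid)
weakivtest
```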

I just noticed something about your bootstrap. Using cluster(pid) inside the program does nothing, because the reported standard errors come from the bootstrap, and a panel bootstrap makes them robust to clustering as well as to the two-step estimation. But I don't see where you have told Stata to use the panel bootstrap, that is, to resample whole pid units rather than individual observations. I should have some old code that does that.
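A minimal sketch of a panel (cluster) bootstrap, assuming the same variable names and that the time averages are created by pid before bootstrapping, so each resampled unit keeps its own averages. The cluster(pid) option of bootstrap draws whole pid units with replacement, all periods together. The first stage is copied from post #3, the probit adds iv_bar as suggested above, cluster(pid) is dropped from the probit because the bootstrap supplies the standard errors, and seed(12345) is an arbitrary choice for reproducibility.

```stata
* Assumes x1_bar, x2_bar, iv_bar were created with by pid: egen ... mean()
* before bootstrapping, and that date is the time-period variable.
capture program drop my2sls
program my2sls
    * first stage (as in post #3), same time dummies as the second stage
    regress x1 x2 iv x1_bar x2_bar iv_bar i.date
    predict x1_resid, resid
    * second stage with iv_bar added; no cluster() needed inside
    probit y x1 x1_resid x2 x1_bar x2_bar iv_bar i.date
    drop x1_resid
end

* panel bootstrap: resample whole pid units with replacement
bootstrap _b, reps(100) seed(12345) cluster(pid): my2sls
```

If the program itself relied on pid (for example, xtset pid date followed by an xt command), you would also need the idcluster() option so that units drawn more than once get distinct identifiers.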

JW
Vera Mihailovna

Join Date: Oct 2020

Posts: 17
#5

27 Nov 2020, 18:07

Professor Wooldridge, thank you very much! This information is gold. Is your old code publicly available somewhere? Or could I email you for it, if it is at hand?
Best regards,
Vera
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2167
#6

27 Nov 2020, 18:43

Email is fine. I should be able to find it pretty easily.
Ke Ju

Join Date: Jul 2021

Posts: 5
#7

02 Jul 2021, 06:14

Originally posted by Jeff Wooldridge (#4):

[...] I just noticed something about your bootstrap. Using cluster(pid) does nothing because your bootstrap standard errors will be robust to clustering as well as the two-step estimation. But now I don't see how you have told Stata you want to use the panel bootstrap. I should have some old code that does that.

JW

Dear Professor Wooldridge and dear Mihailovna, thank you both for the valuable discussion. I would also like to ask how to tell Stata that we want to use the panel bootstrap. Any suggestions?

Also, my rough understanding is that what we get here from the bootstrap are bootstrapped standard errors (BSE), which approximate the asymptotically correct standard errors (ACSE). Is there a way to compute the ACSE directly?

Best wishes,

Ke
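For reference, here is a sketch of what the ACSE are, based on the standard two-step M-estimation argument (often called the Murphy-Topel correction); the notation is mine. Let \gamma be the first-stage pooled OLS coefficients, with T \times K regressor matrix W_i and error vector v_i for unit i. Let \theta be the probit coefficients and s_i(\theta,\gamma) the probit score summed over t for unit i:

```latex
\begin{aligned}
\sqrt{N}(\hat\gamma-\gamma_0) &= N^{-1/2}\sum_{i=1}^N r_i + o_p(1),
  \qquad r_i = \big(\mathrm{E}[W_i'W_i]\big)^{-1} W_i' v_i, \\
A_0 &= -\mathrm{E}\big[\nabla_\theta s_i(\theta_0,\gamma_0)\big],
  \qquad F_0 = \mathrm{E}\big[\nabla_\gamma s_i(\theta_0,\gamma_0)\big], \\
g_i &= s_i(\theta_0,\gamma_0) + F_0\, r_i, \\
\operatorname{Avar}\big[\sqrt{N}(\hat\theta-\theta_0)\big] &= A_0^{-1}\,\mathrm{E}[g_i g_i']\,A_0^{-1}.
\end{aligned}
```

Replacing A_0, F_0 and r_i with sample averages evaluated at the estimates gives analytic standard errors. These are already clustered by pid because everything is summed within i first, and the panel bootstrap approximates the same variance. When the coefficient on x1_resid is zero, F_0 = 0 and the correction term drops out. This is why the plain cluster-robust t statistic on x1_resid can be used to test exogeneity.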