Panel bootstrap after Wooldridge (1995) two-step estimator

Hanna Lindstrom

#1

16 Sep 2019, 08:45

Dear Statalisters,

I have a panel data set that suffers from sample selection bias. I am following the approach of Wooldridge (1995) and Semykina and Wooldridge (2010). The correction for sample selection bias proceeds as follows:
1. Estimate T different probits of the selection equation in order to retrieve T inverse Mills ratios

select=x1 z1

I do this using a loop:

Code:

forvalues i = 2005(1)2016 {
    disp `i'
    probit select `x1' `z1' if yy==`i', vce(robust)
    predict xb`i' if yy==`i', xb
    qui replace IMR=normalden(xb`i')/normal(xb`i') if `y2'==1 & yy==`i'
}
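
For reference, the quantity stored in IMR by the loop above is the inverse Mills ratio evaluated at the fitted probit index for each year. The symbols w_it (the probit regressors, here x1 and z1) and \hat{\delta}_t (the year-t probit coefficients) are notation assumed for this note and do not appear in the post:

```latex
\hat{\lambda}_{it} = \frac{\phi(w_{it}\hat{\delta}_t)}{\Phi(w_{it}\hat{\delta}_t)}, \qquad t = 2005, \dots, 2016,
```

where \phi and \Phi are the standard normal density and distribution function, corresponding to normalden() and normal() in Stata.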

2. For the selected observations (i.e., those where select=1), use pooled OLS to estimate the main equation:

reg food2 `x1' IMR i.yy if select==1, vce(cluster mc)

3. Estimate the asymptotic variance

Now, I am struggling with step 3. In their 2010 paper, Semykina and Wooldridge write on p. 378: "Instead of using analytical formulae for the asymptotic variance, one can apply "panel bootstrap". This involves resampling cross-sectional units (and all time periods for each unit sampled) and using the bootstrap sample to approximate the distribution of the parameter vector".

It is my impression that with a two-step estimator, one should bootstrap over both (in this case) the probit(s) and the second stage main regression. However, I have not found any clues on Statalist or elsewhere on how to bootstrap standard errors, accounting for two separate regressions, when the first step involves estimating T different probits. I assume it would be something similar in style to the "program" approach in https://www.statalist.org/forums/for...ction-on-stata , but taking into account the T different probits in the first stage.
I.e.,

Code:

* Bootstrap SE
program heck2, rclass
    forvalues i = 2005(1)2016 {
        disp `i'
        probit select `x1' `z1' if yy==`i', vce(robust)
        predict xb`i' if yy==`i', xb
        qui replace IMR=normalden(xb`i')/normal(xb`i') if select==1 & yy==`i'
    }
    xtset mc yy
    reg food2 `x1' IMR i.yy, vce(cluster mc)
    return scalar beta = _b[L1_goal]
end

bootstrap r(beta), reps(100) seed(1234) nodrop: heck2
estat bootstrap

Which renders the following error message:

Code:

. bootstrap r(beta), reps(100) seed(1234) nodrop:heck2
(running heck2 on estimation sample)
varlist required
an error occurred when bootstrap executed heck2

Has anyone used the "panel bootstrap" method that Semykina and Wooldridge (2010) refer to in this context? If so, I would be grateful to receive any suggestions on how it could be implemented in Stata.

I am aware that my question does not necessarily pertain to a particular Stata command, so I completely understand if this is the wrong forum for it.

/Hanna

References:
Wooldridge, Jeffrey M. "Selection corrections for panel data models under conditional mean independence assumptions." Journal of Econometrics 68.1 (1995): 115-132.
Semykina, Anastasia, and Jeffrey M. Wooldridge. "Estimating panel data models in the presence of endogeneity and selection." Journal of Econometrics 157.2 (2010): 375-380.

Last edited by Hanna Lindstrom; 16 Sep 2019, 08:47.
Jeff Wooldridge

#2

17 Sep 2019, 16:47

Hanna: I might be able to help. But Anastasia has Stata code on her website at Florida State.
Hanna Lindstrom

#3

18 Sep 2019, 09:17

Prof Wooldridge, thank you very much for your reply.

Yes, I have seen Anastasia Semykina's code on her web page. It was very helpful, but it mainly implemented the solution using matrices, if I remember correctly.

However, I think I was able to solve the issues I had with implementing "panel bootstrapped" standard errors. I used the -program- command in Stata to bootstrap over both the probit equations and the pooled OLS, and when running the -bootstrap- command that calls the program, I told Stata to resample cross-sectional units, together with all time periods for each unit sampled (see code below).
At least, this is the understanding I had of the solution based on https://www.stata.com/support/faqs/s...th-panel-data/

Code:

tsset id yy

program myprogram, rclass
    * STEP 1) Probit equations
    gen IMR=.
    forvalues i = 2005(1)2016 {
        disp `i'
        probit select `x1' `z1' if yy==`i', vce(robust)
        predict xb`i' if yy==`i', xb
        qui replace IMR=normalden(xb`i')/normal(xb`i') if report==1 & yy==`i'
    }
    * STEP 2) Pooled OLS with year dummies
    reg food2 `x1' IMR i.yy
    return scalar beta = _b[var1]
    drop xb* IMR
    exit
end

generate newid=id
tsset newid yy
bootstrap r(beta), reps(100) seed(1234) cluster(id) idcluster(newid) nodrop: myprogram
estat bootstrap
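
A variant of the program above, offered as a minimal sketch under stated assumptions rather than a definitive implementation. Local macros such as `x1' that are defined in a do-file are not visible inside a program, which is one possible source of a "varlist required" error, so the sketch passes the variable lists through the hypothetical globals $xvars and $zvars. It also uses temporary variables, so nothing has to be dropped by hand between replications. The program name imr2step and the choice to return the coefficient on the inverse Mills ratio are for illustration only; the variable names select, food2, yy and id follow the thread.

```stata
* Sketch only: $xvars and $zvars are hypothetical globals; fill in real names.
global xvars "x1"      // second-stage regressors (placeholder)
global zvars "z1"      // exclusion restrictions (placeholder)

capture program drop imr2step
program define imr2step, rclass
    tempvar imr xb
    quietly gen double `imr' = .
    * Step 1: one probit per year, inverse Mills ratio for selected observations
    forvalues t = 2005/2016 {
        quietly probit select $xvars $zvars if yy == `t'
        quietly predict double `xb' if yy == `t', xb
        quietly replace `imr' = normalden(`xb')/normal(`xb') if select == 1 & yy == `t'
        drop `xb'
    }
    * Step 2: pooled OLS with year dummies on the selected sample
    quietly regress food2 $xvars `imr' i.yy if select == 1
    return scalar beta = _b[`imr']
end

* Resample whole units (all years of each id); nodrop keeps the non-selected
* observations, which the first-step probits need.
bootstrap beta = r(beta), reps(100) seed(1234) cluster(id) nodrop: imr2step
estat bootstrap
```

Because this sketch uses no time-series operators or panel commands inside the program, cluster(id) alone is enough here; the idcluster() option becomes relevant once the program relies on the panel structure, as in the FAQ cited in the post.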

If this is completely wrong, I would be very thankful for comments and suggestions.

Kind regards,
Hanna

Sources: https://www.stata.com/support/faqs/s...th-panel-data/
Jeff Wooldridge

#4

19 Sep 2019, 13:34

Hanna: That looks roughly like my code when I've used the panel bootstrap. I haven't used it for this particular problem, but I have for similar problems with panel data. In particular, it should both account for the two-step estimation and allow for serial correlation and heteroskedasticity of unknown form.
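
The resampling scheme discussed in this thread, drawing whole cross-sectional units together with all of their time periods, can also be written out by hand with -bsample-. The following is a minimal sketch, not code from either poster: id and yy follow the thread, while bs_id, the number of replications B, and the placeholder for the estimation step are assumptions made for illustration.

```stata
* Sketch: manual panel bootstrap loop (bs_id and B are hypothetical names).
local B 100
set seed 1234
forvalues b = 1/`B' {
    preserve
    * Draw units with replacement; every year of a drawn unit comes along.
    bsample, cluster(id) idcluster(bs_id)
    * A unit drawn twice gets two distinct bs_id values, so the panel
    * declaration stays valid.
    xtset bs_id yy
    * ... run the T probits and the pooled OLS here and store the estimate ...
    restore
}
```

The standard deviation of the stored estimates across the B replications would then serve as the bootstrap standard error.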
Hanna Lindstrom

#5

25 Sep 2019, 02:17

Prof Wooldridge,
Thank you very much indeed for your reply!

Kind regards,
Hanna