Reproduction from Papke and Wooldridge (2008): issues with bootstrap (insufficient observations)

Yoann Morin


10 Jul 2019, 11:56

I'm trying to estimate a fractional regression model with unobserved heterogeneity.
I chose the method developed by Papke and Wooldridge (2008), "Panel data methods for fractional response variables with an application to test pass rates".

Before applying the code to my data, I tried to replicate their results using the code provided on Papke's website: http://econ.msu.edu/faculty/papke/Pa...statafiles.zip

The code I'm interested in is the panel case with exogenous covariates ("math4_boot_exog_gee.do").
Here is a copy of the code from the website:

Code:

capture program drop math4_boot

program math4_boot, rclass

glm math4 lavgrexp alavgrexp lunch alunch lenroll alenroll y96-y01 if year>1994, fa(bin) link(probit) cluster(distid)
mat b = e(b)
xtgee math4 lavgrexp lunch lenroll alavgrexp alunch alenroll y96-y01, fa(bi) link(probit) corr(exch) robust from(b,skip)

return scalar b1 = _b[lavgrexp]
return scalar b2 = _b[lunch]
return scalar b3 = _b[lenroll]

predict x1b1hat, xb
gen scale=normden(x1b1hat)
gen pe1=scale*_b[lavgrexp]
summarize pe1
return scalar ape1=r(mean)
gen pe2=scale*_b[lunch]
summarize pe2
return scalar ape2=r(mean)
gen pe3=scale*_b[lenroll]
summarize pe3
return scalar ape3=r(mean)

drop x1b1hat scale pe1 pe2 pe3
end


*Bootstrapped SE within districts
bootstrap r(b1) r(b2) r(b3) r(ape1) r(ape2) r(ape3), reps(20) seed(123) cluster(distid) idcluster(newid): math4_boot

program drop math4_boot

I had a few issues with it. First, the normden function has been deprecated, so I replaced it with normalden.
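For reference, the quantities the program returns as ape1, ape2 and ape3 can be written as below. This is only a restatement of what the code computes (the mean of scale times the coefficient); the notation is mine and not necessarily the paper's. Here N is the number of observations with a non-missing linear prediction, and x_it stands for the full regressor vector used by predict, including the time averages and year dummies.

```latex
\widehat{\mathrm{APE}}_j
  \;=\; \frac{1}{N}\sum_{i,t} \phi\!\left(\mathbf{x}_{it}\hat{\boldsymbol{\beta}}\right)\hat{\beta}_j
  \;=\; \hat{\beta}_j \cdot \frac{1}{N}\sum_{i,t} \phi\!\left(\mathbf{x}_{it}\hat{\boldsymbol{\beta}}\right),
\qquad j \in \{\texttt{lavgrexp},\ \texttt{lunch},\ \texttt{lenroll}\}
```

where \phi is the standard normal density, which is what normalden() evaluates.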

Then I got this error message:

Code:

Bootstrap replications (20)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
repeated time values within panel
the most likely cause for this error is misspecifying the cluster(), idcluster(), or group() option
r(451);

So I added an "xtset, clear" statement before the bootstrap statement, and "xtset distid year" before the estimations of the fractional regression model.

Here is my full code at this point:

Code:

capture program drop math4_boot

program math4_boot, rclass

xtset distid year

glm math4 lavgrexp alavgrexp lunch alunch lenroll alenroll y96-y01 if year>1994, fa(bin) link(probit) cluster(distid)
mat b = e(b)
xtgee math4 lavgrexp lunch lenroll alavgrexp alunch alenroll y96-y01, fa(bi) link(probit) corr(exch) robust from(b,skip)


return scalar b1 = _b[lavgrexp]
return scalar b2 = _b[lunch]
return scalar b3 = _b[lenroll]


predict x1b1hat, xb
gen scale=normalden(x1b1hat)
gen pe1=scale*_b[lavgrexp]
summarize pe1
return scalar ape1=r(mean)
gen pe2=scale*_b[lunch]
summarize pe2
return scalar ape2=r(mean)
gen pe3=scale*_b[lenroll]
summarize pe3
return scalar ape3=r(mean)

drop x1b1hat scale pe1 pe2 pe3
end



*Bootstrapped SE within districts
xtset, clear
bootstrap r(b1) r(b2) r(b3) r(ape1) r(ape2) r(ape3), reps(20) seed(123) cluster(distid) idcluster(newid): math4_boot

program drop math4_boot

But I still get an error message:

Code:

Bootstrap replications (20)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
xxxxxxxxxxxxxxxxxxxx
insufficient observations to compute bootstrap standard errors
no results will be saved

I tried adding the "nodrop" option to the bootstrap statement, but it did not change anything.

Any idea what could cause this issue please?


Yoann Morin


11 Jul 2019, 07:11

I did not understand everything that was going on, but I managed to get the program to work by dropping the time dimension from the xtset statement. Also, the panel identifier is now the ID created by the bootstrap statement (newid) rather than the original ID (distid) that I specified in the previous version.
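My understanding of why this matters, as a minimal sketch rather than a definitive diagnosis: the cluster bootstrap draws districts with replacement, so a district drawn twice appears with the same (distid, year) pairs twice, and "xtset distid year" then fails in every replication. The snippet below mimics a single draw with bsample. It assumes the replication data (with distid and year) are in memory; nothing else is taken from the authors' files.

```stata
* Sketch: mimic one cluster-bootstrap draw and inspect the panel structure.
* Assumes the replication dataset (variables distid, year) is in memory.
preserve
capture drop newid                      // in case an earlier run left it behind
set seed 123
bsample, cluster(distid) idcluster(newid)

* Districts drawn more than once show up as surplus (distid, year) copies,
* which is what triggers "repeated time values within panel"
duplicates report distid year

* newid is unique for each drawn cluster, so (newid, year) has no repeats
duplicates report newid year
restore
```

If the second report shows no surplus observations, "xtset newid year" should also be accepted inside the program; I have only run the version without the time variable.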

Here is the working code:

Code:

capture program drop math4_boot

program math4_boot, rclass

xtset newid

glm math4 lavgrexp alavgrexp lunch alunch lenroll alenroll y96-y01 if year>1994, fa(bin) link(probit) cluster(distid)
mat b = e(b)
xtgee math4 lavgrexp lunch lenroll alavgrexp alunch alenroll y96-y01, fa(bi) link(probit) corr(exch) robust from(b,skip) 


return scalar b1 = _b[lavgrexp]
return scalar b2 = _b[lunch]
return scalar b3 = _b[lenroll]


predict x1b1hat, xb
gen scale=normalden(x1b1hat)

gen pe1=scale*_b[lavgrexp]
summarize pe1
return scalar ape1=r(mean)

gen pe2=scale*_b[lunch]
summarize pe2
return scalar ape2=r(mean)

gen pe3=scale*_b[lenroll]
summarize pe3
return scalar ape3=r(mean)

drop x1b1hat scale pe1 pe2 pe3
end



*Bootstrapped SE within districts
bootstrap r(b1) r(b2) r(b3) r(ape1) r(ape2) r(ape3), reps(20) seed(123) cluster(distid) idcluster(newid): math4_boot

program drop math4_boot

I still have a minor issue: despite the seed being set to the same value as in their code, the bootstrapped standard errors are slightly different.

Here is the comparison for each variable and its average partial effect (APE):

Variable            In the paper   Reproduction   Absolute difference
log(arexppp)        0.206          0.2059         0.0001
log(arexppp) APE    0.070          0.0691         0.0009
lunch               0.209          0.2148         0.0058
lunch APE           0.067          0.0729         0.0059
log(enroll)         0.139          0.1412         0.0022
log(enroll) APE     0.045          0.0477         0.0027

Do you know why there are such differences? Could they be explained by a different Stata version? (I estimated this in Stata 13; I don't know which version the authors used, but the paper is from 2008.)
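One thing that can be checked directly is how much the standard errors move with the random draws, since 20 replications give a noisy estimate. The loop below is a rough sketch of that check, not an explanation of the gap: it assumes math4_boot is still defined (so it has to run before "program drop math4_boot") and that the data are in memory, and the seeds other than 123 are arbitrary values I picked.

```stata
* Sketch: rerun the 20-replication bootstrap under a few seeds and compare SEs.
* Assumes the math4_boot program is still defined and the data are in memory.
* Seeds 456 and 789 are arbitrary illustrative values.
foreach s in 123 456 789 {
    display as text "seed = `s'"
    capture drop newid                  // in case a previous run left it behind
    bootstrap r(b1) r(b2) r(b3) r(ape1) r(ape2) r(ape3), ///
        reps(20) seed(`s') cluster(distid) idcluster(newid) nodots: math4_boot
}
```

If the standard errors differ across seeds by amounts similar to the third-decimal differences reported here, the gap to the paper may simply reflect different bootstrap draws, for example because the data were sorted differently when the authors ran their code.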


高佳


04 Feb 2021, 19:47

Hi Morin,

Have you succeeded in replicating this paper? I'm trying to do the same and would like to hear how it went and compare notes with you.
