
    • #17
      Originally posted by Felix Bittmann View Post

      This source is rather dated (although a great read otherwise; I really recommend it). Newer references put 15,000 at the lower acceptable limit (see https://arxiv.org/abs/1411.5279). Especially when p-values are volatile (how big is the bias?), many more than 500 replications are probably necessary. If the p-values do not stabilize even then, there are probably bigger problems with the data, or very strange distributions are present.


      Originally posted by Carlo Lazzaro View Post
      Felix:
      yes, it's true that this pivotal reference is really dated (1993), and in those days computers were less powerful (and not as widely available as today; at least when I graduated, which was well back in the past millennium).
      The -bootstrap- entry in the Stata .pdf manual reports 100 replications for the SE estimate (Example 2), which may not be enough to give back stable results in most research projects.
      With an averagely powerful laptop today, 200 -bootstrap- replications should be considered the lower limit of the range, whereas the upper one depends on other considerations (bootstrap bias; p-value volatility; research-field traditions).

      Thanks Carlo and Felix. I had read the book by Tibshirani before and also noticed that they suggest at least 50 replications for S.E. purposes, which is the default of vce(boot). However, I found that even with 200 replications the significance level in fact varies quite a bit, jumping from .1 to .005 and back to .1 depending on the seed. Using more than 1,000 replications, as suggested by Fei, to obtain a "stable" p-value unfortunately slows the whole estimation down considerably, making it less practical. Is there any way to speed up the bootstrapping process in Stata with xtreg, fe?
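The seed-to-seed "jumpiness" described above is Monte Carlo error in the bootstrap itself, and it shrinks as the number of replications grows. A minimal, language-agnostic sketch (Python rather than Stata, with invented toy data standing in for an estimation sample):

```python
import random
import statistics

def bootstrap_se(data, reps, seed):
    """Standard error of the sample mean, estimated by resampling the
    data with replacement `reps` times and taking the standard
    deviation of the resampled means."""
    rng = random.Random(seed)
    n = len(data)
    means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(reps)]
    return statistics.stdev(means)

# Invented toy data (any real analysis would use the actual sample).
data = [0.8, 1.2, -0.3, 2.1, 0.5, 1.7, -1.0, 0.9, 1.4, 0.2,
        1.1, -0.6, 2.4, 0.3, 1.8]

# Re-estimate the SE under ten different seeds, once with few
# replications and once with many; the spread across seeds is the
# seed-dependent volatility discussed in the thread.
spread = {}
for reps in (50, 2000):
    estimates = [bootstrap_se(data, reps, seed) for seed in range(10)]
    spread[reps] = max(estimates) - min(estimates)
    print(f"reps={reps}: SE estimates range over {spread[reps]:.4f} across seeds")
```

With 50 replications the SE estimate typically moves noticeably from seed to seed; with 2,000 it barely moves, which is why p-values computed from it stop jumping.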



      • #18
        https://journals.sagepub.com/doi/abs...36867X19874242

        https://github.com/gvegayon/parallel
        Best wishes

        Stata 18.0 MP | ORCID | Google Scholar



        • #19
          Bootstrapping isn't a panacea that cures all woes. Even Efron, its inventor, wrote that closed-form/analytic solutions for standard errors should be used whenever they are known. I would also echo the sound advice here and suggest that bootstrapping be used as a last resort, when conventional robust estimators don't exist. If you are in the unfortunate situation of having to use it, it is probably better either to run some simulations to see whether the standard estimate is problematic, or to rerun the analyses with an increasing number of resamples to convince yourself that the estimates have converged.
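The "increase resamples until the estimates converge" advice can be sketched as a simple doubling loop. This is an illustrative Python sketch only: the data, starting count, cap, and tolerance are all invented, and in practice the inner estimator would be the actual model rather than a sample mean.

```python
import random
import statistics

def bootstrap_se(data, reps, seed=1):
    """Bootstrap SE of the sample mean (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(data)
    means = [statistics.fmean(rng.choices(data, k=n)) for _ in range(reps)]
    return statistics.stdev(means)

def converged_se(data, start=200, max_reps=25600, tol=0.02):
    """Double the replication count until the SE estimate changes by
    less than `tol` (relative) between successive runs, or until the
    cap is reached."""
    reps = start
    prev = bootstrap_se(data, reps)
    while reps < max_reps:
        reps *= 2
        cur = bootstrap_se(data, reps)
        if abs(cur - prev) <= tol * prev:
            return cur, reps
        prev = cur
    return prev, reps

# Invented toy data.
data = [0.8, 1.2, -0.3, 2.1, 0.5, 1.7, -1.0, 0.9, 1.4, 0.2,
        1.1, -0.6, 2.4, 0.3, 1.8]
se, reps_used = converged_se(data)
print(f"SE settled at {se:.4f} after {reps_used} replications")
```

If the loop hits the cap without settling, that is itself informative: per the post above, persistent instability suggests problems with the data or a very strange distribution rather than too few replications.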



          • #20
            In my understanding (which could be wrong; I'm still going through my coursework myself), bootstrapping SEs can also be useful when you have smaller sample sizes.

            Suppose you're using regression and synthetic controls together and limiting your analysis to your donor pool (typically under 100 units); regular standard errors may give inconsistent estimates because of the small sample. So we bootstrap 2,000 times, which, if I recall correctly, simply means repeatedly estimating the SEs from a normal distribution.



            • #21
              Jared:
              -bootstrap- is usually a non-parametric resampling (with replacement) procedure.
              Hence, the data speak for themselves.
              You can impose a parametric -bootstrap- procedure, though, by assuming that the data are drawn from a given theoretical probability distribution (e.g., a Normal one).
              Kind regards,
              Carlo
              (Stata 19.0)
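The distinction drawn in this post can be made concrete in a short, language-agnostic sketch (Python, with invented data): the non-parametric bootstrap recycles the observed values, while the parametric variant fits an assumed distribution and draws entirely fresh values from it.

```python
import random
import statistics

# Invented toy data.
data = [0.8, 1.2, -0.3, 2.1, 0.5, 1.7, -1.0, 0.9, 1.4, 0.2]
rng = random.Random(42)

def nonparametric_draw(data, rng):
    # Resample the observed data with replacement: every value in the
    # pseudo-sample already appears in the data, so the data speak for
    # themselves and no distributional assumption is imposed.
    return rng.choices(data, k=len(data))

def parametric_draw(data, rng):
    # Assume a theoretical distribution (here Normal), fit its
    # parameters to the data, then draw fresh values from the
    # fitted model.
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    return [rng.gauss(mu, sigma) for _ in range(len(data))]

np_sample = nonparametric_draw(data, rng)
p_sample = parametric_draw(data, rng)
print("non-parametric:", np_sample)
print("parametric:    ", [round(x, 2) for x in p_sample])
```

Every value in the non-parametric pseudo-sample is one of the original observations, whereas the parametric pseudo-sample contains values the data never produced; which is appropriate depends on how much faith one has in the assumed distribution.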

