  • Bootstrap vs Robust S.E.

    The Stata manual does not provide sufficient detail on the differences between bootstrap and robust standard errors. Can anyone offer suggestions on when I should adopt bootstrap SEs?

  • #2
    Cooper:
    a practical approach is to use bootstrap SEs whenever robust/cluster options are not available (see, for instance, -xtreg, be-).
    A comprehensive coverage of bootstrap methods in microeconometrics with Stata is provided in Chapter 13 of the valuable https://www.stata.com/bookstore/micr...metrics-stata/.
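    For concreteness, a minimal sketch on the shipped nlswork toy data (the replication count and seed are arbitrary illustrative choices, not recommendations):
    Code:
    webuse nlswork, clear
    xtset idcode year
    * -xtreg, be- does not accept vce(robust)/vce(cluster), so bootstrap SEs are one alternative
    xtreg ln_wage age ttl_exp tenure, be vce(bootstrap, reps(200) seed(12345))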
    Kind regards,
    Carlo
    (Stata 19.0)

    • #3
      The term "robust", although conventional in econometrics, is oversold from some other points of view. The robustness concerned is limited, if I understand it correctly, to corrections for heteroscedasticity and autocorrelation. Bootstrapping can be used in a great variety of situations. The Stata philosophy here is more or less that you are expected to know which is pertinent, which could require reading and/or coursework.

      • #4
        Nick's wise point reminds me of this paramount textbook on the bootstrap: https://www.crcpress.com/An-Introduc.../9780412042317.
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Originally posted by Carlo Lazzaro View Post
          Cooper:
          a practical approach is to use bootstrap SEs whenever robust/cluster options are not available (see, for instance, -xtreg, be-).
          A comprehensive coverage of bootstrap methods in microeconometrics with Stata is provided in Chapter 13 of the valuable https://www.stata.com/bookstore/micr...metrics-stata/.
          Thanks for your response. I wonder: if both "robust" and "bootstrap" are available, can I still use the latter?

          • #6
            Originally posted by Nick Cox View Post
            The term "robust", although conventional in econometrics, is oversold from some other points of view. The robustness concerned is limited, if I understand it correctly, to corrections for heteroscedasticity and autocorrelation. Bootstrapping can be used in a great variety of situations. The Stata philosophy here is more or less that you are expected to know which is pertinent, which could require reading and/or coursework.
            Yes indeed. So, in your opinion, there is no need to specify "robust" when estimating a panel fixed-effects model? I conducted a test of heteroskedasticity with the user-written command -xttest3-, and the results suggest that heteroskedasticity is a concern, so I guess I have to find a way to deal with this issue. Any recommendations?
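            For reference, a minimal sketch of that workflow on the shipped nlswork data (stand-in variables, not my actual data); -xttest3- runs a modified Wald test for groupwise heteroskedasticity after -xtreg, fe-:
            Code:
            ssc install xttest3            // community-contributed command, install once
            webuse nlswork, clear
            xtset idcode year
            xtreg ln_wage age ttl_exp tenure, fe
            xttest3                        // H0: homoskedastic panel-level error variances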

            • #7
              Cooper:
              as Nick wisely pointed out, whenever we invoke robustness we should be aware of the setting we are concerned with.
              From your reply, I gather that you're dealing with panel data regression:
              - in general, if the -robust- option is available, I would use it; otherwise, I would go for -bootstrap-;
              - if you have an N>T panel dataset, both the robust and cluster options take heteroskedasticity and/or autocorrelation into account. Usually, in an N>T panel dataset heteroskedasticity can bite harder than autocorrelation (the opposite, in general, holds for a T>N panel dataset). As per your last reply, if you used the user-written command -xttest3- you performed -xtreg, fe- or -xtgls- regressions. If you have a T>N panel dataset (which is suited to -xtgls-), you can model both heteroskedasticity and autocorrelation explicitly; a brief sketch of both routes follows below.
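              A minimal sketch of both routes, with placeholder names (y, x1, x2) and the data already -xtset-:
              Code:
              * N>T panel: clustered-robust SEs with -xtreg, fe-
              xtreg y x1 x2, fe vce(robust)
              * T>N panel: -xtgls- can model panel-level heteroskedasticity and AR(1) autocorrelation explicitly
              xtgls y x1 x2, panels(heteroskedastic) corr(ar1)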
              Kind regards,
              Carlo
              (Stata 19.0)

              • #8
                Hi folks,

                I wonder if there are any papers that support the idea that, in a panel data setting, using robust SEs is better than using bootstrap SEs. I searched Google Scholar but couldn't find any. Thanks.

                • #9
                  To #8: Which is better depends heavily on the extent to which the related assumptions hold. For example, in a typical -xtreg, fe- or -xtreg, re- setting, the robust SE is equivalent to the SE clustered at the level of the panel id. The asymptotic SE formula requires a large number of clusters and assumes independence between clusters -- when (you believe) these assumptions are satisfied, the robust SE would be the better option. Bootstrap may be preferred when some assumptions fail. For example, when the number of clusters is small, a variety of bootstrap methods have been developed to handle statistical inference. Overall, the answer is not black and white but rather complicated. The literature tends to suggest an inference method for a very specific situation, not something as general as "in a panel data setting".
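                  As an illustration of that equivalence, a minimal sketch on the shipped nlswork data: the two calls below report identical standard errors, because -xtreg, fe- implements vce(robust) as clustering on the panel id.
                  Code:
                  webuse nlswork, clear
                  xtset idcode year
                  xtreg ln_wage age ttl_exp tenure, fe vce(robust)
                  xtreg ln_wage age ttl_exp tenure, fe vce(cluster idcode)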

                  • #10
                    Cooper:
                    -xtnbreg-, for instance, has no option for clustered-robust standard errors (SEs), but it does have one for -bootstrap- SEs.
                    When an option is not available in Stata, there are sound statistical reasons.
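                    For illustration, a minimal sketch with placeholder names (y a count outcome, x a covariate, data already -xtset-); the replication count and seed are arbitrary:
                    Code:
                    * vce(robust)/vce(cluster) are not accepted here, but vce(bootstrap) is
                    xtnbreg y x, fe vce(bootstrap, reps(200) seed(2022))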
                    Kind regards,
                    Carlo
                    (Stata 19.0)

                    • #11
                      Originally posted by Fei Wang View Post
                      To #8: Which is better depends heavily on the extent to which the related assumptions hold. For example, in a typical -xtreg, fe- or -xtreg, re- setting, the robust SE is equivalent to the SE clustered at the level of the panel id. The asymptotic SE formula requires a large number of clusters and assumes independence between clusters -- when (you believe) these assumptions are satisfied, the robust SE would be the better option. Bootstrap may be preferred when some assumptions fail. For example, when the number of clusters is small, a variety of bootstrap methods have been developed to handle statistical inference. Overall, the answer is not black and white but rather complicated. The literature tends to suggest an inference method for a very specific situation, not something as general as "in a panel data setting".
                      Thanks. When I applied -xtreg, fe vce(boot, r(500) seed(#))-, I realized that the significance level of a focal variable's coefficient varies from p<0.1 to p<0.05 depending on the seed # I adopted. Does this mean one can potentially "manipulate" the significance level using bootstrap SEs, whereas -xtreg, fe vce(robust)- gives more consistent results? Any thoughts?

                      • #12
                        Originally posted by Cooper Felix View Post

                        Thanks. When I applied -xtreg, fe vce(boot, r(500) seed(#))-, I realized that the significance level of a focal variable's coefficient varies from p<0.1 to p<0.05 depending on the seed # I adopted. Does this mean one can potentially "manipulate" the significance level using bootstrap SEs, whereas -xtreg, fe vce(robust)- gives more consistent results? Any thoughts?
                        I wouldn't conclude that. If you allow a larger number in -r()-, say beyond 1,000 replications, the p-values should be fairly stable.
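                        A minimal sketch of that stability check, with placeholder names: if the SEs and p-values from the two runs below are close, the replication count is probably adequate.
                        Code:
                        xtreg y x1 x2, fe vce(bootstrap, reps(2000) seed(101))
                        xtreg y x1 x2, fe vce(bootstrap, reps(2000) seed(202))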

                        • #13
                          Cooper:
                          Fei gave wise advice.
                          As an aside:
                          1) in their https://www.routledge.com/An-Introdu.../9780412042317, the authors recommend 25-200 bootstrap replications for SE estimation (page 47). Usually, 200 is the number to consider.
                          2) the clustered-robust standard error simply does not use a resampling method. As such, applied to the very same dataset and code, it will give back the very same results every time you invoke it.
                          Kind regards,
                          Carlo
                          (Stata 19.0)

                          • #14
                            Originally posted by Carlo Lazzaro View Post
                            Cooper:
                            Fei gave wise advice.
                            As an aside:
                            1) in their https://www.routledge.com/An-Introdu.../9780412042317, the authors recommend 25-200 bootstrap replications for SE estimation (page 47). Usually, 200 is the number to consider.
                            2) the clustered-robust standard error simply does not use a resampling method. As such, applied to the very same dataset and code, it will give back the very same results every time you invoke it.
                            This source is quite dated (although otherwise a great read; I really recommend it). Newer references put the lower acceptable limit at 15,000 replications (see https://arxiv.org/abs/1411.5279). Especially when p-values are volatile (how big is the bias?), many more replications than 500 are probably necessary. If even then the p-values do not stabilize, there are probably bigger problems with the data, or very strange distributions are present.
                            Last edited by Felix Bittmann; 21 Nov 2021, 08:37.
                            Best wishes

                            Stata 18.0 MP | ORCID | Google Scholar

                            • #15
                              Felix:
                              yes, it's true that this pivotal reference is really dated (1993), and in those days computers were less powerful (and not as widely available as today, at least when I graduated, which was well back in the past millennium).
                              The -bootstrap- entry in the Stata .pdf manual reports 100 replications for the SE estimate (Example 2), which may not be enough to give back stable results in most research projects.
                              With an averagely powerful laptop today, 200 -bootstrap- replications should be considered the lower limit of the range, whereas the upper one depends on other considerations (bootstrap bias; p-value volatility; research field traditions).

                              Kind regards,
                              Carlo
                              (Stata 19.0)
