
  • Question: bootstrap - saved dataset yields different values compared to bootstrap results

    Dear all,

    A (probably silly) question about the bootstrap command.
    When bootstrapping a coefficient, one can save the estimate from each resampling/bootstrap iteration and pile them up in a dataset.
    When I summarize that saved coefficient, I get a mean that differs from the coefficient reported by bootstrap (while the sd matches the reported standard error exactly).
    Below is an example that hopefully illustrates the issue:

    Code:
    cap program drop Test
    program define Test, rclass
        preserve                                    // put the data back when the program exits
        reg price mpg weight length
        clear                                       // dropping the data leaves the e() results intact
        return scalar coeff = _b[mpg]/_b[weight]    // ratio of two coefficients
    end

    sysuse auto, clear
    bootstrap r(coeff), reps(50) seed(123) saving(bla, replace): Test
    use bla, clear
    su _bs_1
    Could you please help me understand why su _bs_1 yields a different value from the coefficient reported by bootstrap?

    Thanks a lot,
    Alex.

  • #2
    Though I wouldn't have guessed this, the reported coefficient seems to be based on a regression model using the original data:

    Code:
    sysuse auto, clear
    reg price mpg weight length
    di _b[mpg]/_b[weight]   // <- this is what bootstrap returns
    
    bootstrap r(coeff), reps(50) seed(123) saving(bla, replace): Test
    use bla, clear
    su _bs_1
    Typically the two would not be so different, but here _bs_1 is highly skewed, so its mean and median are both quite different from the 'observed' result. HTH, Jeph
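
    A quick check along these lines (a sketch, assuming the Test program from #1 is still in memory): the mean of the saved replicates drifts away from the observed coefficient, while their sd reproduces the reported bootstrap standard error.

    Code:
    sysuse auto, clear
    bootstrap r(coeff), reps(50) seed(123) saving(bla, replace): Test
    matrix list e(b)                    // the "Observed" coefficient, from the original data
    use bla, clear
    quietly su _bs_1
    di "mean of replicates: " r(mean)   // not the reported coefficient
    di "sd of replicates:   " r(sd)     // matches the reported bootstrap std. err.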



    • #3
      The introduction to the Remarks and Examples section of the bootstrap documentation (the PDF linked at the top of the output of help bootstrap) begins by telling us

      With few assumptions, bootstrapping provides a way of estimating standard errors and other measures of statistical precision ...
      and that is what it has done for the measure coeff returned by your program. This is why the output of your bootstrap command
      Code:
      Bootstrap results                               Number of obs     =         74
                                                      Replications      =         50
      
            command:  Test
              _bs_1:  r(coeff)
      
      ------------------------------------------------------------------------------
                   |   Observed   Bootstrap                         Normal-based
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
             _bs_1 |  -19.88392   57.85066    -0.34   0.731    -133.2691    93.50128
      ------------------------------------------------------------------------------
      describes the coefficient estimate as "Observed" while it describes the standard error as "Bootstrap".
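
      Relatedly, estat bootstrap makes the distinction explicit (a sketch, to be run immediately after the bootstrap call in #1, while its results are still in e()): it reports the observed coefficient next to the bias, which is just the mean of the replicates minus the observed value.

      Code:
      estat bootstrap, all   // observed coef., bias = mean(replicates) - observed,
                             // bootstrap std. err., percentile and BC intervals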



      • #4
        Indeed, this goes back to the theory of the bootstrap. You always want to use the point estimate from your original sample: it is the best point estimate available, and bootstrapping will never improve on it. When the number of bootstrap replications is large, the mean of the bootstrapped statistics typically settles close to the point estimate, with any remaining gap estimating the bias. Here you have a somewhat pathological example with badly skewed replicates, so the mean of the replicates mostly adds noise and does not help you. You simply want the bootstrap to quantify the variance around the point estimate.
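
        To see this in the example (a sketch reusing the Test program from #1; bla2 is just a scratch file name), raise the number of replications and compare the mean of the replicates with the observed coefficient; the gap is the bias estimate, and for a skewed statistic like this ratio it need not be negligible.

        Code:
        sysuse auto, clear
        bootstrap r(coeff), reps(2000) seed(123) saving(bla2, replace): Test
        matrix list e(b)        // observed coefficient from the original data
        use bla2, clear
        su _bs_1, detail        // compare mean (and skewness) with the observed value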
        Best wishes




        • #5
          Thanks all, makes sense!

