
  • Question: bootstrap - saved dataset yields different values compared to bootstrap results

    Dear all,

    A (probably silly) question about the bootstrap command.
    When bootstrapping a coefficient, one can save the estimate from each resampling/bootstrap iteration and pile them up in a dataset.
    When I summarize that saved coefficient, I get a mean that differs from the coefficient reported by bootstrap (while the sd matches the reported standard error exactly).
    Below is an example that hopefully illustrates the issue:

    Code:
    cap program drop Test
    program define Test, rclass
        preserve                                    // put the data back when the program exits
        reg price mpg weight length
        clear                                       // dropping the data leaves the e() results intact
        return scalar coeff = _b[mpg]/_b[weight]    // ratio of two coefficients
    end

    sysuse auto, clear
    bootstrap r(coeff), reps(50) seed(123) saving(bla, replace): Test
    use bla, clear
    su _bs_1
    Could you please help me understand why su _bs_1 yields a different value from the coefficient reported by bootstrap?

    Thanks a lot,
    Alex.

  • #2
    Though I wouldn't have guessed this, the reported coefficient seems to be based on a regression model using the original data:

    Code:
    sysuse auto, clear
    reg price mpg weight length
    di _b[mpg]/_b[weight]   // <- this is what bootstrap returns
    
    bootstrap r(coeff), reps(50) seed(123) saving(bla, replace): Test
    use bla, clear
    su _bs_1
    Typically the two would not be so different, but here _bs_1 is highly skewed, so its mean and median are both quite different from the 'observed' result. HTH, Jeph
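
    A quick check along these lines (a sketch, assuming the Test program from #1 is still in memory): the mean of the saved replicates drifts away from the observed coefficient, while their sd reproduces the reported bootstrap standard error.

    Code:
    sysuse auto, clear
    bootstrap r(coeff), reps(50) seed(123) saving(bla, replace): Test
    matrix list e(b)                    // the "Observed" coefficient, from the original data
    use bla, clear
    quietly su _bs_1
    di "mean of replicates: " r(mean)   // not the reported coefficient
    di "sd of replicates:   " r(sd)     // matches the reported bootstrap std. err.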



    • #3
      The introduction to the Remarks and Examples section of the bootstrap documentation (the PDF linked at the top of the output of help bootstrap) begins by telling us

      With few assumptions, bootstrapping provides a way of estimating standard errors and other measures of statistical precision ...
      and that is what it has done for the measure coeff returned by your program. This is why the output of your bootstrap command
      Code:
      Bootstrap results                               Number of obs     =         74
                                                      Replications      =         50
      
            command:  Test
              _bs_1:  r(coeff)
      
      ------------------------------------------------------------------------------
                   |   Observed   Bootstrap                         Normal-based
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
             _bs_1 |  -19.88392   57.85066    -0.34   0.731    -133.2691    93.50128
      ------------------------------------------------------------------------------
      describes the coefficient estimate as "Observed" while it describes the standard error as "Bootstrap".
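
      Relatedly, estat bootstrap makes the distinction explicit (a sketch, to be run immediately after the bootstrap call in #1, while its results are still in e()): it reports the observed coefficient next to the bias, which is just the mean of the replicates minus the observed value.

      Code:
      estat bootstrap, all   // observed coef., bias = mean(replicates) - observed,
                             // bootstrap std. err., percentile and BC intervals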



      • #4
        Indeed, this goes back to the theory of the bootstrap. You always want to use the point estimate from your original sample: it is the best point estimate available, and bootstrapping will never improve on it. When the number of bootstrap replications is large, the mean of the bootstrapped statistics typically settles close to the point estimate, with any remaining gap estimating the bias. Here you have a somewhat pathological example with badly skewed replicates, so the mean of the replicates mostly adds noise and does not help you. You simply want the bootstrap to quantify the variance around the point estimate.
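
        To see this in the example (a sketch reusing the Test program from #1; bla2 is just a scratch file name), raise the number of replications and compare the mean of the replicates with the observed coefficient; the gap is the bias estimate, and for a skewed statistic like this ratio it need not be negligible.

        Code:
        sysuse auto, clear
        bootstrap r(coeff), reps(2000) seed(123) saving(bla2, replace): Test
        matrix list e(b)        // observed coefficient from the original data
        use bla2, clear
        su _bs_1, detail        // compare mean (and skewness) with the observed value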
        Best wishes




        • #5
          Thanks all, makes sense!

