  • Bootstrapping r(mean) - Always Getting the Same Result

    I think there is something I don't understand about how -bootstrap- works.

    I have a dataset with 200 observations; one of the variables is height. There are no missing values.

    The mean height is 60.5.

    When I run

    Code:
    bootstrap meanHeight=r(mean), size(10) reps(10) noisily: summarize height
    The Observed Coef. of the output is always 60.5.

    Since I have used the noisily option, the 10 random draws of 10 observations and the mean height for each draw (repetition) are displayed on the screen. If I use the "saving" option, the resulting file contains the mean produced by each repetition.

    How do I get Stata to display the mean of those means? Do you have to open the saved file and take the mean of means?

    Can't Stata display that mean in the output of -bootstrap-? I thought that was the whole point.

    Thank you.

  • #2
    How do I get Stata to display the mean of those means? Do you have to open the saved file and take the mean of means?
    If that is what you want to do, that is how you have to do it.

    Can't Stata display that mean in the output of -bootstrap-? I thought that was the whole point.
    Actually, no. Not only is that not the whole point, it isn't even appropriate to do. Bootstrapping's purpose is to give an estimate of the standard error when the usual normal theory approximations are not applicable or no closed-form estimator exists. But bootstrapping does not improve your estimation of the mean. The overall sample mean is actually a less biased estimate of the population mean than the mean of the means in the bootstrap samples. The -bootstrap- section of the online manual explains this in detail. You'll find it in the Introduction section of the -bootstrap- chapter of [R].
    Last edited by Clyde Schechter; 05 Mar 2017, 15:43. Reason: Correct typos. Reduce verbosity.
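
    A minimal sketch of that point, using the built-in auto.dta because the original height data are not available. The variable mpg, the seed, and reps(200) are illustrative assumptions, not part of the original question. It compares the conventional standard error of the mean with a bootstrap one:

    Code:
    sysuse auto, clear
    * Conventional (normal-theory) standard error of the mean
    mean mpg
    * Bootstrap standard error; by default each replicate resamples all observations
    bootstrap meanMpg=r(mean), reps(200) seed(12345): summarize mpg
    The point estimate is the same in both outputs, because -bootstrap- reports the statistic computed on the original sample. Only the standard error and confidence interval come from the replicates.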


    • #3
      Thank you for that explanation. I wasn't suggesting that the resulting mean of means is more accurate. This is mostly an exercise.

      Is there a way to obtain the mean of the randomly drawn means?

      Do I need a loop for that?

      Should the mean of each draw be stored in a macro, tempfile, array, etc. and then aggregated?

      Thanks again.


      • #4
        So, run your -bootstrap- with the -saving()- option. Whether you use a -tempfile- or a regular file is up to you. If you will have additional use for the individual bootstrap results after your current do-file runs, then a regular file makes sense. If they are of no further use, then a -tempfile- will keep you from cluttering up your file system. No loops, nor anything else complicated needed.

        Code:
        tempfile bootstrap_output
        bootstrap meanHeight=r(mean), saving(`bootstrap_output') size(10) reps(10) noisily: summarize height
        use `bootstrap_output'
        summarize meanHeight
        The other odd thing, by the way, is doing bootstrap samples of size 10 when you have 200 observations in your data. This is perfectly legal, but inefficient. Usually the bootstrap sample size is taken to be the size of the entire sample. Do you have a specific reason for this?
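
        A variant of the same idea, sketched with the built-in auto.dta (mpg, the seed, and the tempfile name bsmeans are assumptions chosen for illustration). It shows why the Observed Coef. never changes and uses -preserve- and -restore- so the original data are back in memory afterwards:

        Code:
        sysuse auto, clear
        * Full-sample mean, for comparison with the Observed Coef. below
        summarize mpg
        tempfile bsmeans
        bootstrap meanMpg=r(mean), saving(`bsmeans') size(10) reps(10) seed(12345): summarize mpg
        * The Observed Coef. is computed on the original data, so it does not depend on size() or reps()
        preserve
        use `bsmeans', clear
        * Mean of the 10 replicate means
        summarize meanMpg
        restore
        After -restore-, the auto data are in memory again, so further commands can use them without reloading.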


        • #5
          Thank you, Clyde. That worked. There isn't really a specific reason to take a sample size of 10 out of 200 observations. I was mostly trying to get the technique down. I chose 10 just for the purpose of exercising this since it's a small enough number that I could easily check.

          A followup question has to do with bootstrapping linear regression coefficients and plotting the resulting regression lines all on the same scatter plot. Is that doable?

          Should I use bootstrap : regress or regress...vce?


          • #6
            Use -regress, vce(bootstrap)- to bootstrap linear regression coefficients.
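
            As a hedged illustration of the two forms asked about, here is a sketch on the built-in auto.dta. The variables price and mpg, the seed, and reps(200) are assumptions for the example only:

            Code:
            sysuse auto, clear
            * Option form: bootstrap standard errors for all coefficients
            regress price mpg, vce(bootstrap, reps(200) seed(12345))
            * Prefix form: the same resampling written with the -bootstrap- prefix
            bootstrap _b, reps(200) seed(12345): regress price mpg
            Both forms resample observations and refit the regression in each replicate. The option form is shorter when all you need is the coefficients of a single estimation command.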


            • #7
              Ok, I got that to run. Can I superimpose the regression line that results from each sample on one scatter plot?


              • #8
                Not easily. Again, this is not what bootstrapping is for, but I understand you are just doing this as an exercise. So you can specify a -saving()- suboption in the -vce(bootstrap)- option that will lead to all of the regression coefficients being saved in a temporary or permanent data file, one observation for each regression. Then you can load that file into memory and generate the graphs you want that way, looping over the reps and using -graph twoway function-. You probably also want to specify a -reps()- suboption in your -vce(bootstrap)- option as well, because the default is 50 reps and if you try to overlay 50 regression lines on a single graph you will probably just have a mess.


                • #9
                  Thank you. I will give that a try.


                  • #10
                    For context, this is what I am trying to accomplish (without the animation):

                    https://www.stat.auckland.ac.nz/~wil...otstrap4-1.mp4

                    https://www.stat.auckland.ac.nz/~wil...otstrap5-1.mp4

                    I am trying to end up with the graph on the bottom right, again, without the animation. Just a static graph that overlays the regression lines.

                    I think it can be done in R fairly easily, but I am trying to do it in Stata...




                    • #11
                      OK. I don't have your data set, so I'll show you with the built-in auto.dta data set.

                      Code:
                      set more off
                      clear*
                      sysuse auto
                      
                      summ mpg
                      local mpg_lo = r(min)
                      local mpg_hi = r(max)
                      
                      regress price mpg
                      local b_cons = _b[_cons]
                      local b_mpg = _b[mpg]
                      
                      local nreps 10
                      
                      tempfile bootstrap_results
                      regress price mpg, vce(bootstrap, saving(`bootstrap_results') reps(`nreps'))
                      
                      use `bootstrap_results', clear
                      
                      local graph_cmd
                      forvalues i = 1/`nreps' {
                          local graph_cmd `graph_cmd' || function y = _b_cons[`i'] + _b_mpg[`i']*x, range(`mpg_lo' `mpg_hi') 
                      }
                      
                      display `"`graph_cmd'"'
                      
                      graph twoway function y = `b_cons' + `b_mpg'*x, ///
                          range(`mpg_lo' `mpg_hi') legend(off) lwidth(thick) ///
                          `graph_cmd'
                      So the logic is that you first do the sample regression and store the coefficients, as well as the range of the independent variable in local macros. Then do the regression with bootstrap vce and store the coefficients. Load in the coefficients and build up a long chain of -graph twoway function- commands, one for each bootstrap replication. Then run the graph twoway function command, with the long chain included. I distinguished the overall sample regression by making the line thicker, rather than setting the colors. You can use the various -graph twoway- options to modify that or other aspects of the graph. But this is the gist of it.
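
                      One possible variation, offered as an untested sketch rather than as part of the code above: since the stated goal was lines on a scatter plot, the coefficients can be copied into local macros while the bootstrap file is loaded, and the original data restored before graphing. The names _b_cons and _b_mpg follow the code above; the seed, the macro name lines, and the gray line color are illustrative assumptions.

                      Code:
                      sysuse auto, clear
                      set seed 12345
                      local nreps 10
                      tempfile boot
                      quietly regress price mpg, vce(bootstrap, saving(`boot') reps(`nreps'))
                      
                      * Copy each replicate's coefficients into a chain of function plots
                      preserve
                      use `boot', clear
                      local lines
                      forvalues i = 1/`nreps' {
                          local a = _b_cons[`i']
                          local b = _b_mpg[`i']
                          local lines `lines' (function y = `a' + `b'*x, range(mpg) lcolor(gs10))
                      }
                      restore
                      
                      * Original data are back: scatter, bootstrap lines, then the full-sample fit
                      twoway (scatter price mpg) `lines' (lfit price mpg, lwidth(thick)), legend(off)
                      Because the coefficients are substituted as numbers, the graph command no longer depends on the bootstrap file being in memory, which is what allows the scatter of the original data to be drawn in the same graph.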


                      • #12
                        You can start with something like the following. It uses the second link's dataset and creates a graph of the slopes looking like the one at the end of the run. I didn't bother, but you can use graph twoway's text box option to write in the numbers shown there. There probably are shortcuts to a lot of what I show below, and others can point you to them.

                        Code:
                        version 14.2
                        
                        clear *
                        set more off
                        set seed 1376973
                        
                        input double Growth int Volume
                        0.36  22
                        0.09   6
                        0.67  93
                        0.44  62
                        0.72  84
                        0.24  14
                        0.33  52
                        0.61  69
                        0.66 104
                        0.8  100
                        end
                        
                        // Get bootstrap slopes
                        tempfile tmpfil0
                        bootstrap y1 = _b[Volume], saving(`tmpfil0') reps(30) level(90) nodots: regress Growth c.Volume
                        estat bootstrap, percentile
                        
                        // Get slope
                        tempname B
                        matrix define `B' = e(b)
                        drop _all
                        svmat double `B', name(y)
                        generate byte bs = 0
                        
                        // Put slopes together
                        append using `tmpfil0'
                        quietly replace bs = missing(bs)
                        sort bs y1 // Sorting for 90% confidence bounds
                        quietly replace bs = sum(bs)
                        
                        // Create the two points for each slope
                        generate byte x1 = 1
                        generate byte x0 = 0
                        generate double y0 = 0
                        quietly reshape long x y, i(bs) j(seq)
                        
                        // Graph them
                        local command graph twoway
                        summarize bs, meanonly
                        forvalues i = 1/`=r(max)' {
                            local color = cond(inlist(`i', 2, 29), "blue", "red")
                            local command `command' line y x if bs == `i', lcolor(`color') ||
                        }
                        local command `command' line y x if !bs, lcolor(blue)
                        
                        `command' ytitle("") xtitle("") ylabel( , angle(horizontal) nogrid) legend(off)
                        
                        exit


                        • #13
                          Originally posted by Clyde Schechter

                          So the logic is that you first do the sample regression and store the coefficients, as well as the range of the independent variable in local macros. Then do the regression with bootstrap vce and store the coefficients. Load in the coefficients and build up a long chain of -graph twoway function- commands, one for each bootstrap replication. Then run the graph twoway function command, with the long chain included. I distinguished the overall sample regression by making the line thicker, rather than setting the colors. You can use the various -graph twoway- options to modify that or other aspects of the graph. But this is the gist of it.
                          This is brilliant, Clyde. Thank you very much for taking the time to craft that. It works well, and I have learned a lot!

                          Joseph, thank you as well. Your example is harder for me to grasp but I will try to work through it too.
