
  • Retaining Individual Iterations of Bootstrap Estimates

    Dear Stata Users:

    I'd like to retain individual parameter (B) and standard error (se) estimates from a bootstrap procedure. That is, sample with replacement N times, run a regression on each of the N resamples, and save one of the B's (and the associated se) in a dataset. In each iteration, I save the desired B and se in a 2 x 1 matrix. Then I try to concatenate the matrices. The code is very cumbersome, and I am hoping this problem can be simplified somehow.

    Here's the code:

    Code:
    use data.dta
    sum
    scalar n = r(N)
    scalar N = 10000
    forvalues i=1/N {
        preserve
        bsample n /*sample w/ replacement*/
        display ""
        display ""
        display `i' /*display iteration number*/
        reg y x1 x2 /*run regression*/
        mat A = r(table)
        mat A`i' = (A[1,2], A[2,2]) /*save B and se for X2*/
        restore
    }
    *Concatenate A matrices
    mat B1 = A1
    forvalues i=2/10000 {
        local k = `i'-1
        mat B`i' = (B`k' \ A`i')
    }
    Complicating the problem is that I've got the cheap "flavor" of Stata at work, which only allows for matrices with 800 rows.

    Is there no easier way to do this? I looked into the "bootstrap" command, and it looks like it gives me summary statistics, but it doesn't look possible to save the estimates from each iteration.

    Thanks!
    David
    Last edited by David Crow; 16 Sep 2016, 20:13.
    Web site:
    http://investigadores.cide.edu/crow/


    Las Américas y el Mundo:
    http://lasamericasyelmundo.cide.edu/

    ==========================================
    David Crow
    Associate Professor, División de Estudios Internacionales
    Centro de Investigación y Docencia Económicas (CIDE)
    ==========================================

  • #2
    I looked into the "bootstrap" command, and it looks like it gives me summary statistics, but it doesn't look possible to save the estimates from each iteration.
    Look again. That's precisely what the -saving()- option does.
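
For readers landing here, a minimal sketch of what that might look like. It assumes Stata's shipped auto dataset (price, mpg, weight) as a stand-in for y, x1, x2; the names b_x2, se_x2, and the file name bs_results are made up for illustration.

```stata
* Sketch only: auto.dta stands in for the poster's data.dta
sysuse auto, clear
set seed 1234                      // for reproducible resampling

* Keep the coefficient and standard error of one regressor from each
* replication; saving() writes one observation per replication
bootstrap b_x2=_b[weight] se_x2=_se[weight], ///
    reps(200) saving(bs_results, replace): regress price mpg weight

* The per-replication estimates are now ordinary variables
use bs_results, clear
summarize b_x2 se_x2
```

Because the results end up in a data file rather than a matrix, the 800-row matrix limit mentioned in #1 should not come into play.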



    • #3
      Thanks for the reply. I apologize: my mistake.

      Just for kicks and giggles, what would be a better way to make my code work--i.e., to save and concatenate many matrices saved in a loop?



      • #4
        Just for kicks and giggles, what would be a better way to make my code work--i.e., to save and concatenate many matrices saved in a loop?
        What you have is not far off. It could be trimmed down a little bit. You can accumulate the result matrix on the fly in the first loop; you don't have to save 10,000 matrices and then run a second loop to concatenate them. You don't have to pull the results from r(table): the _b[] and _se[] virtual matrices are simpler to deal with. And, for reproducible results, you should set the random number generator seed before you start sampling.

        Code:
        use data.dta, clear
        local nreps 10000
        set seed 1234 // OR YOUR LUCKY NUMBER
        forvalues i=1/`nreps' {
            preserve
            bsample  /*sample w/ replacement--default sample size is _N*/
            display _newline(2) `i' /*display iteration number*/
            reg y x1 x2 /*run regression*/
            mat this_run = (_b[x2], _se[x2])
            mat cumulative = nullmat(cumulative) \ this_run
            restore
        }
        Now, personally, I'm not a fan of accumulating results in a matrix unless I plan to actually do some matrix calculations with them later. In this context, it's more likely that I'll want to get some statistics describing the distributions of b and se, so having them in a data file would be more convenient. You can, of course, convert a matrix into a data file using the -svmat- command, but I'd be more likely to build the new data file with the -postfile- commands (which you can read about in the [P] manual if you're interested).
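
A rough sketch of the -postfile- route, under the same loop structure as above. It uses the shipped auto dataset in place of data.dta so that it runs as is; the variable names b_x2 and se_x2 and the output file bs_post are placeholders, not anything from the original post.

```stata
* Sketch only: auto.dta stands in for the poster's data.dta
sysuse auto, clear
set seed 1234
local nreps 200

* Open a results file with one variable per saved quantity
tempname handle
postfile `handle' b_x2 se_x2 using bs_post, replace

forvalues i = 1/`nreps' {
    preserve
    bsample                        // resample _N observations with replacement
    quietly regress price mpg weight
    post `handle' (_b[weight]) (_se[weight])   // append one row of results
    restore
}
postclose `handle'

* Results are an ordinary dataset, one observation per replication
use bs_post, clear
summarize b_x2 se_x2
```

Since nothing is accumulated in a matrix, this approach should sidestep the matrix size limit of the smaller Stata flavors.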

        That's actually what -bootstrap, saving()- does behind the scenes. Compared to "rolling my own," using -bootstrap- is quicker to code and it eliminates the risk that I'll code the loop incorrectly. Of course, using -bootstrap- entails some overhead, as it has to parse its input, and it does certain data validity checks, and has other code needed to cope with exceptional situations that perhaps you can guarantee won't come up in your calculations. But the overhead isn't really very much. So, unless I were working with a huge data set and bootstrapping something computationally intensive, whereby shaving a millisecond of run time here and there each time through the loop would add up to something noticeable, I'd just use -bootstrap-.



        • #5
          Again, many thanks. I guess I wanted to get the long version of the loop right because I can imagine situations might arise in which -bootstrap- can't be used (though I can't think of any right now--it looks like -bootstrap- can recover almost any element of the estimation output).

          Another thing is that -bootstrap- appears to sample only from complete cases (always giving the same sample size), whereas using -bsample- gives different sample sizes with each iteration. Sampling with replacement might draw an observation with missing values two or more times, leading to different numbers of cases used in each iteration of the analysis. This would, presumably, make parameter estimates slightly more variable than with -bootstrap-.
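
One way to make a hand-rolled loop behave like this, sketched under assumptions: fit the model once, keep only the estimation sample, and resample from that. The example uses the shipped auto dataset, where rep78 has some missing values, in place of the real data. As far as I understand it, -bootstrap- restricts to the estimation sample in a similar way unless its -nodrop- option is given.

```stata
* Sketch only: auto.dta stands in for the poster's data.dta
sysuse auto, clear
set seed 1234

* Fit once on the full data, then keep only the complete cases used
quietly regress price mpg rep78
keep if e(sample)
local n_complete = _N

forvalues i = 1/5 {
    preserve
    bsample                        // resample from complete cases only
    quietly regress price mpg rep78
    display "rep `i': N = " e(N)
    assert e(N) == `n_complete'    // same estimation N in every replication
    restore
}
```

With the incomplete cases dropped up front, every resample feeds the same number of observations to the regression, so the extra variability described above should go away.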
