  • Bootstrapping questions

    Hi guys,

    I'm using Stata 13.1, and let's use the example dataset "auto" for my questions. I have not done bootstrapping before, but I have read the bootstrapping chapter in "Microeconometrics Using Stata".

    1) What is the difference between:
    regress mpg weight gear foreign, vce(bootstrap, reps(100) seed(1))
    bootstrap, reps(100) seed(1): regress mpg weight gear foreign

    Both give me the same result (which is not surprising, please see attached), but is the methodology behind the two commands the same? That's my only concern.
    [Attached image: VCE.png]

    2) Isn't it possible to save the "bootstrapped" dataset of, for example, 2,000 reps, i.e. the simulated data?
    I would really like this, because I find it easier to do hypothesis testing, etc. if I have the "new" dataset.

    3) Same as Q2, just with the residuals bootstrap approach:
    With help from the Microeconometrics book, mentioned above, I use the following code:

    use auto, clear
    quietly regress mpg trunk price
    predict uhat, resid
    keep uhat
    save residuals, replace
    program bootresidual
    version 11
    drop _all
    use residuals
    bsample
    merge using auto.dta
    regress mpg trunk price
    predict xb
    generate ystar=xb+uhat
    regress ystar trunk price
    end

    **
    simulate _b, seed(1) reps (400) nodots: bootresidual
    sum

    But, as in Q2, I would really like a "new" bootstrapped dataset. Is that possible? And when would you prefer 1) over 2)?

  • #2
    Thomas:
    as far as your questions 2 and 3 are concerned, perhaps what you're looking for is the -saving()- option of the -bootstrap- command.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      @Carlo: Thanks. You are probably right. Can you help me implement it? For example, what if I want to save the "bootstrapped" data in a new file?



      • #4
        As for 1), the two commands execute exactly the same procedure. If you run this in Stata 13.1:

        Code:
        sysuse auto, clear
        set trace on 
        set tracedepth 2
        regress mpg weight gear foreign, vce(bootstrap, reps(100) seed(1))
        You'll see that the command called to obtain the bootstrapped standard errors is:

        Code:
        version 13.1: bootstrap , reps(100) seed(1)      : regress mpg weight gear_ratio foreign
        which is the second command.

        Jorge Eduardo Pérez Pérez
        www.jorgeperezperez.com
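
        A minimal sketch for checking this numerically, assuming the auto data shipped with Stata: run both forms with the same seed and compare the stored variance matrices. The matrix names V1 and V2 are illustrative and not part of the original posts.

        ```stata
        sysuse auto, clear

        * Form 1: bootstrap requested through the vce() option
        quietly regress mpg weight gear_ratio foreign, vce(bootstrap, reps(100) seed(1))
        matrix V1 = e(V)

        * Form 2: bootstrap used as a prefix command
        quietly bootstrap, reps(100) seed(1): regress mpg weight gear_ratio foreign
        matrix V2 = e(V)

        * Largest relative difference between the two variance matrices
        display "max relative difference: " mreldif(V1, V2)
        ```

        A relative difference of zero would indicate that the two calls produced the same bootstrap variance estimates.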



        • #5
          Thomas:
          elaborating a bit on one of your codes:
          Code:
          . use auto.dta, clear
          (1978 Automobile Data)
          
          . regress mpg weight gear foreign
          
                Source |       SS       df       MS              Number of obs =      74
          -------------+------------------------------           F(  3,    70) =   46.73
                 Model |  1629.67805     3  543.226016           Prob > F      =  0.0000
              Residual |  813.781411    70  11.6254487           R-squared     =  0.6670
          -------------+------------------------------           Adj R-squared =  0.6527
                 Total |  2443.45946    73  33.4720474           Root MSE      =  3.4096
          
          ------------------------------------------------------------------------------
                   mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                weight |   -.006139   .0007949    -7.72   0.000    -.0077245   -.0045536
            gear_ratio |   1.457113   1.541286     0.95   0.348    -1.616884     4.53111
               foreign |  -2.221682   1.234961    -1.80   0.076    -4.684735    .2413715
                 _cons |   36.10135   6.285984     5.74   0.000     23.56435    48.63835
          ------------------------------------------------------------------------------
          
          . bootstrap, reps(100) saving(C:\Users\Carlo Lazzaro\Desktop\bootstrap.dta, replace) seed(1) : regress mpg weight gear foreign
          (running regress on estimation sample)
          (note: file C:\Users\Carlo Lazzaro\Desktop\bootstrap.dta not found)
          
          Bootstrap replications (100)
          ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
          ..................................................    50
          ..................................................   100
          
          Linear regression                               Number of obs      =        74
                                                          Replications       =       100
                                                          Wald chi2(3)       =    111.96
                                                          Prob > chi2        =    0.0000
                                                          R-squared          =    0.6670
                                                          Adj R-squared      =    0.6527
                                                          Root MSE           =    3.4096
          
          ------------------------------------------------------------------------------
                       |   Observed   Bootstrap                         Normal-based
                   mpg |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                weight |   -.006139   .0006498    -9.45   0.000    -.0074127   -.0048654
            gear_ratio |   1.457113   1.297786     1.12   0.262    -1.086501    4.000727
               foreign |  -2.221682   1.162728    -1.91   0.056    -4.500587    .0572236
                 _cons |   36.10135    4.71779     7.65   0.000     26.85465    45.34805
          ------------------------------------------------------------------------------
          . use "C:\Users\Carlo Lazzaro\Desktop\bootstrap.dta", clear
          (bootstrap: regress)
          Kind regards,
          Carlo
          (Stata 19.0)
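
          The file written by -saving()- holds one row of coefficient estimates per replication. As a hedged sketch of how that file can be used, the code below saves the replicates to a temporary file (an assumption made here to avoid a hard-coded path), then recovers the bootstrap standard error of the weight coefficient as the standard deviation of its replicates and computes a percentile interval. The names betas, se_weight and sd_weight are illustrative.

          ```stata
          sysuse auto, clear
          tempfile betas

          * Save the replicates in double precision so the comparison below is exact
          quietly bootstrap, reps(100) seed(1) saving(`betas', double): ///
              regress mpg weight gear_ratio foreign
          scalar se_weight = _se[weight]      // bootstrap s.e. reported by Stata

          * Load the replicates: one observation per bootstrap replication
          use `betas', clear
          describe

          * The s.d. of the replicates should reproduce the bootstrap s.e.
          quietly summarize _b_weight
          scalar sd_weight = r(sd)
          display "reported se = " se_weight "   sd of replicates = " sd_weight

          * Percentile interval taken directly from the replicates
          _pctile _b_weight, percentiles(2.5 97.5)
          display "95% percentile interval: [" r(r1) ", " r(r2) "]"
          ```

          With only 100 replications the percentile interval is rough; more replications are usually recommended for intervals than for standard errors.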



          • #6
            Thanks a lot, @Carlo. It's working, but isn't it just a lot of beta estimates that it saves? Please see attached:
            [Attached image: Screen.png]

            I was thinking more of a "new" dataset. For example, if my data for gear was:
            1, 10, 12, 13

            It would now be:
            1, 10, 12, 13, 23, 2, 17, 4, 2...

            - the same goes for my other variables. Does it make sense?



            • #7
              > 2) Isn't it possible to save the "bootstrapped" dataset of, for example, 2,000 reps, i.e. the simulated data?
              > I would really like this, because I find it easier to do hypothesis testing, etc. if I have the "new" dataset.
              It seems that what you want is a dataset containing each of the bootstrapped copies of the original data, as opposed to what Carlo provided, which is the set of beta estimates from each replication. While this should be possible, I don't immediately see the usefulness of having this dataset. It will be large, with N*reps observations, where N is the number of observations and reps is the number of replications.

              Can you provide an example of a hypothesis test that would be easier if you had this dataset in memory?

              Jorge Eduardo Pérez Pérez
              www.jorgeperezperez.com



              • #8
                Jorge Eduardo Perez Perez : For example, how do I perform one-way ANOVA on the bootstrapped "dataset"? (I have a country-variable in my original dataset)



                • #9
                  To avoid confusion, let's call the dataset with the copies of the data for each bootstrap replication the "replication dataset", and the dataset with the beta estimates the "beta dataset".

                  It is not clear to me why you would want to perform any kind of analysis on the replication dataset as opposed to the beta dataset, so I can't answer that question unless you provide more details. Each replication is a sample drawn with replacement from the original dataset. The only purpose of these copies is to obtain new beta estimates from your original estimation command, in order to compute standard errors for these betas. Running an analysis on the full replication dataset seems meaningless. Moreover, the replication dataset changes with the seed of your random draws.

                  Here's some code to obtain the replication dataset, however:

                  Code:
                  sysuse auto, clear
                  glo reps=10
                  reg price mpg, vce(bootstrap, reps($reps) seed(350))
                  * Label this original dataset repetition 0
                  gen rep=0
                  * Build the "bootstrapped dataset" by appending resampled copies of the original dataset
                  * Each copy will be labeled by its replication number in rep
                  set seed 350
                  forv i=1(1)$reps {
                      preserve
                      sysuse auto, clear
                      bsample
                      gen rep=`i'
                      tempfile b
                      save `b'
                      restore
                      append using `b'
                  }
                  Last edited by Jorge Eduardo Perez Perez; 11 Aug 2015, 10:08.
                  Jorge Eduardo Pérez Pérez
                  www.jorgeperezperez.com
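
                  To make the link between the two datasets concrete, here is a sketch under the same assumptions as the code above (auto data, 10 replications, seed 350). It rebuilds the replication dataset and then collapses it back into a beta dataset by running the regression once per value of rep with -statsby-. Whether these draws coincide exactly with the ones -bootstrap- uses internally is not guaranteed here.

                  ```stata
                  sysuse auto, clear
                  generate rep = 0                 // rep 0 is the original sample
                  set seed 350
                  forvalues i = 1/10 {
                      preserve
                      sysuse auto, clear
                      bsample                      // resample observations with replacement
                      generate rep = `i'
                      tempfile b
                      save `b'
                      restore
                      append using `b'
                  }

                  * One regression per replication: this turns the replication
                  * dataset back into a beta dataset with one row per rep
                  statsby _b, by(rep) clear: regress price mpg
                  list rep _b_mpg _b_cons

                  * The spread of the replicate slopes estimates the bootstrap s.e.
                  summarize _b_mpg if rep > 0
                  ```

                  The standard deviation reported by the last command plays the role of the bootstrap standard error for the mpg coefficient, which is why the beta dataset is normally all that is needed.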



                  • #10
                    Jorge Eduardo Perez Perez: Thanks for providing the code, I really appreciate it. My problem is that I can't figure out the difference between Stata's built-in bootstrap and the resampled-residuals code (i.e. the difference between 1) and 3) above).

                    When I use the code for the resampled residuals, I only get the beta estimates but not the t-statistics, i.e. I don't know whether the beta estimates are significant.

                    So, in general, would you suggest the built-in Stata command or the resampled-residuals approach? Or does it depend on the purpose of the estimation?



                    • #11
                      The bootstrap implemented in Stata is the "pairs" or "design matrix" bootstrap, where whole observations are resampled, as opposed to the "residuals" bootstrap, where only the residuals are resampled and reassigned to the original observations.

                      You may want to look at section 13.2 of the Microeconometrics Using Stata textbook you referenced. The residual bootstrap makes assumptions about the model, such as linearity and i.i.d. errors in your example, whereas the pairs bootstrap does not make these assumptions. Either can be appropriate depending on the application.

                      Jorge Eduardo Pérez Pérez
                      www.jorgeperezperez.com
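
                      On the missing test statistics: -simulate- only returns the replicate coefficients, so the standard error has to be computed from them. The following is a minimal sketch, not the textbook's code, of a residual bootstrap for the regression in post #1. It resamples residuals by drawing random row indices instead of merging files (a substitution made here for brevity), then forms a normal-based z statistic as the observed coefficient divided by the standard deviation of its replicates. The program name resboot and the matrix b_obs are hypothetical.

                      ```stata
                      sysuse auto, clear
                      quietly regress mpg trunk price
                      matrix b_obs = e(b)              // observed coefficients
                      predict double xb, xb            // fitted values
                      predict double uhat, residuals   // residuals to be resampled

                      capture program drop resboot
                      program define resboot
                          version 13.1
                          tempvar idx ystar
                          * draw a random row index in 1.._N for every observation
                          generate long `idx' = floor(_N * runiform()) + 1
                          * new outcome = fitted value + resampled residual
                          generate double `ystar' = xb + uhat[`idx']
                          regress `ystar' trunk price
                      end

                      simulate _b, reps(400) seed(1) nodots: resboot

                      * s.e. = s.d. of replicates; z = observed coefficient / s.e.
                      foreach v in trunk price {
                          quietly summarize _b_`v'
                          local se = r(sd)
                          local z  = b_obs[1, colnumb(b_obs, "`v'")] / `se'
                          display "`v': se = " %9.6f `se' "  z = " %6.2f `z' ///
                              "  p = " %6.4f 2*normal(-abs(`z'))
                      }
                      ```

                      The built-in -bootstrap- reports these quantities automatically for the pairs scheme; for the residual scheme they have to be assembled by hand as above.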



                      • #12
                        Jorge Eduardo Perez Perez: Thanks. I had some time to look into it. However, I still don't know when to use the residual-resampling bootstrap and when to use the "pairs" bootstrap. Can you provide an example?



                        • #13
                          Here's a simple example: heteroskedasticity. Under heteroskedasticity, the residual bootstrap standard errors are not as close to the correct robust standard errors, because residuals with high error variance may be assigned to observations with low error variance. This example shows that in this setting, pairs bootstrap standard errors are closer to the correct robust standard errors:

                          Code:
                          clear
                          * Generate example data
                          set seed 946
                          set obs 100
                          gen x=uniform()
                          * Heteroskedasticity: error variance is larger for observations 51/100
                          gen y=x+0.1*rnormal() in 1/50
                          replace y=x+0.5*rnormal() in 51/100
                          
                          * Conventional OLS s.e. are not correct under heteroskedasticity
                          reg y x
                          est store ols
                          * Save data residuals for later
                          predict res, resid
                          
                          preserve
                          keep res
                          tempfile res
                          save `res'
                          restore
                          
                          preserve
                          drop res
                          tempfile data
                          save `data'
                          restore
                          
                          * View heteroskedasticity
                          gen id=_n
                          gen ressq=res^2
                          scatter ressq id
                          est store ols
                          
                          * Should have robust se
                          reg y x, r
                          est store robust
                          
                          * Now we look at the se from both bootstrapping schemes
                          * See which is closer to robust s.e
                          
                          * Pairs bootstrap
                          reg y x, vce(bs, rep(400))
                          est store pairs
                          
                          * Residual bootstrap
                          cap program drop bootresidual
                          program bootresidual
                          version 11
                          use `1', clear
                          bsample
                          merge 1:1 _n using `2'
                          reg y x
                          predict xb
                          generate ystar=xb+res
                          reg ystar x
                          end
                          
                          * bootresidual `res' `data'
                          
                          **
                          simulate _b, reps (400) nodots: bootresidual `res' `data'
                          ren _b_x x
                          est restore ols
                          mat b=e(b)
                          bstat, stat(b)
                          est store residual
                          
                          
                          * Tab and compare estimates
                          est tab *, b se keep(x)
                          Jorge Eduardo Pérez Pérez
                          www.jorgeperezperez.com



                          • #14
                            Jorge Eduardo Perez Perez: Thank you so much for clearing this up, and thank you for providing an example. It has been very hard to find answers on this topic anywhere else (or, if I were a Ph.D., it might have been easier).

