Bootstrapping and resampling with Stata

Kurt Heisler

Join Date: Jul 2014

Posts: 16
#1

Bootstrapping and resampling with Stata

05 Aug 2014, 13:39

I have three years of data for about 10 schools.

Code:

School  Yr1  Yr2  Yr3
A       .70  .75  .72
B       .50  .46  .48
...
J       .60  .61  .60

For each school, I would like to:

1. Take the three years of values and sample with replacement to obtain a resample of 30 values.

2. Calculate a mean and SD of these 30 values.

3. Repeat the bootstrap process 1,000 times to create 1,000 sample means and 1,000 sample SDs.

I have never used Stata's bootstrapping features, so I am unsure whether the solution is complex or rather straightforward.

Any suggestions? Thank you.
ericmelse

Join Date: May 2014

Posts: 436
#2

06 Aug 2014, 04:29

Dear Kurt,

Maybe I misunderstand you, but your data set has only 30 values (i.e. 10 schools x 3 years = 30 values)?
So, how can we draw 1,000 samples of 30 values?
To have 1,000 (different) samples of 30 values, the data set certainly has to be substantially larger.

Actually, the Stata syntax for drawing 1,000 (different) samples for model derivation (and validation) is not that complicated.
Computing sample means and SDs is also straightforward.

See the attached Auto sample test.do file.
I have included two lines to either select by percentage (e.g. 80-20%), or by any given number of cases to be selected (e.g. 30).

Best regards,
Eric Melse

Attached Files

Auto sample test.do (3.2 KB, 1 view)

http://publicationslist.org/eric.melse
Rich Goldstein

Join Date: Mar 2014

Posts: 4493
#3

06 Aug 2014, 06:09

Kurt, have you read the help file for -bootstrap-? Start there.

Eric, you seem not to understand bootstrapping, so you might want to start there also.
Kurt Heisler

Join Date: Jul 2014

Posts: 16
#4

18 Aug 2014, 11:45

Originally posted by Rich Goldstein

Kurt, have you read the help file for -bootstrap-? Start there.

I did, and simply could not figure out the syntax for this. The examples it provides are all quite basic, and all assume you have one value to bootstrap.

I thought maybe the cluster argument is where I would specify the three variables Yr1 Yr2 Yr3, but that still leaves me wondering what I use for the initial bootstrap argument:

Code:

bootstrap ??? rep(1000) seed(123) cluster(Yr1 Yr2 Yr3) idcluster(YearValues)

Jeff Pitblado (StataCorp)

StataCorp Employee

Join Date: Mar 2014
Posts: 707

18 Aug 2014, 15:47

I find it very helpful when there is a workable dataset to play with.

Thankfully, Kurt gave us most of the information we need to build one.

I've constructed a dataset loosely based on Kurt's original post:

Code:

input str1 School Yr1 Yr2 Yr3
"A" .70 .75 .72
"B" .50 .46 .48
"C" .82 .78 .84
"D" .46 .55 .62
"E" .98 .95 .92
"F" .64 .73 .68
"G" .72 .74 .76
"H" .81 .83 .85
"I" .93 .92 .91
"J" .60 .61 .60
end

I would consider this a "wide" dataset; I assume School is something
like a panel variable and we have a separate variable containing some form of
score for each school at three points in time: Yr1, Yr2, Yr3.
If we are really interested in an overall mean and standard deviation (SD) of
these 30 values, then I would recommend reshaping the data.

Here is how I reshaped the data:

Code:

. reshape long Yr, i(School) j(year)
(note: j = 1 2 3)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       10   ->      30
Number of variables                   4   ->       3
j variable (3 values)                     ->   year
xij variables:
                            Yr1 Yr2 Yr3   ->   Yr
-----------------------------------------------------------------------------

. rename Yr Score

. list

     +-----------------------+
     | School   year   Score |
     |-----------------------|
  1. |      A      1      .7 |
  2. |      A      2     .75 |
  3. |      A      3     .72 |
  4. |      B      1      .5 |
  5. |      B      2     .46 |
     |-----------------------|
  6. |      B      3     .48 |
  7. |      C      1     .82 |
  8. |      C      2     .78 |
  9. |      C      3     .84 |
 10. |      D      1     .46 |
     |-----------------------|
 11. |      D      2     .55 |
 12. |      D      3     .62 |
 13. |      E      1     .98 |
 14. |      E      2     .95 |
 15. |      E      3     .92 |
     |-----------------------|
 16. |      F      1     .64 |
 17. |      F      2     .73 |
 18. |      F      3     .68 |
 19. |      G      1     .72 |
 20. |      G      2     .74 |
     |-----------------------|
 21. |      G      3     .76 |
 22. |      H      1     .81 |
 23. |      H      2     .83 |
 24. |      H      3     .85 |
 25. |      I      1     .93 |
     |-----------------------|
 26. |      I      2     .92 |
 27. |      I      3     .91 |
 28. |      J      1      .6 |
 29. |      J      2     .61 |
 30. |      J      3      .6 |
     +-----------------------+

Now our use of the bootstrap command depends on how we wish
to resample the dataset.

The bootstrap syntax for simple random sampling is

Code:

bootstrap mean=r(mean) sd=r(sd), seed(123) rep(1000) : sum Score

If we want to cluster sample the schools, the syntax is

Code:

bootstrap mean=r(mean) sd=r(sd), seed(123) rep(1000) cluster(School) : sum Score

There are a total of 10^10 possible bootstrap samples of this kind.

If we want to cluster sample the years, the syntax is

Code:

bootstrap mean=r(mean) sd=r(sd), seed(123) rep(1000) cluster(year) : sum Score

There are a total of 3^3=27 possible bootstrap samples of this kind, so 1000
replications might be overkill in this case.
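
None of the three commands above resamples within each school, which is what
the original post asked for: 30 draws with replacement from each school's
three values, then a mean and SD, repeated 1,000 times. One way this might be
done is sketched below. It is not from the thread. It relies on -bsample- with
strata(School) drawing the requested number of observations within each
school. Because -bsample- cannot draw more observations than a stratum holds,
each school's three scores are first copied 10 times with -expand-; drawing 30
rows from those 30 is then the same as drawing 30 times from the three
original values. The variable names bs_mean, bs_sd and rep are made up for
illustration.

```stata
* Sketch only: per-school resampling of 30 values, repeated 1,000 times.
* Variable names bs_mean, bs_sd and rep are illustrative assumptions.
clear
input str1 School Yr1 Yr2 Yr3
"A" .70 .75 .72
"B" .50 .46 .48
"C" .82 .78 .84
"D" .46 .55 .62
"E" .98 .95 .92
"F" .64 .73 .68
"G" .72 .74 .76
"H" .81 .83 .85
"I" .93 .92 .91
"J" .60 .61 .60
end
reshape long Yr, i(School) j(year)
rename Yr Score

set seed 123

* Copy each school's 3 scores 10 times so every stratum holds 30 rows
expand 10
tempfile base results
save `base'

forvalues r = 1/1000 {
    use `base', clear
    * Draw 30 rows with replacement within each school
    bsample 30, strata(School)
    * One mean and one SD per school for this replication
    collapse (mean) bs_mean=Score (sd) bs_sd=Score, by(School)
    generate int rep = `r'
    * Stack this replication onto the earlier ones
    if `r' > 1 {
        append using `results'
    }
    save `results', replace
}

* 10 schools x 1,000 replications = 10,000 rows of results
sort School rep
by School: summarize bs_mean bs_sd
```

With only three distinct values per school, the resampled means and SDs can
reflect nothing beyond those three points, so the spread of bs_mean and bs_sd
should be read with that in mind.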
