
  • Bootstrapping and resampling with Stata

    I have three years of data for about 10 schools.

    Code:
    School Yr1 Yr2 Yr3
    A      .70 .75 .72
    B      .50 .46 .48
    ...
    J      .60 .61 .60
    For each school, I would like to:

    1. Take the three years of values and sample with replacement to obtain a resample of 30 values.

    2. Calculate a mean and SD of these 30 values.

    3. Repeat the bootstrap process 1,000 times to create 1,000 sample means and 1,000 sample SDs.

    I have never used Stata's bootstrapping features, so I am unsure whether the solution is complex or rather straightforward.

    Any suggestions? Thank you.

  • #2
    Dear Kurt,

    Maybe I have misunderstood you, but doesn't your data set have only 30 values (i.e., 10 schools x 3 years = 30 values)?
    So how can we draw 1,000 samples of 30 values?
    To have 1,000 (different) samples of 30 values, the data set would certainly have to be substantially larger.

    Actually, the Stata syntax to draw 1,000 (different) samples for model derivation (and validation) is not that complicated.
    Computing sample means and SDs is also straightforward.

    See the attached Auto sample test.do file.
    I have included two lines to select either by percentage (e.g., 80-20%) or by a given number of cases (e.g., 30).

    Best regards,
    Eric Melse
    http://publicationslist.org/eric.melse



    • #3
      Kurt, have you read the help file for -bootstrap-? Start there.

      Eric, you seem not to understand bootstrapping, so you might want to start there also.



      • #4
        Originally posted by Rich Goldstein
        Kurt, have you read the help file for -bootstrap-? Start there.
        I did, and simply could not figure out the syntax for this. The examples it provides are all quite basic, and all assume you have one value to bootstrap.

        I thought maybe the cluster argument is where I would specify the three variables Yr1 Yr2 Yr3, but that still leaves me wondering what I use for the initial bootstrap argument:

        Code:
        bootstrap ??? rep(1000) seed(123) cluster(Yr1 Yr2 Yr3) idcluster(YearValues)







        • #5
          I find it very helpful when there is a workable dataset to play with.

          Thankfully, Kurt gave us most of the information we need to build one.

          I've constructed a dataset loosely based on Kurt's original post:

          Code:
          input str1 School Yr1 Yr2 Yr3
          "A" .70 .75 .72
          "B" .50 .46 .48
          "C" .82 .78 .84
          "D" .46 .55 .62
          "E" .98 .95 .92
          "F" .64 .73 .68
          "G" .72 .74 .76
          "H" .81 .83 .85
          "I" .93 .92 .91
          "J" .60 .61 .60
          end
          I would consider this a "wide" dataset; I assume School is something
          like a panel variable and we have a separate variable containing some form of
          score for each school at three points in time: Yr1, Yr2, Yr3.
          If we are really interested in an overall mean and standard deviation (SD) of
          these 30 values, then I would recommend reshaping the data.

          Here is how I reshaped the data:

          Code:
          . reshape long Yr, i(School) j(year)
          (note: j = 1 2 3)
          
          Data                               wide   ->   long
          -----------------------------------------------------------------------------
          Number of obs.                       10   ->      30
          Number of variables                   4   ->       3
          j variable (3 values)                     ->   year
          xij variables:
                                      Yr1 Yr2 Yr3   ->   Yr
          -----------------------------------------------------------------------------
          
          . rename Yr Score
          
          . list
          
               +-----------------------+
               | School   year   Score |
               |-----------------------|
            1. |      A      1      .7 |
            2. |      A      2     .75 |
            3. |      A      3     .72 |
            4. |      B      1      .5 |
            5. |      B      2     .46 |
               |-----------------------|
            6. |      B      3     .48 |
            7. |      C      1     .82 |
            8. |      C      2     .78 |
            9. |      C      3     .84 |
           10. |      D      1     .46 |
               |-----------------------|
           11. |      D      2     .55 |
           12. |      D      3     .62 |
           13. |      E      1     .98 |
           14. |      E      2     .95 |
           15. |      E      3     .92 |
               |-----------------------|
           16. |      F      1     .64 |
           17. |      F      2     .73 |
           18. |      F      3     .68 |
           19. |      G      1     .72 |
           20. |      G      2     .74 |
               |-----------------------|
           21. |      G      3     .76 |
           22. |      H      1     .81 |
           23. |      H      2     .83 |
           24. |      H      3     .85 |
           25. |      I      1     .93 |
               |-----------------------|
           26. |      I      2     .92 |
           27. |      I      3     .91 |
           28. |      J      1      .6 |
           29. |      J      2     .61 |
           30. |      J      3      .6 |
               +-----------------------+
          Now our use of the bootstrap command depends on how we wish
          to resample the dataset.

          The bootstrap syntax for simple random sampling is
          Code:
          bootstrap mean=r(mean) sd=r(sd), seed(123) rep(1000) : sum Score
          If we want to cluster sample the schools, the syntax is
          Code:
          bootstrap mean=r(mean) sd=r(sd), seed(123) rep(1000) cluster(School) : sum Score
          There are a total of 10^10 possible bootstrap samples of this kind (counting ordered draws).
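
          The 10^10 figure treats the 10 cluster draws as ordered. Since summarize returns the same mean and SD regardless of the order in which schools are drawn, the number of distinct resamples is smaller: it is the number of multisets of size 10 drawn from 10 schools, C(10 + 10 - 1, 10). A quick check of both counts, using Stata's comb() function:

          ```stata
          * Ordered draws of 10 schools from 10, with replacement
          display %15.0fc 10^10
          * Distinct resamples (multisets): C(n + k - 1, k) with n = k = 10
          display %15.0fc comb(19, 10)
          ```

          Either way, 1,000 replications cover only a small fraction of the possible resamples here.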

          If we want to cluster sample the years, the syntax is
          Code:
          bootstrap mean=r(mean) sd=r(sd), seed(123) rep(1000) cluster(year) : sum Score
          There are a total of 3^3 = 27 possible bootstrap samples of this kind (counting ordered draws; only 10 of them are distinct as sets of years), so 1,000
          replications might be overkill in this case.
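
          None of the three commands above resamples within a school, which is what the original post asks for (30 values drawn with replacement from each school's three, repeated 1,000 times). Below is a minimal, untested sketch of one way to do that, assuming the long dataset above (School, year, Score) is in memory. As I understand it, bsample cannot draw more observations than a stratum contains, so each school's three values are first copied 10 times with expand; drawing 30 rows with replacement from those 30 rows is equivalent to drawing 30 values with replacement from the original three. The names m_score, sd_score, and rep are illustrative, not from the original post.

          ```stata
          * Sketch only: assumes the long dataset (School, year, Score) is in memory.
          * m_score, sd_score, and rep are illustrative names.
          set seed 123
          expand 10                      // 3 values x 10 copies = 30 rows per school
          tempfile base results
          save `base'

          forvalues r = 1/1000 {
              use `base', clear
              bsample, strata(School)    // 30 draws with replacement within each school
              collapse (mean) m_score=Score (sd) sd_score=Score, by(School)
              generate int rep = `r'     // replication number
              if `r' > 1 append using `results'
              save `results', replace
          }

          * 10 schools x 1,000 replications = 10,000 rows of bootstrap means and SDs
          tabstat m_score sd_score, by(School) statistics(mean sd)
          ```

          After the loop, the data in memory hold one row per school and replication, so the 1,000 sample means and 1,000 sample SDs for any school can be listed or summarized directly. With only three distinct values per school, the resulting distributions say little beyond those three numbers, so the output should be interpreted with care.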
