
  • Bootstrap percentile confidence interval for survey data?

    We are using bootstrap variance estimation for survey data with a small number of primary sampling units (n=4), and would like to use the bootstrap percentiles to derive the confidence intervals rather than the normal approximation, which seems to be the default. We can't see a way to do that easily - any suggestions?

    We are using the rhsbsample command to generate our bootstrap replication weights, using repeated half-sample bootstrap sampling, and then specifying 'bootstrap' as the variance estimation method in our 'svyset' statement.

  • #2
    First off, a bit of... ahem... self-promotion: rhsbsample is available from the SSC archive (ssc describe rhsbsample) and was described at the 2013 UKSUG meeting (see http://ideas.repec.org/p/boc/usug13/10.html).

    Getting Stata to show the percentile CI after -svy bootstrap- is not entirely straightforward, but this mock example should get you there:

    Code:
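    sysuse auto, clear
    svyset [pw=weight], bsrweight(mpg weight)    // bsrweight() should contain your rhsbsample-generated replication weights
    svy bootstrap, saving(test, replace) mse: regress price headroom    // save the replicate estimates to test.dta
    mat b = e(b)    // full-sample point estimates
    bstat using test, stat(b) mse    // bootstrap results from the saved replicates, with b as the observed values
    mat list e(ci_percentile)    // percentile confidence intervals
    estat bootstrap, all    // all available bootstrap CI types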
    The alternative, of course, is to do all the calculations and combinations 'by hand'; whether that is easier probably depends on the size and complexity of your estimations.

    Philippe
    Last edited by P Van Kerm; 16 May 2014, 05:25.



    • #3
      Thanks so much, Philippe; that works fine. Much appreciated.

      Helen



      • #4
        Philippe: If I read your presentation correctly, the rhsbsample command takes repeated samples of size N/2 from N PSUs. Helen has N = 4, so the number of possible distinct samples is 6. Is this enough to get good bootstrap standard errors, let alone percentile CIs?
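        For concreteness, here is the count behind the figure of 6. It assumes a single stratum with \(N = 4\) PSUs and treats each half-sample as a subset of \(N/2 = 2\) PSUs, as in the question above:

        ```latex
        \binom{N}{N/2} = \binom{4}{2} = \frac{4!}{2!\,2!} = 6
        ```

        However many replicates are requested, only these six distinct half-samples, and so at most six distinct replicate estimates, can occur.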
        Last edited by Steve Samuels; 23 May 2014, 16:51.
        Steve Samuels
        Statistical Consulting

        Stata 14.2



        • #5
          Steve, is the problem that rhsbsample uses N/2 as the half-sample size, or is it that Helen has only 4 observations (PSUs) and is trying to bootstrap? I believe the problem is the latter, and that the half-sample size set by rhsbsample is not an issue here, but your message seems to suggest that the problem lies with the command.
          Alfonso Sanchez-Penalver



          • #6
            Good point. If Helen indeed has only one stratum (and 4 PSUs), then, yes, this is quite a peculiar setting. As I recall, Saigo et al. (Survey Methodology, 2001), which rhsbsample implements, impose no condition on the number of strata for the variance estimates to be valid (the condition is on the first-stage sample size). That said, with just one stratum and 6 replications, Helen would have a complete enumeration of the possible 'repeated half-samples', so the percentile bootstrap CI is likely to be quite problematic.
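            To see why the percentile interval degenerates, take the \(B = 6\) replicate estimates sorted as \(\hat\theta^*_{(1)} \le \dots \le \hat\theta^*_{(6)}\). Assume the common empirical-percentile rule that takes the order statistic at position \(\lceil pB \rceil\); software definitions differ slightly. The 95% endpoints are then

            ```latex
            \lceil 0.025 \times 6 \rceil = \lceil 0.15 \rceil = 1,
            \qquad
            \lceil 0.975 \times 6 \rceil = \lceil 5.85 \rceil = 6
            ```

            So the nominal 95% interval is simply \([\hat\theta^*_{(1)}, \hat\theta^*_{(6)}]\), the range of the six values.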

            Philippe



            • #7
              Philippe: I'm not very familiar with the theory in this area, but the little reading I've done for non-survey data (Hall, 2014, p. 81) agrees with your assessment.

              Alfonso, the problem is not the N/2, because the method works fine with larger numbers of PSUs in one stratum or with larger numbers of strata. Besides the inaccurate percentile confidence intervals, having only N = 4 PSUs is also problematic because it provides only three degrees of freedom. Korn and Graubard (1999, p. 193) discuss some workarounds for such situations.
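              As a rough illustration of the degrees-of-freedom point, assume the usual design-based rule of total PSUs minus strata. With one stratum and four PSUs:

              ```latex
              \mathrm{df} = \sum_{h=1}^{L} n_h - L = 4 - 1 = 3,
              \qquad
              t_{0.975,\,3} \approx 3.182 \quad \text{vs.} \quad z_{0.975} \approx 1.960
              ```

              For the same standard error, a 95% t interval is therefore about 3.182 / 1.960, or roughly 1.62 times, as wide as the normal-approximation interval.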

              Saigo et al. (2001) proposed repeated half-sample bootstrap sampling specifically to handle randomly imputed data.
              Helen, if you need it for that purpose, then I don't see a good choice for you. If you don't have imputed data, then perhaps you can get by with an ordinary linearized (non-bootstrap) standard error and (wide) t confidence intervals.

              References:

              Hall, Peter. 2014. Methodology and Theory for the Bootstrap. Lecture notes, available at: http://anson.ucdavis.edu/~peterh/sta...to-may-16.pdf

              Korn, Edward Lee, and Barry I. Graubard. 1999. Analysis of Health Surveys. New York: Wiley.

              Saigo, Hiroshi, Jun Shao, and Randy R. Sitter. 2001. A repeated half-sample bootstrap and balanced repeated replications for randomly imputed data. Survey Methodology 27, no. 2: 189-196. Available at: http://www.statcan.gc.ca/ads-annonce...x/6095-eng.pdf
              Last edited by Steve Samuels; 25 May 2014, 18:20.
              Steve Samuels


              • #8
                I've read the paper by Saigo et al. (2001) more carefully. If Helen has missing data and randomly re-imputes for each bootstrap replication, as in the Saigo et al. article, she can use rhsbsample, though I think that any CIs will still be quite inaccurate, especially for nonlinear estimates. There will be more than 6 distinct replicate values for each estimated parameter, located in six clusters, one for each distinct bootstrap sample. As Saigo et al. show (section 4, p. 192), Helen would need to use the average of the replication estimates (in e(b_bs)), not the usual reported estimates (e(b)). Saigo et al. show that their technique gives good results for n as small as 2 in each stratum, but in their setting the technique benefits from averaging over many strata (32 in their simulations).
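                In symbols, using notation assumed here rather than taken from the thread, with \(B\) bootstrap replicates \(\hat\theta^*_1, \dots, \hat\theta^*_B\), the estimate described above is the replicate mean:

                ```latex
                \bar{\theta}^{*} = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^{*}_{b}
                ```

                This is used in place of the full-sample estimate \(\hat\theta\). Per the post above, it is the quantity available in e(b_bs).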
                Last edited by Steve Samuels; 26 May 2014, 09:04.
                Steve Samuels


                • #9
                  Thanks, all. Would this work in a situation with 4 strata and 2 primary sampling units in each stratum?

                  Joel



                  • #10
                    I don't know. I suggest that you read section 5.2 of the Korn and Graubard reference. With \(L = 4\) strata and \(n_h = 2\) PSUs in each, the nominal design degrees of freedom are 4 for single-parameter problems, but the effective degrees of freedom could be lower.
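                    The nominal figure follows from the usual design-based rule, assumed here: total PSUs minus the number of strata. The corresponding t critical value is shown for scale:

                    ```latex
                    \mathrm{df} = \sum_{h=1}^{L} n_h - L = (4 \times 2) - 4 = 4,
                    \qquad
                    t_{0.975,\,4} \approx 2.776
                    ```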
                    Steve Samuels