  • Using bootstrap and svy commands together?

    Hi everyone,
    I am hoping to get some advice on the use of bootstrap with svy procedures. Below are two contrived examples using the public-use NHANES data. I have a more complicated multi-step procedure that I am actually trying to program. Can anyone explain whether using bootstrap with svy commands, as I do in example 1, is problematic? If so, is example 2 the way to go? Thanks in advance. I am using Stata 14.1 MP on a Linux operating system.


    **EXAMPLE 1: USING BOOTSTRAP WITH SVY PROCEDURE AND NO POSTESTIMATION
    Code:
    webuse nhanes2
    gen obese=0
    replace obese=1 if(bmi>=30)
    gen xrace=race
    svyset psu [pw=finalwgt], strata(strata)
    capture program drop myprog

    program define myprog, eclass
            preserve
            svy: logistic obese i.race age i.rural i.region
            restore
    end

    bootstrap _b, seed(10209) reps(5): myprog

    **EXAMPLE 2: USING SVY BOOTSTRAP WITH REGULAR PROCEDURES AND NO POSTESTIMATION
    Code:
    clear
    webuse nhanes2
    gen obese=0
    replace obese=1 if(bmi>=30)
    gen xrace=race
    svyset psu [pw=finalwgt], strata(strata)
    bsweights bw, reps(5) n(0) seed(10209)

    svyset [pw=finalwgt], bsrweight(bw*) vce(bootstrap)
    svy bootstrap _b: logistic obese i.race age i.rural i.region


    Example 1 produces the following:
    Bootstrap replications (5)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    .....

    Survey: Logistic regression

    Number of strata   =        31                  Number of obs     =     10,351
                                                    Replications      =          5
                                                    Wald chi2(4)      =          .
                                                    Prob > chi2       =          .

    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
           obese | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            race |
          Black  |   2.109698   .1831897     8.60   0.000     1.779544    2.501106
          Other  |   .7850331   .0386949    -4.91   0.000     .7127407    .8646579
                 |
             age |   1.018213   .0012068    15.23   0.000     1.015851    1.020581
         1.rural |   1.248806   .0253599    10.94   0.000     1.200077    1.299512
                 |
          region |
             MW  |   1.004325    .142525     0.03   0.976     .7604652    1.326385
              S  |    1.00868   .1456812     0.06   0.952      .760005    1.338722
              W  |   .9234463   .1506406    -0.49   0.625      .670743    1.271356
                 |
           _cons |   .0656821   .0086588   -20.66   0.000     .0507264    .0850472
    ------------------------------------------------------------------------------


    Example 2 produces the following:
    Bootstrap replications (5)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    .....

    Survey: Logistic regression                     Number of obs     =     10,351
                                                    Population size   = 117,157,513
                                                    Replications      =          5
                                                    Wald chi2(4)      =          .
                                                    Prob > chi2       =          .

    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
           obese | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            race |
          Black  |   2.109698   .2431076     6.48   0.000     1.683191    2.644278
          Other  |   .7850331   .1873516    -1.01   0.311     .4917506     1.25323
                 |
             age |   1.018213   .0015534    11.83   0.000     1.015173    1.021262
         1.rural |   1.248806     .08302     3.34   0.001     1.096244    1.422598
                 |
          region |
             MW  |   1.004325   .1187691     0.04   0.971     .7965506    1.266297
              S  |    1.00868   .1738644     0.05   0.960     .7195042    1.414079
              W  |   .9234463   .0985782    -0.75   0.456     .7491099    1.138355
                 |
           _cons |   .0656821   .0098427   -18.17   0.000     .0489656    .0881054
    ------------------------------------------------------------------------------
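For comparison: the two runs give identical point estimates, but for most coefficients Example 1 reports smaller standard errors (for instance .0386949 versus .1873516 for Other). A likely explanation, not verified here, is that the -bootstrap- prefix resamples individual observations and does not read the strata and PSUs declared with -svyset-; the -svy:- call inside -myprog- only changes the within-replicate standard errors, which -bootstrap- discards because it collects only _b. The design that the resampling would need to respect can be inspected with the official -svydescribe- command. A minimal sketch on the same NHANES II extract:

```stata
* Sketch: inspect the sampling design declared for nhanes2
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Lists each stratum with its number of PSUs and observations.
* Replicate-based variance methods (svy jackknife, or svy bootstrap with
* weights built by -bsweights- as in Example 2) work with whole PSUs
* within these strata, not with individual observations.
svydescribe
```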




  • #2
    Lucy:
    surely off-target here, but:
    Code:
    replace obese=1 if(bmi&gt;=30)
    gt;=30 invalid name
    r(198);
    is illegal syntax, apparently because the ">" was turned into "&gt;" in the posted code.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Carlo,
      Sorry for not noticing that the syntax became illegal when I copied and pasted it.

      **EXAMPLE 1: USING BOOTSTRAP WITH SVY PROCEDURE AND NO POSTESTIMATION

      Code:
      webuse nhanes2
      gen obese=0
      replace obese=1 if(bmi>=30)
      svyset psu [pw=finalwgt], strata(strata)
      capture program drop myprog
      
      program define myprog, eclass
               preserve
               svy: logistic obese i.race age i.rural i.region
               restore
      end
      
      bootstrap _b, seed(10209) reps(5):  myprog
      **EXAMPLE 2: USING SVY BOOTSTRAP WITH REGULAR PROCEDURES AND NO POSTESTIMATION
      Code:
      webuse nhanes2, clear
      gen obese=0
      replace obese=1 if(bmi>=30)
      svyset psu [pw=finalwgt], strata(strata)
      bsweights bw, reps(5) n(0) seed(10209)
      
      svyset [pw=finalwgt], bsrweight(bw*) vce(bootstrap)
      svy bootstrap _b: logistic obese i.race age i.rural i.region
      Thanks.
      -L

      Last edited by Lucy Bilaver; 20 Mar 2017, 12:47.
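With more replications, the two approaches can be compared directly by storing each set of results and listing them side by side with -estimates table-. This is a sketch rather than a tested recipe: it assumes the user-written -bsweights- is installed, reuses the n(0) choice from Example 2 without checking whether it is the right setting for this design, and uses 200 replications in line with Carlo's advice in this thread. The preserve/restore pair is dropped from -myprog- because the program does not modify the data.

```stata
* Sketch: compare Example 1 and Example 2 standard errors side by side
* (assumes the user-written -bsweights- is installed)
webuse nhanes2, clear
gen obese=0
replace obese=1 if(bmi>=30)
svyset psu [pw=finalwgt], strata(strata)

* Example 1: bootstrap prefix around a program that calls svy:
capture program drop myprog
program define myprog, eclass
        svy: logistic obese i.race age i.rural i.region
end
bootstrap _b, seed(10209) reps(200): myprog
estimates store ex1

* Example 2: bootstrap replicate weights, then svy bootstrap
bsweights bw, reps(200) n(0) seed(10209)
svyset [pw=finalwgt], bsrweight(bw*) vce(bootstrap)
svy bootstrap _b: logistic obese i.race age i.rural i.region
estimates store ex2

* Coefficients (log odds) with standard errors from each approach
estimates table ex1 ex2, se
```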



      • #4
        Lucy:
        the missing Wald statistic is easily fixed by increasing the number of bootstrap replications (as you can see by clicking on the hyperlink attached to Wald chi2(4) = . in Stata's output):
        Code:
        . bootstrap _b, seed(10209) reps(10):  myprog
        (running myprog on estimation sample)
        
        Warning:  Because myprog is not an estimation command or does not set e(sample), bootstrap has no way
                  to determine which observations are used in calculating the statistics and so assumes that
                  all observations are used.  This means that no observations will be excluded from the
                  resampling because of missing values or other reasons.
        
                  If the assumption is not true, press Break, save the data, and drop the observations that
                  are to be excluded.  Be sure that the dataset in memory contains only the relevant data.
        
        Bootstrap replications (10)
        ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
        ..........
        
        Survey: Logistic regression
        
        Number of strata   =        31                  Number of obs     =     10,351
                                                        Replications      =         10
                                                        Wald chi2(7)      =     251.03
                                                        Prob > chi2       =     0.0000
        
        ------------------------------------------------------------------------------
                     |   Observed   Bootstrap                         Normal-based
               obese | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                race |
              Black  |   2.109698   .2307181     6.83   0.000     1.702677    2.614017
              Other  |   .7850331   .1562568    -1.22   0.224     .5314479    1.159619
                     |
                 age |   1.018213   .0014219    12.93   0.000      1.01543    1.021004
             1.rural |   1.248806   .0568018     4.88   0.000     1.142294    1.365248
                     |
              region |
                 MW  |   1.004325   .1145338     0.04   0.970     .8031616    1.255873
                  S  |    1.00868   .1157363     0.08   0.940     .8055385     1.26305
                  W  |   .9234463   .1282731    -0.57   0.566     .7033538     1.21241
                     |
               _cons |   .0656821   .0084762   -21.10   0.000     .0510035    .0845851
        ------------------------------------------------------------------------------
        Setting your case aside, I fail to remember any resampling with a minimum of 200 replications.
        Last edited by Carlo Lazzaro; 20 Mar 2017, 12:58.
        Kind regards,
        Carlo
        (Stata 19.0)
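Why more replications help: the bootstrap variance matrix is computed from the deviations of the replicate estimates around their mean, so with R replications its rank is at most R - 1. With reps(5) that is at most 4, fewer than the 7 coefficients in the joint test (Black, Other, age, 1.rural, MW, S, W), which is consistent with the "Wald chi2(4) = ." in Example 1. A minimal check, assuming the same data preparation and -myprog- as Example 1 (without preserve/restore, which does not change the estimates), uses Mata's rank() on e(V):

```stata
* Sketch: compare the number of replications with the rank of e(V)
webuse nhanes2, clear
gen obese=0
replace obese=1 if(bmi>=30)
svyset psu [pw=finalwgt], strata(strata)

capture program drop myprog
program define myprog, eclass
        svy: logistic obese i.race age i.rural i.region
end

bootstrap _b, seed(10209) reps(5): myprog

* Number of completed bootstrap replications
display e(N_reps)

* Rank of the bootstrap VCE: at most e(N_reps) - 1
mata: rank(st_matrix("e(V)"))
```

With reps(10) the rank can reach the 7 tested coefficients, which fits the Wald chi2(7) reported above.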



        • #5
          Thank you, Carlo. I did not think there was a problem, and I would certainly use more replications in practice. I was trying to confirm that using bootstrap in combination with svy commands is a reasonable thing to do. Apart from my simple example #2, I have found svy bootstrap very challenging to use. Do you have any advice on whether example 1 or example 2 is appropriate? Thanks for any advice.
          -L



          • #6
            Lucy:
            I cannot say whether the user-written programme -bsweights- helps make your -bootstrap- procedure more precise.
            Although I have not tested it, your example #2 does take the survey structure into account.
            If you find -bootstrap- challenging, you can try -jackknife-.
            As a closing remark, the last statement in my previous reply should read:
            Setting your case aside, I cannot remember any resampling with fewer than 200 replications.
            Kind regards,
            Carlo
            (Stata 19.0)
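The jackknife suggestion can be applied directly to the design declared with -svyset-, with no replicate weights to build: -svy jackknife- deletes one PSU at a time within strata, so the number of replicates is set by the design rather than chosen by the user. A sketch using the same model as the examples above:

```stata
* Sketch: design-based jackknife as an alternative to the bootstrap
webuse nhanes2, clear
gen obese=0
replace obese=1 if(bmi>=30)

* Declare PSUs, strata and sampling weights as in Example 1
svyset psu [pw=finalwgt], strata(strata)

* Delete-one-PSU jackknife: one replicate per PSU
svy jackknife _b: logistic obese i.race age i.rural i.region
```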

