  • svy bootstrap: anything wrong with it?

    Hi,

    I have spent a good amount of time trying to run 'svy bootstrap' successfully, but I can't get it to work.

    I am starting to think there might be something wrong with the command when it is used with user-written programs.

    Here is a dummy example to make my point:

    Code:
    set more off
    use http://www.stata-press.com/data/r14/nhanes2f, clear
    
    set seed 0123456789
    svyset psuid [pweight=finalwgt], strata(stratid) psu(psuid)
    bsweights bw, reps(200) n(-1) dots replace /*to create bootstrap weights*/
    svyset psuid [pweight=finalwgt], strata(stratid) psu(psuid) bsrweight(bw*) /*to include bootstrap weights in svyset*/
    capture program drop savemargins
    program savemargins, eclass
        qui mlogit health i.agegrp##c.zinc [pw=finalwgt], baseoutcome(1) /*gives the same point estimates as the 'svy bootstrap _b: mlogit' call below*/
    end
    
    svy bootstrap _b: mlogit health i.agegrp##c.zinc, baseoutcome(1)
    svy bootstrap _b: savemargins now
    I'd expect these two 'svy bootstrap' commands to return the same output, but the command

    Code:
    svy bootstrap _b: savemargins now
    doesn't estimate standard errors. This is the output you get from these two commands:

    Code:
    svy bootstrap _b: mlogit health i.agegrp##c.zinc, baseoutcome(1)
    (running mlogit on estimation sample)
     
    Bootstrap replications (200)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100
    ..................................................   150
    ..................................................   200
     
    Survey: Multinomial logistic regression          Number of obs   =       9,188
                                                     Population size = 104,162,204
                                                     Replications    =         200
                                                     Wald chi2(44)   =    24481.19
                                                     Prob > chi2     =      0.0000
     
    -------------------------------------------------------------------------------
                  |   Observed   Bootstrap                         Normal-based
           health | coefficient  std. err.      z    P>|z|     [95% conf. interval]
    --------------+----------------------------------------------------------------
    poor          |  (base outcome)
    --------------+----------------------------------------------------------------
    fair          |
           agegrp |
        age30-39  |   .6307591   2.784013     0.23   0.821    -4.825807    6.087325
        age40-49  |   .1858713    2.58411     0.07   0.943    -4.878892    5.250634
        age50-59  |  -2.285295   2.543675    -0.90   0.369    -7.270807    2.700217
        age60-69  |  -1.267306   2.125213    -0.60   0.551    -5.432647    2.898035
         age 70+  |   .0880936   2.255595     0.04   0.969    -4.332792    4.508979
                  |
             zinc |   .0030401   .0241923     0.13   0.900    -.0443758    .0504561
                  |
    agegrp#c.zinc |
        age30-39  |  -.0111744    .030755    -0.36   0.716    -.0714531    .0491042
        age40-49  |  -.0114104   .0284998    -0.40   0.689    -.0672689    .0444481
        age50-59  |   .0162348   .0291624     0.56   0.578    -.0409224    .0733919
        age60-69  |     .00372   .0236476     0.16   0.875    -.0426284    .0500684
         age 70+  |  -.0123845    .025922    -0.48   0.633    -.0631907    .0384218
                  |
            _cons |   1.467004   2.154797     0.68   0.496    -2.756321    5.690329
    --------------+----------------------------------------------------------------
    average       |
           agegrp |
        age30-39  |  -1.013213   3.008141    -0.34   0.736    -6.909061    4.882635
        age40-49  |  -1.474417   2.662485    -0.55   0.580    -6.692792    3.743957
        age50-59  |  -4.576451   2.427487    -1.89   0.059    -9.334237    .1813353
        age60-69  |  -2.953667   2.391359    -1.24   0.217    -7.640645    1.733312
         age 70+  |  -2.122265   2.516095    -0.84   0.399    -7.053721    2.809192
                  |
             zinc |  -.0021463   .0259741    -0.08   0.934    -.0530547     .048762
                  |
    agegrp#c.zinc |
        age30-39  |   .0070231   .0336488     0.21   0.835    -.0589273    .0729736
        age40-49  |   .0014442   .0297591     0.05   0.961    -.0568826    .0597711
        age50-59  |   .0351228    .027244     1.29   0.197    -.0182744      .08852
        age60-69  |   .0103388   .0270509     0.38   0.702    -.0426799    .0633575
         age 70+  |  -.0032432   .0291694    -0.11   0.911    -.0604142    .0539279
                  |
            _cons |   3.336288   2.289978     1.46   0.145    -1.151986    7.824562
    --------------+----------------------------------------------------------------
    good          |
           agegrp |
        age30-39  |  -.4318196   2.692593    -0.16   0.873    -5.709205    4.845565
        age40-49  |  -2.015888   2.599128    -0.78   0.438    -7.110086     3.07831
        age50-59  |  -5.141381   2.537319    -2.03   0.043    -10.11443   -.1683265
        age60-69  |  -3.859092   2.168088    -1.78   0.075    -8.108467    .3902834
         age 70+  |  -2.179386   2.548777    -0.86   0.393    -7.174897    2.816125
                  |
             zinc |   .0069424   .0248134     0.28   0.780    -.0416909    .0555757
                  |
    agegrp#c.zinc |
        age30-39  |  -.0020289   .0305298    -0.07   0.947    -.0618662    .0578084
        age40-49  |   .0013358    .028949     0.05   0.963    -.0554031    .0580747
        age50-59  |   .0319356   .0288228     1.11   0.268    -.0245561    .0884273
        age60-69  |   .0116608   .0246166     0.47   0.636    -.0365869    .0599084
         age 70+  |    -.01292   .0291148    -0.44   0.657     -.069984     .044144
                  |
            _cons |   2.965805   2.177808     1.36   0.173    -1.302619     7.23423
    --------------+----------------------------------------------------------------
    excellent     |
           agegrp |
        age30-39  |  -.4955834   2.828905    -0.18   0.861    -6.040135    5.048968
        age40-49  |  -1.965187   2.680753    -0.73   0.464    -7.219366    3.288992
        age50-59  |  -4.839694   2.432614    -1.99   0.047    -9.607529    -.071859
        age60-69  |  -3.345652    2.29151    -1.46   0.144     -7.83693    1.145625
         age 70+  |  -3.280243   2.783724    -1.18   0.239    -8.736242    2.175755
                  |
             zinc |   .0106708   .0250371     0.43   0.670     -.038401    .0597426
                  |
    agegrp#c.zinc |
        age30-39  |  -.0013921   .0317513    -0.04   0.965    -.0636235    .0608394
        age40-49  |   .0018209   .0296879     0.06   0.951    -.0563663    .0600081
        age50-59  |   .0257879    .027434     0.94   0.347    -.0279818    .0795575
        age60-69  |  -.0013309   .0255839    -0.05   0.959    -.0514744    .0488125
         age 70+  |  -.0020064   .0320969    -0.06   0.950    -.0649152    .0609024
                  |
            _cons |   2.694864   2.207606     1.22   0.222    -1.631964    7.021692
    -------------------------------------------------------------------------------
     
    svy bootstrap _b: savemargins now
    (running savemargins on estimation sample)
     
    Bootstrap replications (200)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100
    ..................................................   150
    ..................................................   200
     
    Multinomial logistic regression                  Number of obs   =      10,337
                                                     Population size = 117,023,659
                                                     Replications    =         200
                                                     Wald chi2(0)    =           .
                                                     Prob > chi2     =           .
     
    -------------------------------------------------------------------------------
                  |   Observed   Bootstrap                         Normal-based
           health | coefficient  std. err.      z    P>|z|     [95% conf. interval]
    --------------+----------------------------------------------------------------
    poor          |  (base outcome)
    --------------+----------------------------------------------------------------
    fair          |
           agegrp |
        age30-39  |   .6307591          .        .       .            .           .
        age40-49  |   .1858713          .        .       .            .           .
        age50-59  |  -2.285295          .        .       .            .           .
        age60-69  |  -1.267306          .        .       .            .           .
         age 70+  |   .0880936          .        .       .            .           .
                  |
             zinc |   .0030401          .        .       .            .           .
                  |
    agegrp#c.zinc |
        age30-39  |  -.0111744          .        .       .            .           .
        age40-49  |  -.0114104          .        .       .            .           .
        age50-59  |   .0162348          .        .       .            .           .
        age60-69  |     .00372          .        .       .            .           .
         age 70+  |  -.0123845          .        .       .            .           .
                  |
            _cons |   1.467004          .        .       .            .           .
    --------------+----------------------------------------------------------------
    average       |
           agegrp |
        age30-39  |  -1.013213          .        .       .            .           .
        age40-49  |  -1.474417          .        .       .            .           .
        age50-59  |  -4.576451          .        .       .            .           .
        age60-69  |  -2.953667          .        .       .            .           .
         age 70+  |  -2.122265          .        .       .            .           .
                  |
             zinc |  -.0021463          .        .       .            .           .
                  |
    agegrp#c.zinc |
        age30-39  |   .0070231          .        .       .            .           .
        age40-49  |   .0014442          .        .       .            .           .
        age50-59  |   .0351228          .        .       .            .           .
        age60-69  |   .0103388          .        .       .            .           .
         age 70+  |  -.0032432          .        .       .            .           .
                  |
            _cons |   3.336288          .        .       .            .           .
    --------------+----------------------------------------------------------------
    good          |
           agegrp |
        age30-39  |  -.4318196          .        .       .            .           .
        age40-49  |  -2.015888          .        .       .            .           .
        age50-59  |  -5.141381          .        .       .            .           .
        age60-69  |  -3.859092          .        .       .            .           .
         age 70+  |  -2.179386          .        .       .            .           .
                  |
             zinc |   .0069424          .        .       .            .           .
                  |
    agegrp#c.zinc |
        age30-39  |  -.0020289          .        .       .            .           .
        age40-49  |   .0013358          .        .       .            .           .
        age50-59  |   .0319356          .        .       .            .           .
        age60-69  |   .0116608          .        .       .            .           .
         age 70+  |    -.01292          .        .       .            .           .
                  |
            _cons |   2.965805          .        .       .            .           .
    --------------+----------------------------------------------------------------
    excellent     |
           agegrp |
        age30-39  |  -.4955834          .        .       .            .           .
        age40-49  |  -1.965187          .        .       .            .           .
        age50-59  |  -4.839694          .        .       .            .           .
        age60-69  |  -3.345652          .        .       .            .           .
         age 70+  |  -3.280243          .        .       .            .           .
                  |
             zinc |   .0106708          .        .       .            .           .
                  |
    agegrp#c.zinc |
        age30-39  |  -.0013921          .        .       .            .           .
        age40-49  |   .0018209          .        .       .            .           .
        age50-59  |   .0257879          .        .       .            .           .
        age60-69  |  -.0013309          .        .       .            .           .
         age 70+  |  -.0020064          .        .       .            .           .
                  |
            _cons |   2.694864          .        .       .            .           .
    -------------------------------------------------------------------------------
    Why is this happening?
    ------
    I use Stata 17

  • #2
    Hi,

    I think I have found a workaround for the issue described above, although it would still be good if the Stata developers could look into it.

    I have simplified the code further, using a linear regression instead of mlogit, to make things easier to follow:

    Code:
    set more off
    use http://www.stata-press.com/data/r14/nhanes2f, clear
    
    /* manual bootstrap procedure for complex survey data */
    
    *Step 1: save observed coefficients
    preserve
    qui reg health i.agegrp##c.zinc [pw=finalwgt]
    matrix beta=e(b)
    global N "`e(N)'"
    restore
    
    *Step 2: generate program for bootstrap
    capture program drop savemargins
    program savemargins, eclass properties(bsvy)
    
        preserve
        
        gen bw=.
        rhsbsample, strata(stratid) cluster(psuid) weight(bw)
        qui reg health i.agegrp##c.zinc [pw=bw]
        matrix beta_boot=e(b)
        forvalues i=1/14 {
            scalar beta_boot_`i'=beta_boot[1,`i']
        }
        restore
    end
    
    *Step 3: run bootstrap
    preserve
    simulate    beta_boot_1=beta_boot[1,1] beta_boot_2=beta_boot[1,2] beta_boot_3=beta_boot[1,3] beta_boot_4=beta_boot[1,4] beta_boot_5=beta_boot[1,5] ///
                beta_boot_6=beta_boot[1,6] beta_boot_7=beta_boot[1,7] beta_boot_8=beta_boot[1,8] beta_boot_9=beta_boot[1,9] beta_boot_10=beta_boot[1,10] ///
                beta_boot_11=beta_boot[1,11] beta_boot_12=beta_boot[1,12] beta_boot_13=beta_boot[1,13] beta_boot_14=beta_boot[1,14], ///
                reps(200) seed(0123456789): savemargins
    
    *Step 4: estimate bootstrap CIs
    bstat, stat(beta) n(${N})
    restore
    
    /* bootstrap command for complex survey data */
    preserve
    set seed 0123456789
    forvalues i=1/200 {
        capture drop bw`i'
        qui gen bw`i' = .
        qui rhsbsample , strata(stratid) cluster(psuid) weight(bw`i')
    }
    svyset psuid [pweight=finalwgt], strata(stratid) psu(psuid) bsrweight(bw*)
    svy bootstrap _b: reg health i.agegrp##c.zinc
    restore
    These are the results that you get:

    Code:
    /* manual bootstrap procedure for complex survey data */
    
    Simulations (200)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100
    ..................................................   150
    ..................................................   200
    
    Bootstrap results                                        Number of obs = 9,188
                                                             Replications  =   200
    
    ------------------------------------------------------------------------------
                 |   Observed   Bootstrap                         Normal-based
                 | coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
     beta_boot_1 |          0  (omitted)
     beta_boot_2 |  -.0604953   .2155528    -0.28   0.779     -.482971    .3619803
     beta_boot_3 |  -.6800862   .1965283    -3.46   0.001    -1.065275   -.2948977
     beta_boot_4 |   -1.18574   .2813968    -4.21   0.000    -1.737267   -.6342122
     beta_boot_5 |  -1.030735   .1835159    -5.62   0.000     -1.39042   -.6710505
     beta_boot_6 |  -1.082403   .4134719    -2.62   0.009    -1.892793   -.2720135
     beta_boot_7 |   .0039468   .0015131     2.61   0.009     .0009811    .0069125
     beta_boot_8 |          0  (omitted)
     beta_boot_9 |   -.000522   .0024129    -0.22   0.829    -.0052512    .0042071
    beta_boot_10 |   .0034892   .0020696     1.69   0.092    -.0005671    .0075455
    beta_boot_11 |   .0056021    .002976     1.88   0.060    -.0002307     .011435
    beta_boot_12 |   .0006459   .0020993     0.31   0.758    -.0034686    .0047604
    beta_boot_13 |   .0003931   .0048961     0.08   0.936    -.0092032    .0099893
    beta_boot_14 |   3.653983    .151776    24.07   0.000     3.356507    3.951458
    ------------------------------------------------------------------------------
    
    
    /* bootstrap command for complex survey data */
    
    Bootstrap replications (200)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100
    ..................................................   150
    ..................................................   200
    
    Survey: Linear regression                        Number of obs   =       9,188
                                                     Population size = 104,162,204
                                                     Replications    =         200
                                                     Wald chi2(11)   =     1071.82
                                                     Prob > chi2     =      0.0000
                                                     R-squared       =      0.1157
    
    -------------------------------------------------------------------------------
                  |   Observed   Bootstrap                         Normal-based
           health | coefficient  std. err.      z    P>|z|     [95% conf. interval]
    --------------+----------------------------------------------------------------
           agegrp |
        age30-39  |  -.0604953   .2150132    -0.28   0.778    -.4819134    .3609228
        age40-49  |  -.6800862   .1960364    -3.47   0.001     -1.06431   -.2958619
        age50-59  |   -1.18574   .2806924    -4.22   0.000    -1.735887   -.6355927
        age60-69  |  -1.030735   .1830565    -5.63   0.000    -1.389519   -.6719508
         age 70+  |  -1.082403   .4124369    -2.62   0.009    -1.890765    -.274042
                  |
             zinc |   .0039468   .0015093     2.61   0.009     .0009885    .0069051
                  |
    agegrp#c.zinc |
        age30-39  |   -.000522   .0024068    -0.22   0.828    -.0052393    .0041952
        age40-49  |   .0034892   .0020644     1.69   0.091    -.0005569    .0075353
        age50-59  |   .0056021   .0029686     1.89   0.059    -.0002161    .0114204
        age60-69  |   .0006459    .002094     0.31   0.758    -.0034583    .0047501
         age 70+  |   .0003931   .0048839     0.08   0.936    -.0091792    .0099653
                  |
            _cons |   3.653983    .151396    24.14   0.000     3.357252    3.950713
    -------------------------------------------------------------------------------

    Hope this helps,

    Lukas
    ------
    I use Stata 17
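
A possible explanation for the missing standard errors in the first post, offered as an untested sketch rather than a confirmed diagnosis: the original savemargins wrapper hard-codes [pw=finalwgt] and declares no syntax, so it cannot use any if qualifier or replicate weights that 'svy bootstrap' passes to it. Every replication would then refit the same model on the same data. The variant below assumes that 'svy bootstrap' hands the prefixed command an if qualifier and pweights or iweights, and simply forwards them to mlogit. The [anything] element is only there to absorb the placeholder argument "now" used in the thread.

```stata
* Sketch only: untested variant of the savemargins wrapper from the first post.
* Assumption: svy bootstrap passes an if qualifier and the replicate weights
* to the prefixed command, so the wrapper has to accept and forward them.
capture program drop savemargins
program savemargins, eclass
    * [anything] absorbs the placeholder argument ("now") used in the thread
    syntax [anything] [if] [in] [pweight iweight]
    * forward the qualifiers and the weight type/expression parsed by syntax
    quietly mlogit health i.agegrp##c.zinc `if' `in' [`weight'`exp'], baseoutcome(1)
end

* Assumes svyset was already run with bsrweight(bw*), as in the first post
svy bootstrap _b: savemargins now
```

If this assumption holds, the replicate estimates should vary across the 200 bootstrap weights and the reported number of observations should match the 9,188 shown for the direct mlogit call instead of 10,337.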
