bootstrapping medians in 2 groups fails with stsum, but not with summarize

Steve Samuels

Join Date: Mar 2014
Posts: 1786

08 Apr 2016, 10:36

I was trying to bootstrap medians in two groups for a survival analysis, in order to eventually bootstrap their difference. I use stsum to find the median in each group. bootstrap fails with an "insufficient observations" error even when there is no censoring (program boot2, below), but succeeds when the command is sum, detail (program boot1, below). boot2 works if not bootstrapped. Any thoughts?

Code:

webuse catheter, clear
replace infect = 1  // no censoring
stset time, fail(infect)

cap program drop _all
/* sum */
program define boot1 , rclass
    sum time if female, det
    return scalar m1 = r(p50)
    sum time if !female, det
    return scalar m2 = r(p50)
end
/* stsum*/
program define boot2 , rclass
    stsum if female
    return scalar m1 = r(p50)
    stsum if !female
    return scalar m2 = r(p50)
end

bootstrap  m1 = r(m1) m2 =r(m2), ///
strata(female) reps(50): boot1

Number of strata   =         2                  Number of obs     =         76
                                                Replications      =         50

      command:  boot1
           m1:  r(m1)
           m2:  r(m2)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          m1 |         56   23.67985     2.36   0.018     9.588355    102.4116
          m2 |       16.5   4.808581     3.43   0.001     7.075355    25.92465
------------------------------------------------------------------------------

/*  boot2 by itself: gives nearly identical results */
boot2
return list
scalars:
                 r(m2) =  16
                 r(m1) =  54

/* but not with bootstrap */
bootstrap  m1 = r(m1) m2 =r(m2), ///
strata(female) reps(50): boot2

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    50
insufficient observations to compute bootstrap standard errors
no results will be saved
r(2000);

Last edited by Steve Samuels; 08 Apr 2016, 10:46.

Steve Samuels
Statistical Consulting

Stata 14.2

Carole J. Wilson

Join Date: Jan 2015
Posts: 932

08 Apr 2016, 11:52

Very strange.

Code:

set tracedepth 4
set trace on
bootstrap  m1 = r(m1) m2 =r(m2), strata(female) reps(50)  : boot2
set trace off

When bootstrap gets to boot2, it never gets past stsum if !female:

Code:

  
------------------------------------------------------------------------ begin boot2 ---
      - stsum if female
      - return scalar m1 = r(p50)
      - stsum if !female
--------------------------------------------------------------------------- end boot2 ---

Running the bootstrap command with the trace option seems to indicate that there is some problem in the `touse' statements. I don't know why that would be the case (I don't know enough about either bootstrap or stsum to hazard a guess).

Code:

bootstrap  m1 = r(m1) m2 =r(m2), strata(female) reps(50)  trace : boot2

....

---------------------------------------------------------------------- begin st_smpl ---
          - version 6, missing
          - if _caller()>=6 {
          - args touse if in by adj
          - mark `touse' `if' `in' `_dta[st_w]'
          = mark __00000A if !female  
          - markout `touse' `adj'
          = markout __00000A
          - markout `touse' `by', strok
          = markout __00000A , strok
          - qui replace `touse' = 0 if _st==0
          = qui replace __00000A = 0 if _st==0
          - qui count if `touse'
          = qui count if __00000A
          - if r(N) == 0 {
          - di in red `"no observations"'
no observations
          - exit 2000
            }
            exit
            }

Stata/MP 14.1 (64-bit x86-64)
Revision 19 May 2016
Win 8.1

Clyde Schechter

Join Date: Apr 2014

Posts: 30085
#3

08 Apr 2016, 12:14

I played with this for a little while. There is something bizarre going on between -bootstrap- and program boot2.

If you insert a -tab female- command at the start of program boot2 and run the bootstrap with the -noisily- option, then, except for the original-sample run at the beginning, you see that every bootstrap sample is all female! But the plot thickens: if you change the order of the commands in program boot2 so that it does -stsum- on the non-females first, you see that every bootstrap sample is now all male! My mind completely boggles as to how this might happen.

I think this is one for tech support.
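
For anyone who wants to reproduce the check described above, here is a minimal sketch. It assumes the catheter data are loaded and stset as in the first post. The program name boot2dbg is made up for illustration, and reps(5) is only there to keep the output short.

Code:

```stata
* Hypothetical debugging copy of boot2: tabulate female before each run
cap program drop boot2dbg
program define boot2dbg, rclass
    tab female                  // shows which groups are in the current sample
    stsum if female
    return scalar m1 = r(p50)
    stsum if !female
    return scalar m2 = r(p50)
end

* noisily displays the output of every replication, including the tabulation
bootstrap m1 = r(m1) m2 = r(m2), ///
    strata(female) reps(5) noisily: boot2dbg
```

Swapping the two stsum calls inside the program is the second half of the experiment.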
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#4

08 Apr 2016, 13:27

Thank you, Carole, and thank you, Clyde. I just sent it in to tech support.

Last edited by Steve Samuels; 08 Apr 2016, 13:33.

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#5

12 Apr 2016, 18:17

Isabelle Canette of StataCorp supplied the solution:

You can solve this problem by using -nodrop-. You need to use -nodrop- every
time you use -bootstrap- with an estimation command and the estimations are
run on different subsamples (e.g., if you use -regress- instead of -summarize-).
-stsum- behaves like an eclass command in this sense because it relies on -stcox-
for the estimation.
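
Applied to the example in the first post, the fix would presumably look like the sketch below. It assumes the catheter data are still stset and that boot2 is defined as in post #1. The only change to the failing call is the added nodrop option. The last command is my own guess at the eventual goal, the difference in medians, and the name diff is just a label I chose.

Code:

```stata
* Same call that failed in post #1, with nodrop added
bootstrap m1 = r(m1) m2 = r(m2), ///
    strata(female) reps(50) nodrop: boot2

* Assumed next step: bootstrap the difference in medians directly
bootstrap diff = (r(m1) - r(m2)), ///
    strata(female) reps(50) nodrop: boot2
```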

Clyde Schechter

Join Date: Apr 2014

Posts: 30085
#6

12 Apr 2016, 19:09

Thanks for sharing that. I never knew about that option.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#7

13 Apr 2016, 17:14

Me neither.
