  • bootstrap with replacement - roctab

    Hi,

    I am running roctab to find the ROC area. We have repeated measures - some subjects have two observations; some subjects have one observation. I am using bootstrap to adjust for the repeated measures. I would like to sample with replacement such that either all or none of a given subject's data are included in the bootstrap sample. This is the code I have been using:

    Code:
    bootstrap r(area), reps(1000) cluster(studyid) : roctab mems_ge90_30d inh_result_avg
    estat bootstrap, all
    After running the bootstrap command, I get the following message:

    Code:
    Warning:  Because roctab is not an estimation command or does not set e(sample),
              bootstrap has no way to determine which observations are used in calculating
              the statistics and so assumes that all observations are used.  This means
              that no observations will be excluded from the resampling because of missing
              values or other reasons.

              If the assumption is not true, press Break, save the data, and drop the
              observations that are to be excluded.  Be sure that the dataset in memory
              contains only the relevant data.

    What does that mean? We don't have any missing data for these variables. But, it also seems that it means the resampling is independent of the cluster variable (is that true - does that count as "other reasons"?). Is there a way to resample such that the observations include either all/none of a subject's data?

    Thanks,
    Robin

  • #2
    What does that mean?
    Stata estimation commands (regression models, mostly) leave behind a marker of which observations from the data set were included in the analysis, and -bootstrap- does not bother with those observations that were omitted when it resamples. Non-estimation commands like -roctab- don't mark their estimation sample, so -bootstrap- just assumes the whole data set was used.
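
    A small sketch of the difference, using the auto dataset that ships with Stata rather than your data (the choice of -regress- and of these variables is only for illustration):

    ```stata
    sysuse auto, clear

    * An estimation command marks the observations it used in e(sample)
    regress price mpg rep78
    count if e(sample)      // observations actually used by -regress-
    count                   // all observations in memory

    * -roctab- is an r-class command: it returns r(area) but sets no e(sample)
    roctab foreign mpg
    return list
    ```

    When the two counts differ, the gap is the set of observations -bootstrap- would leave out after an estimation command. After -roctab- it has no such marker to consult.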

    We don't have any missing data for these variables.
    So you have nothing to worry about. The entire data set was used in your -roctab- calculation, so -bootstrap- will do its resampling from the entire data set.
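
    If you want to confirm that before bootstrapping, a quick check along these lines should do it. The variable names are taken from your post, and the sketch assumes your dataset is in memory:

    ```stata
    * Count observations with a missing value in either variable used by -roctab-
    count if missing(mems_ge90_30d, inh_result_avg)

    * If that count were nonzero, the warning's advice amounts to dropping
    * those observations before running -bootstrap-
    drop if missing(mems_ge90_30d, inh_result_avg)
    ```

    With a count of zero, the -drop- removes nothing and the warning can safely be ignored.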

    But, it also seems that it means the resampling is independent of the cluster variable (is that true - does that count as "other reasons"?).
    No, that's not one of the other reasons. An "other reason" would be something like -roctab- finding an invalid value in the data (e.g., a third value in the treatment group variable). But in that case, -roctab- would just stop with an error message and give no output anyway, so you'd be aware of that.

    Is there a way to resample such that the observations include either all/none of a subject's data?
    Yes, and it is almost exactly what you coded. You just need to add the -idcluster()- option specifying the name of a new variable that -bootstrap- can use to substitute for the original studyid when working with the resampled data.
    Code:
    bootstrap r(area), reps(1000) cluster(studyid) idcluster(some_new_variable_name) : roctab mems_ge90_30d inh_result_avg
    (Actually, I think that -roctab- doesn't attempt to use the id variable anyway, so the omission of -idcluster()- is probably not a problem. But it can't hurt to add it. If it's not needed, it'll just be ignored.)
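
    To see what a clustered draw looks like, here is a minimal sketch using -bsample-, the lower-level resampling command, on made-up data. The toy dataset and the names n_orig and boot_id are hypothetical and only there for illustration:

    ```stata
    clear
    set seed 12345

    * Toy data: 50 hypothetical subjects; odd-numbered ones get 2 rows, the rest 1
    set obs 50
    gen studyid = _n
    expand 2 if mod(studyid, 2) == 1
    bysort studyid: gen n_orig = _N

    * One cluster bootstrap draw: subjects are sampled with replacement,
    * and each drawn subject brings all of its rows along
    bsample, cluster(studyid) idcluster(boot_id)

    * Each resampled cluster should have as many rows as the original subject had
    bysort boot_id: assert _N == n_orig
    ```

    If the -assert- passes silently, no subject was split across the draw. A subject drawn twice appears as two separate boot_id clusters, which is the distinction -idcluster()- exists to preserve.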


    • #3
      Great, thanks so much for these explanations, Clyde!
