  • Calculating optimism-corrected estimates of performance after bootstrapping

    Hello everyone,

    I have seen a few earlier posts covering similar questions, but since I was unable to solve my problem with the help of those threads, I thought I'd start a new topic. I have developed a prediction model (Cox) and would now like to obtain optimism-corrected measures of performance (p), which in my analysis are Harrell's C (C-index), the Brier score, and the calibration slope. The formula I'm using follows the "regular bootstrap" as described by Steyerberg et al. (2001): p_corrected = p_apparent - p_optimism

    Here is an outline of the steps:

    1. Train a model on the original dataset and record the value of a performance metric of interest.
    2. Generate a bootstrap sample.
    3. Develop a model using the bootstrap sample (applying the same predictors) and record the corresponding performance metric for the bootstrap-sample-derived model.
    4. Apply the bootstrap model to the original dataset and obtain the performance metric.
    5. Repeat steps 2 to 4 a number of times (200 in the code below) and estimate optimism as the mean of the differences between the values calculated in step 3 (the apparent performance of the bootstrap-sample-derived model) and step 4 (the bootstrap-sample-derived model's performance when tested on the original sample).
    6. Calculate the optimism-corrected value of the performance metric as the difference between the values calculated in step 1 (the naive value) and step 5 (the estimated optimism).
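
    In symbols, the steps above can be sketched as follows. The notation is introduced here for illustration and is not taken from the reference: B is the number of bootstrap replicates, and for replicate b, p_b^boot is the value from step 3 and p_b^orig is the value from step 4.

    ```latex
    \hat{o} = \frac{1}{B}\sum_{b=1}^{B}\left(p_b^{\text{boot}} - p_b^{\text{orig}}\right),
    \qquad
    p_{\text{corrected}} = p_{\text{apparent}} - \hat{o}
    ```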

    The code that I have so far (for the C-index) goes as follows:

    Code:
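    * One bootstrap replicate: refit the model on a resample and return its C
    capture program drop optimism
    program define optimism, rclass
         preserve      // original data are restored when the program exits
         bsample       // resample observations with replacement
         stcox i.risk ib4.cd4baseline_group i.vlbaseline_group age i.SEX, nohr
         estat concordance
         return scalar c = r(C)   // apparent C of the bootstrap model (step 3)
    end
    
    * Step 1: apparent C of the model fitted to the original data
    stcox i.risk ib4.cd4baseline_group i.vlbaseline_group age i.SEX, nohr
    estat concordance
    local base_harrell = r(C)
    tempfile sim_results
    simulate C = r(c), reps(200) seed(12345) saving(`sim_results'): optimism
    
    * Bootstrap apparent C minus original apparent C (step 4 is not done yet)
    use `sim_results', clear
    gen diff = C - `base_harrell'
    summ diff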

    I believe that so far this code only calculates the difference between each bootstrap C-index and the original one. What I'm not sure about is how to apply the bootstrap model to the original dataset and obtain the C-index there. I assume the coefficients from each bootstrap sample would need to be saved (matrix b = e(b)?) and then applied to the original sample, but I can't figure out how to change the code so that all the steps listed above are carried out properly.
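
    One way steps 2 to 5 might be coded is sketched below. It is untested here and rests on some assumptions: the data are already stset with one record per subject, and the program name optimism2 and the returned names c_boot, c_orig, and slope are made up for illustration. Instead of saving e(b) by hand, the sketch restores the original data while the bootstrap estimates are still active, so predict computes the bootstrap model's linear predictor for the original observations. Refitting stcox on that single variable gives the calibration slope as its coefficient. Harrell's C depends only on the ranking of the predictions, so estat concordance after the refit should equal the C of the bootstrap model in the original data, provided the slope is positive.

    ```stata
    capture program drop optimism2
    program define optimism2, rclass
        // Steps 2-3: fit on a bootstrap sample and record its apparent C
        preserve
        bsample
        stcox i.risk ib4.cd4baseline_group i.vlbaseline_group age i.SEX, nohr
        estat concordance
        return scalar c_boot = r(C)
        restore                    // original data again; the estimates stay active

        // Step 4: apply the bootstrap coefficients to the original data
        tempvar xb
        predict double `xb', xb    // linear predictor of the bootstrap model
        stcox `xb', nohr           // one-covariate refit on the original data
        return scalar slope = _b[`xb']   // calibration slope in the original data
        estat concordance
        return scalar c_orig = r(C)      // assumes slope > 0 (ranking unchanged)
    end

    // Step 1: apparent C in the original data
    stcox i.risk ib4.cd4baseline_group i.vlbaseline_group age i.SEX, nohr
    estat concordance
    local base_harrell = r(C)

    // Step 5: repeat and average (simulate replaces the data in memory)
    simulate c_boot = r(c_boot) c_orig = r(c_orig) slope = r(slope), ///
        reps(200) seed(12345): optimism2

    gen double optimism = c_boot - c_orig
    summarize optimism, meanonly

    // Step 6: optimism-corrected C
    display "optimism-corrected C = " `base_harrell' - r(mean)

    summarize slope, meanonly
    display "optimism-corrected calibration slope = " r(mean)
    ```

    The apparent calibration slope of a model in the data it was fitted to is 1 by construction, so in this sketch the mean of slope is already the optimism-corrected slope. The Brier score is not covered here. If a bootstrap sample happens to miss a level of one of the factor variables, that replicate may need extra care.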

    If anyone has some advice or help to share, that would be very much appreciated - many thanks in advance!



    Reference:
    Steyerberg EW, Harrell FE Jr, Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD. Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis. J Clin Epidemiol. 2001 Aug;54(8):774-81. doi: 10.1016/s0895-4356(01)00341-9.
    Last edited by Annemarie Pantke; 31 May 2022, 05:09.