Calculating optimism-corrected estimates of performance after bootstrapping

Annemarie Pantke

Join Date: May 2022

Posts: 1
#1

31 May 2022, 04:59

Hello everyone,

I have seen a few earlier posts covering similar questions, but as I unfortunately wasn’t able to solve my problem with the help of those threads, I thought I’d start a new topic.

I have developed a Cox prediction model and would now like to obtain optimism-corrected measures of performance (p), which in my analysis are Harrell’s C (C-index), the Brier score, and the calibration slope. The formula I’m using follows the "regular bootstrap" described by Steyerberg et al. (2001):

p_corrected = p_apparent - p_optimism

Here is an outline of the steps:

1. Train a model on the original dataset and record the value of the performance metric of interest (the apparent performance).
2. Generate a bootstrap sample.
3. Develop a model using the bootstrap sample (applying the same predictors) and record the corresponding performance metric for the bootstrap-sample-derived model.
4. Apply the bootstrap model to the original dataset and obtain the performance metric.
5. Estimate optimism by taking the mean, across all bootstrap replicates of steps 2 to 4, of the differences between the values calculated in step 3 (the apparent performance of the bootstrap-sample-derived model) and step 4 (the bootstrap-sample-derived model's performance when tested on the original sample).
6. Calculate the optimism-corrected value of the performance metric as the difference between the values calculated in step 1 (the naive value) and step 5 (the estimated optimism).
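The procedure above can be written compactly as follows. The notation is an assumption introduced only for illustration: B is the number of bootstrap replicates, D the original dataset, D*_b the b-th bootstrap sample, M the model fitted on D, M_b the model fitted on D*_b, and p(M, D') the performance metric of model M evaluated on dataset D'.

```latex
\[
p_{\text{apparent}} = p(M, D), \qquad
p_{\text{optimism}} = \frac{1}{B} \sum_{b=1}^{B}
  \Bigl[\, p\bigl(M_b, D^{*}_{b}\bigr) - p\bigl(M_b, D\bigr) \Bigr], \qquad
p_{\text{corrected}} = p_{\text{apparent}} - p_{\text{optimism}}
\]
```

The first term inside the sum corresponds to step 3 and the second to step 4, so each replicate has to return both values.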

The code that I have so far (for the C-index) goes as follows:

Code:

    capture program drop optimism
    program define optimism, rclass
        preserve
        bsample
        stcox i.risk ib4.cd4baseline_group i.vlbaseline_group age i.SEX, nohr
        estat concordance
        return scalar c = r(C)
    end

    stcox i.risk ib4.cd4baseline_group i.vlbaseline_group age i.SEX, nohr
    estat concordance
    local base_harrell = r(C)

    tempfile sim_results
    simulate C = r(c), reps(200) seed(12345) saving(`sim_results'): optimism
    use `sim_results', clear
    gen diff = C - `base_harrell'
    summ diff

I believe that what this code does so far is calculate the difference between each bootstrap C-index and the original one. What I’m not sure about is how to apply the bootstrap model to the original dataset and obtain the C-index there. I assume the coefficients from each bootstrap sample would need to be saved (matrix b = e(b)?) and then applied to the original sample, but unfortunately I can’t figure out how to change the code so that all the steps listed above are carried out properly.
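One possible direction, as a minimal and untested sketch rather than a definitive solution: the estimation results stay in memory after `restore`, so `predict` can apply the bootstrap coefficients to the original data without saving e(b) by hand. The sketch assumes single-record-per-subject data that are already stset. The names c_boot, c_orig, slope_orig and the temporary variable xb are hypothetical and not part of the code above. Refitting stcox on the linear predictor is used here only as a device to make `estat concordance` available; Harrell's C is rank-based, so it is unchanged as long as the refitted coefficient is positive, and that coefficient can be read as the calibration slope of the bootstrap model in the original data.

```stata
capture program drop optimism
program define optimism, rclass
    tempvar xb
    preserve
    * Steps 2-3: draw a bootstrap sample, refit the model, record its apparent C
    bsample
    stcox i.risk ib4.cd4baseline_group i.vlbaseline_group age i.SEX, nohr
    estat concordance
    return scalar c_boot = r(C)
    * Step 4: bring back the original data; the bootstrap estimates remain in e()
    restore
    predict double `xb', xb
    * Refit on the linear predictor only (assumption: its coefficient is positive)
    stcox `xb', nohr
    return scalar slope_orig = _b[`xb']
    estat concordance
    return scalar c_orig = r(C)
end

* Step 1: apparent performance in the original data
stcox i.risk ib4.cd4baseline_group i.vlbaseline_group age i.SEX, nohr
estat concordance
local c_apparent = r(C)

* Steps 2-4 repeated 200 times; simulate replaces the data in memory with the results
simulate c_boot = r(c_boot) c_orig = r(c_orig) slope = r(slope_orig), ///
    reps(200) seed(12345): optimism

* Steps 5-6: mean optimism and optimism-corrected C
generate double optimism = c_boot - c_orig
summarize optimism, meanonly
display "Optimism-corrected C = " `c_apparent' - r(mean)

* Mean calibration slope of the bootstrap models in the original data
summarize slope
```

Because the calibration slope of a model in its own development sample is 1 by construction, the mean of slope across replicates can be taken directly as the optimism-corrected calibration slope. If a factor level happens to be absent from a bootstrap sample, its coefficient is not estimated in that replicate, so such replicates may need separate handling.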

If anyone has some advice or help to share, that would be very much appreciated - many thanks in advance!

Reference:
Steyerberg EW, Harrell FE Jr, Borsboom GJ, Eijkemans MJ, Vergouwe Y, Habbema JD. Internal validation of predictive models: Efficiency of some procedures for logistic regression analysis. J Clin Epidemiol. 2001 Aug;54(8):774-81. doi: 10.1016/s0895-4356(01)00341-9.

Last edited by Annemarie Pantke; 31 May 2022, 05:09.