Dear Stata users,
I am trying to perform internal validation of a series of Cox regression models, following the procedure described in the TRIPOD statement:
- Step 1: develop a prediction model using the entire original sample (size n) and determine apparent performance
- Step 2: generate a bootstrap sample by sampling n individuals with replacement from the original sample
- Step 3: develop a model using the bootstrap sample as in step 1
  - 3a: determine apparent performance of this model on the bootstrap sample
  - 3b: determine performance of the bootstrap model in the original sample
- Step 4: calculate optimism as the difference between 3a and 3b
- Step 5: repeat steps 2 to 4 at least 100 times
- Step 6: average the estimates of optimism from step 5 and subtract this average from the apparent performance obtained in step 1
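The resampling logic in steps 1 to 6 does not depend on Stata, so it can be sketched in a few lines of Python. This is a minimal sketch, not the GitHub code or a library routine: `optimism_corrected`, `fit` (returns a fitted model) and `perf` (scores a model on a dataset, e.g. a concordance statistic) are hypothetical names chosen for illustration.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")      # one observation (e.g. time, event indicator, covariates)
Model = TypeVar("Model")  # whatever the hypothetical `fit` function returns


def optimism_corrected(
    data: Sequence[Row],
    fit: Callable[[Sequence[Row]], Model],
    perf: Callable[[Model, Sequence[Row]], float],
    reps: int = 100,
    seed: int = 12345,
) -> Tuple[float, float, float]:
    """Bootstrap optimism correction; returns (apparent, optimism, corrected)."""
    rng = random.Random(seed)
    n = len(data)

    # Step 1: fit on the full original sample and record apparent performance.
    apparent = perf(fit(data), data)

    optimisms: List[float] = []
    for _ in range(reps):  # Step 5: repeat steps 2-4 `reps` times
        # Step 2: draw n observations with replacement from the original data.
        boot = [data[rng.randrange(n)] for _ in range(n)]
        # Step 3: refit the model on the bootstrap sample.
        boot_model = fit(boot)
        perf_boot = perf(boot_model, boot)  # Step 3a: bootstrap model, bootstrap sample
        perf_orig = perf(boot_model, data)  # Step 3b: bootstrap model, original sample
        optimisms.append(perf_boot - perf_orig)  # Step 4: optimism of this replicate

    # Step 6: average the optimism and subtract it from the apparent performance.
    optimism = statistics.fmean(optimisms)
    return apparent, optimism, apparent - optimism
```

For a Cox model, `fit` would run the regression and `perf` would compute Gönen and Heller's K. The key design point is that step 3b scores `boot_model` on `data`, the original sample, rather than on `boot`.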
I found some code on GitHub and adapted it.
I run this program over a series of outcomes and write the results to a postfile.
Code:
capture program drop bsoptim
program define bsoptim, rclass
    // initial regression on full dataset
    stcox case age radio chemo horm

    // variables for optimism, samplesize and the commandline for the regression
    local optim = 0
    local sampsize = e(N)
    local call = e(cmdline)

    // apparent gonen heller's K from full model
    estat con, gheller
    local gh_app=r(K)
    drop if e(sample)==0

    // loop over bootstrap samples
    local reps = 100
    local x=1
    while `x' <= `reps' {
        preserve
        bsample `sampsize'
        `call'
        estat con, gheller
        local gh_boot=r(K)
        restore
        predict pred
        estat con, gheller
        local gh_orig=r(K)
        local diff=`gh_boot' - `gh_orig'
        local optim = `optim' + `diff'
        drop pred
        local x = `x' + 1
    }

    // calculate final optimism and corrected gonen heller's K
    local optim = `optim'/`reps'
    local gh_corr = `gh_app' - `optim'
    scalar gh_app = `gh_app'
    scalar optim = `optim'
    scalar gh_corr = `gh_corr'
end
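Step 3b comes down to which covariate rows the bootstrap coefficients are applied to. Gönen and Heller's K is computed from pairwise differences of the linear predictor x'β, so steps 3a and 3b are the same coefficient vector scored on two datasets. The Python sketch below assumes the bootstrap coefficients are already available (in Stata, `e(b)` after `stcox`). The function names are hypothetical, and it does not reimplement `estat concordance`: there is no smoothing and no standard error, and its tie handling is not Stata's.

```python
import math
from typing import List, Sequence, Tuple


def linear_predictor(beta: Sequence[float], rows: Sequence[Sequence[float]]) -> List[float]:
    """Return x'beta for each row of covariates."""
    return [sum(b * x for b, x in zip(beta, row)) for row in rows]


def gonen_heller_k(lp: Sequence[float]) -> float:
    """Unsmoothed Gonen-Heller K from linear predictors, looping over all pairs."""
    n = len(lp)
    if n < 2:
        raise ValueError("need at least two observations")
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d = lp[i] - lp[j]
            # For d != 0 exactly one indicator term of the pair is non-zero and it
            # equals 1 / (1 + exp(-|d|)); pairs with tied predictors add 0 here.
            if d != 0:
                total += 1.0 / (1.0 + math.exp(-abs(d)))
    return 2.0 * total / (n * (n - 1))


def k_3a_3b(
    beta_boot: Sequence[float],
    x_boot: Sequence[Sequence[float]],
    x_orig: Sequence[Sequence[float]],
) -> Tuple[float, float]:
    """Steps 3a and 3b: the SAME bootstrap coefficients on two covariate sets."""
    k_boot = gonen_heller_k(linear_predictor(beta_boot, x_boot))  # 3a: bootstrap sample
    k_orig = gonen_heller_k(linear_predictor(beta_boot, x_orig))  # 3b: original sample
    return k_boot, k_orig
```

In the Stata program, the analogue of `k_orig` requires `estat concordance` to be evaluated on all observations of the restored original data using the bootstrap coefficients. Whether that happens after `restore` depends on which observations `estat concordance` treats as the estimation sample. One way to check is to compare the number of subjects it reports inside the loop with the size of the original sample.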
The code runs, but the optimism it calculates is almost zero, which made me wonder whether my calculations are correct.
I am unsure about calculating the predictions and Gönen and Heller's K after the restore command. Does this really give the GH K of the bootstrap model in the original sample, as required in step 3b?
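For reference, this is my understanding of the definition of Gönen and Heller's K for a fitted coefficient vector $\hat\beta$ on $n$ subjects with covariates $x_i$, writing $x_{ij} = x_i - x_j$:

```latex
K(\hat\beta) = \frac{2}{n(n-1)} \sum_{i<j} \left[
  \frac{I(\hat\beta^\top x_{ji} < 0)}{1 + \exp(\hat\beta^\top x_{ji})}
  + \frac{I(\hat\beta^\top x_{ij} < 0)}{1 + \exp(\hat\beta^\top x_{ij})}
\right]
```

As written, the formula involves only the coefficients and the covariates, not the observed survival times. Between steps 3a and 3b, the only thing that changes is the set of covariate rows to which the bootstrap coefficients are applied.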
Best wishes,
Marianne Heins