Internal validation of a Cox regression model according to the TRIPOD statement

    Dear Stata users,

    I am trying to perform internal validation of a series of Cox regression models, following the procedure described in the TRIPOD statement:

    - Step 1: develop a prediction model using the entire original sample (size n) and determine apparent performance
    - Step 2: generate a bootstrap sample by sampling n individuals with replacement from the original sample
    - Step 3: develop a model using the bootstrap sample as in step 1
      - Step 3a: determine apparent performance of this model on the bootstrap sample
      - Step 3b: determine performance of the bootstrap model in the original sample
    - Step 4: calculate optimism as the difference between 3a and 3b
    - Step 5: repeat steps 2 to 4 at least 100 times
    - Step 6: average the estimates of optimism from step 5 and subtract the result from the apparent performance obtained in step 1
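
    In formula form, the steps above amount to the following. The notation is introduced here for illustration only: P is the performance measure (in this post, Gönen and Heller's K), M is the model developed on the original sample D, M_b is the model developed on the b-th bootstrap sample D_b, and B is the number of bootstrap repetitions.

    ```latex
    \[
    \hat{O} \;=\; \frac{1}{B}\sum_{b=1}^{B}
      \Bigl[\, P(M_b, D_b) \;-\; P(M_b, D) \,\Bigr],
    \qquad
    P_{\text{corrected}} \;=\; P(M, D) \;-\; \hat{O}
    \]
    ```

    Here P(M_b, D_b) corresponds to step 3a, P(M_b, D) to step 3b, and P(M, D) to the apparent performance from step 1.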

    I found some code on GitHub and adapted it.

    Code:
    capture program drop bsoptim
    
    program define bsoptim, rclass
    
    // initial regression on full dataset
    stcox case age radio chemo horm 
    
    // variables for optimism, samplesize and the commandline for the regression
    local optim = 0
    local sampsize = e(N)
    local call = e(cmdline)
    
    // apparent gonen heller's K from full model
    estat con, gheller
    local gh_app=r(K)
    
    drop if e(sample)==0
    
    // loop over bootstrap samples
    local reps = 100
    local x=1
    while `x' <= `reps' {
      preserve
      bsample `sampsize'
      `call' 
      estat con, gheller
      local gh_boot=r(K)
      restore
      predict pred
      estat con, gheller
      local gh_orig=r(K)
      local diff=`gh_boot' - `gh_orig'
      local optim = `optim' + `diff'
      drop pred 
      local x = `x' + 1
    }
    
    // calculate final optimism and corrected gonen heller's K 
    local optim = `optim'/`reps'
    local gh_corr = `gh_app' - `optim'
    
    scalar gh_app = `gh_app'
    scalar optim = `optim'
    scalar gh_corr = `gh_corr'
    
    end

    I run this program over a series of outcomes and write the results to a postfile.
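
    A minimal sketch of what such a loop could look like. The dataset name mydata, the follow-up time variable futime, the event indicators died and relapse, and the output file bsoptim_results are all hypothetical placeholders and do not come from the actual analysis. The data are reloaded on every pass because bsoptim drops observations outside the estimation sample.

    ```stata
    // Hypothetical sketch: mydata, futime, died, relapse and
    // bsoptim_results are placeholder names, not the real ones
    tempname results
    postfile `results' str32 outcome double gh_app double optim double gh_corr ///
        using bsoptim_results, replace

    foreach y in died relapse {
        // reload the data: bsoptim drops observations not in e(sample)
        use mydata, clear
        stset futime, failure(`y')

        // bsoptim leaves its results in the scalars gh_app, optim and gh_corr
        bsoptim
        post `results' ("`y'") (scalar(gh_app)) (scalar(optim)) (scalar(gh_corr))
    }

    postclose `results'
    ```

    Writing scalar(gh_app) rather than gh_app avoids ambiguity if the dataset happens to contain a variable with the same name.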

    The code runs without errors; however, the optimism it calculates is almost zero, which makes me wonder whether my calculations are correct.
    In particular, I am unsure about calculating the predictions and Gönen and Heller's K after the restore command. Does this really calculate the K of the bootstrap model in the original sample, as required in step 3b?

    Best wishes,
    Marianne Heins