Dear Statalist users,

I used the following code for internal validation of our model, and obtained a new file containing area (the bootstrap AUC), diff (the bootstrap AUC minus the base AUC (is there only one base AUC?)), and optimism (the final AUC, I suppose?) for 200 bootstrap samples. Here is my code, following the suggestion on the website:

capture program drop optimism

program define optimism, rclass

preserve

bsample

logit AO agec i.sex i.jobm i.incomef i.snec bmi i.lungsymp i.mrcyn i.diaasthma

lroc, nograph

return scalar area_bootstrap = r(area)

end

logit AO agec i.sex i.jobm i.incomef i.snec bmi i.lungsymp i.mrcyn i.diaasthma

lroc, nograph

local base_ROC = r(area)

tempfile sim_results

simulate area = r(area_bootstrap), reps(200) seed(12345) saving(`sim_results'): optimism

use `sim_results', clear

sum area

gen diff = area - 0.7410   // 0.7410 = the base (apparent) AUC from the full-sample model above

gen optimism = 0.7410 - diff

sum area

sum diff

sum optimism

_pctile optimism, p(2.5 50 97.5)

return list

According to the TRIPOD explanation and elaboration, bootstrap validation should include six steps:

1. Develop the prediction model in the original data and determine the apparent AUC.

2. Generate a bootstrap sample.

3. Develop a model using the bootstrap sample (applying all the same modelling and predictor selection methods), determining the apparent performance of the model on the bootstrap sample and the test performance of the bootstrap model in the original sample. (My question is: where is the code for testing the performance of the bootstrap model in the original sample?)

4. Calculate the optimism as the difference between the bootstrap performance and the test performance. (Is the single base AUC the test performance?)

5. Repeat steps 2 through 4 200 times.

6. Average the estimates of optimism from step 5, and subtract that value from the apparent performance obtained in step 1 to obtain the optimism-corrected estimate of performance.
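If I understand step 3 correctly, the piece missing from my program is refitting on the bootstrap sample and then scoring those estimates back on the original data. The following is only my sketch of what that might look like (reusing my model and the `base_ROC' local from above); it relies on lroc applying the estimates currently in memory to whatever data are in memory after restore:

capture program drop optimism

program define optimism, rclass

preserve

bsample

* step 3a: fit the model on the bootstrap sample

logit AO agec i.sex i.jobm i.incomef i.snec bmi i.lungsymp i.mrcyn i.diaasthma

* apparent performance on the bootstrap sample

lroc, nograph

local a_boot = r(area)

* step 3b: bring back the original data and test the bootstrap model on it

restore

lroc, nograph

local a_test = r(area)

* step 4: optimism for this replication

return scalar area_bootstrap = `a_boot'

return scalar area_test = `a_test'

return scalar optimism = `a_boot' - `a_test'

end

* steps 5 and 6: repeat 200 times, then subtract the mean optimism from the apparent AUC

simulate boot = r(area_bootstrap) test = r(area_test) opt = r(optimism), reps(200) seed(12345): optimism

sum opt

display "optimism-corrected AUC = " `base_ROC' - r(mean)

Is this the intended approach, or have I misread step 3?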

My main question is: where is the code for the test performance (the performance of the bootstrap model in the original sample)? Should we use the apparent performance obtained in step 1 instead of the test performance?

Another question: which command converts the linear predictor to a predicted probability for a Cox regression model? I know the command for a logistic regression model is invlogit().
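For reference, this is how I use invlogit() after logit. For Cox regression my understanding (which may be wrong) is that there is no single inverse-link function, because the predicted probability depends on the baseline survival at a chosen time t; the sketch below assumes the data are already stset and uses my covariates purely for illustration:

* logistic: probability from the linear predictor

predict xb1, xb

gen p1 = invlogit(xb1)   // equivalent to: predict p1, pr

* Cox: probability of the event by time t needs the baseline survival

stcox agec i.sex i.jobm i.incomef i.snec bmi

predict xb2, xb

predict s0, basesurv   // baseline survival S0(t), evaluated at each subject's _t

gen p_event = 1 - s0^exp(xb2)   // P(event by t | x) = 1 - S0(t)^exp(xb)

Is that the right way to obtain predicted probabilities from stcox?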

Many thanks!


## Comment