Calculating optimism adjusted AUC

Chris Labos

Join Date: Aug 2014
Posts: 39

Calculating optimism adjusted AUC

29 Sep 2022, 19:47

Hello All,

I require some assistance with a reviewer comment to calculate the optimism adjusted AUC.

Many may be familiar with this concept (I was not) but in brief the description of the methodology involves bootstrapping and calculating the AUC in the original and bootstrapped datasets, calculating the difference, and the average of that difference across all datasets is the optimism. A likely more complete and accurate explanation can be found below:

Bootstrapping was performed as described by Harrell et al. The small data set was repeatedly resampled to produce b replicated data sets, each the same size as the original. We used b = 200. The predictive model was fitted to each of the b replicated data sets in turn. Each fitted model was then applied both to the resampled data set from which it was generated and to the original data set; the C statistic was calculated for both, and the difference between these 2 statistics was calculated. The b differences were then averaged to give an estimate of the optimism. The optimism-corrected estimate of the C statistic was then calculated as the naïve C statistic minus the estimated optimism.]

from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4108045/ for anyone who wants the original source.

In brief, I can easily calculate the AUC and the bootstrapping, but I fall short in figuring out how I can save the replicated datasets and average out the AUC across all of them.

The data is there

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int id byte PET double PAPPA2ngmL byte age double bmi_pre_preg byte race_bin double clinicMAP
192 1            303.36 35 31.47391933 0 110.33333333333333
116 1             35.62 45 27.76709812 1                101
 17 0             35.22 40 27.00513097 1  99.33333333333333
167 1            606.16 34 28.84153181 1  97.66666666666667
176 0 70.03999999999999 43 29.99671269 1  96.33333333333333
 74 0             76.04 25 21.58226077 0 103.66666666666666
174 1            116.44 30 58.51490993 0 102.16666666666666
  7 1            297.28 41    35.15625 1  95.33333333333333
132 1            277.36 30 26.57103476 0 102.91666666666667
 58 1            293.76 42 24.40729546 1         92.8888889
236 1            336.48 48  23.0583271 0                 99
  2 0             19.96 39 54.55568942 0                 98
217 0             157.1 36 19.65983459 0                 99
  1 1            229.36 29 25.15589569 0              98.75
140 1 56.31999999999999 36 22.86659805 1  91.16666666666667
125 0             79.58 42 28.28282828 0  97.33333333333333
  9 1            449.52 39 33.25488986 0                 97
 89 1            117.42 35 27.21730295 1  89.33333333333333
 27 0 37.59999999999999 39 30.50508507 1  88.83333333333333
233 0 68.25999999999999 38 23.87511478 1  88.66666666666667
187 0             35.92 41 23.26333168 0                 96
 69 0             71.28 44 29.58981476 0  95.66666666666667
 59 0              34.7 39  26.7755102 0  95.55555554666665
160 0            222.56 31 22.81949008 0  95.66666666666666
 38 0            113.52 31 30.83653053 0  95.16666666666667
228 0              80.5 40 31.08109659 1  87.66666666666667
222 0 67.46000000000001 28 35.11682865 0                 94
141 0 74.96000000000001 32 24.27618286 1  86.66666666666666
154 0 52.46000000000001 42 39.05273621 0  93.44444445333333
214 0             24.56 42  24.1671624 1                 86
 45 0             47.92 39 24.79963244 0  93.33333333333333
 79 0             47.38 38 38.70975484 0  92.66666666666667
 56 0             52.56 33  40.2381133 0                 92
 19 0             40.06 40 27.88518739 0  92.33333333333333
 61 0             47.92 31 20.80087514 0  92.33333333333333
177 0              28.5 40 24.91349481 0  92.16666666666667
210 0             92.34 41  25.2640542 1  84.66666666666667
 29 0             55.46 37 23.63281255 1  84.66666666666667
181 0            227.52 29 27.43507725 0  91.66666666666667
199 0             83.84 26 32.69537789 0  91.33333333333333
 34 0            148.54 33 25.99591237 0  90.66666666666667
235 0             113.2 32 23.73866213 0  90.66666666666667
 54 0                .7 38 22.96176738 0  90.58333333333333
 15 0             52.68 34 32.10720958 0                 90
203 0             83.22 32 21.11268513 0  90.33333333333333
101 0             83.88 39 26.61666922 0                 90
 91 0             230.8 42  24.7480492 0                 90
 84 0             89.58 35 25.23692448 0                 90
183 0             80.38 44 30.24199597 0  89.66666666666667
130 0             81.22 38 27.54469581 1  82.16666666666667
end
label values race_bin race_bin
label def race_bin 0 "White or other", modify
label def race_bin 1 "Black", modify

Calculating the AUC:

Code:

qui logit PET PAPPA2ngmL age bmi_pre_preg race_bin clinicMAP
predict pr 
roctab pr

The bootstrapping is I believe easily accomplished by the rocreg command

Code:

rocreg PET PAPPA2ngmL, ctrlcov(age bmi_pre_preg race_bin clinicMAP) ctrlmodel(linear) nodots

But at this point, I have reached the limits of my knowledge. I do not know how to save each of the bootstrapped datasets and apply the prediction model to derive an average "optimism." I had thought the user written command -cvauroc- would get me close to the answer

Code:

cvauroc PET PAPPA2ngmL age race_bin clinicMAP, k(8)

but it seems to divide up the dataset rather than replicate a dataset of the same size so it does not quite seem to provide an answer to the problem.

The reviewer feedback made it seem as if this was a common technique, and perhaps it is, but I cannot find a straightforward way to accomplish this and I confess to not having heard of it before I started reading into it.

Any assistance with this problem would be greatly appreciated.

Sincerely,

Christopher

Tags: area under the curve, AUC, bootstrapping

Joseph Coveney

Join Date: Apr 2014

Posts: 4447
#2

29 Sep 2022, 21:20

Maybe try something along the following lines.

.ÿ
.ÿversionÿ17.0

.ÿ
.ÿclearÿ*

.ÿ
.ÿquietlyÿinputÿintÿidÿbyteÿPETÿdoubleÿPAPPA2ngmLÿbyteÿageÿ///
>ÿÿÿÿÿÿÿÿÿdoubleÿbmi_pre_pregÿbyteÿrace_binÿdoubleÿclinicMAP

.ÿ
.ÿ*
.ÿ*ÿBeginÿhere
.ÿ*
.ÿ//ÿPutÿthisÿnextÿlineÿwayÿupÿtop,ÿjustÿafterÿ-clearÿ*-
.ÿsetÿseedÿ152103232

.ÿ
.ÿframeÿcreateÿAUCsÿdouble(bsaÿora)ÿintÿerr

.ÿ
.ÿtempnameÿbsa

.ÿ
.ÿforvaluesÿiÿ=ÿ1/200ÿ{
ÿÿ2.ÿ
.ÿÿÿÿÿÿÿÿÿpreserve
ÿÿ3.ÿÿÿÿÿÿÿÿÿbsample
ÿÿ4.ÿ
.ÿÿÿÿÿÿÿÿÿcaptureÿlogitÿPETÿc.(PAPPA2ngmLÿageÿbmi_pre_pregÿclinicMAP)ÿi.race_bin
ÿÿ5.ÿ
.ÿÿÿÿÿÿÿÿÿifÿ!_rcÿ{
ÿÿ6.ÿ
.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿquietlyÿlrocÿ,ÿnograph
ÿÿ7.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿscalarÿdefineÿ`bsa'ÿ=ÿr(area)
ÿÿ8.ÿ
.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿrestore
ÿÿ9.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿpredictÿdoubleÿxb,ÿxb
ÿ10.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿquietlyÿroctabÿPETÿxb
ÿ11.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿframeÿpostÿAUCsÿ(`bsa')ÿ(r(area))ÿ(0)
ÿ12.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿdropÿxb
ÿ13.ÿÿÿÿÿÿÿÿÿ}
ÿ14.ÿÿÿÿÿÿÿÿÿelseÿ{
ÿ15.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿframeÿpostÿAUCsÿ(.)ÿ(.)ÿ(_rc)
ÿ16.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿrestore
ÿ17.ÿÿÿÿÿÿÿÿÿ}
ÿ18.ÿ}
(16ÿmissingÿvaluesÿgenerated)

.ÿ
.ÿcwfÿAUCs

.ÿtabulateÿerr,ÿmissingÿ

ÿÿÿÿÿÿÿÿerrÿ|ÿÿÿÿÿÿFreq.ÿÿÿÿÿPercentÿÿÿÿÿÿÿÿCum.
------------+-----------------------------------
ÿÿÿÿÿÿÿÿÿÿ0ÿ|ÿÿÿÿÿÿÿÿ196ÿÿÿÿÿÿÿ98.00ÿÿÿÿÿÿÿ98.00
ÿÿÿÿÿÿÿÿ430ÿ|ÿÿÿÿÿÿÿÿÿÿ1ÿÿÿÿÿÿÿÿ0.50ÿÿÿÿÿÿÿ98.50
ÿÿÿÿÿÿÿ2000ÿ|ÿÿÿÿÿÿÿÿÿÿ3ÿÿÿÿÿÿÿÿ1.50ÿÿÿÿÿÿ100.00
------------+-----------------------------------
ÿÿÿÿÿÿTotalÿ|ÿÿÿÿÿÿÿÿ200ÿÿÿÿÿÿ100.00

.ÿ
.ÿquietlyÿgenerateÿdoubleÿdeltaÿ=ÿbsaÿ-ÿora

.ÿsummarizeÿdelta,ÿmeanonly

.ÿtempnameÿoptimism

.ÿscalarÿdefineÿ`optimism'ÿ=ÿr(mean)

.ÿ
.ÿcwfÿdefault

.ÿquietlyÿlogitÿPETÿc.(PAPPA2ngmLÿageÿbmi_pre_pregÿclinicMAP)ÿi.race_bin

.ÿlrocÿ,ÿnograph

LogisticÿmodelÿforÿPET

Numberÿofÿobservationsÿ=ÿÿÿÿÿÿÿ50
AreaÿunderÿROCÿcurveÿÿÿ=ÿÿÿ0.9846

.ÿ
.ÿdisplayÿinÿsmclÿasÿtextÿ"Optimism-adjustedÿAUCÿ=ÿ"ÿasÿresultÿ%04.2fÿr(area)ÿ-ÿ`optimism'
Optimism-adjustedÿAUCÿ=ÿ0.93

.ÿ
.ÿexit

endÿofÿdo-file

.

You can post the N for the -roctab- step to the cumulator frame in order to see which bootstrap sample gave rise to those sixteen missing values. (You have the pseudorandom number generator seed and so you can readily recreate the offending datasets for both that case and the complete-separation / failure-to-converge cases.) I've attached the DO-file if you want to take it further or just look for bugs etc.

From my experience, your optimism-adjusted ROC AUC still seems pretty optimistic.
Attached Files

do-file for reference.do (4.0 KB, 1 view)
Comment
Chris Labos

Join Date: Aug 2014

Posts: 39
#3

02 Oct 2022, 17:28

Thank you Joseph. It works like a dream. Apologies for the delay in thanking you. I was busy with my clinical duties and didn't have time to get back to this. But thank you again. I can see I would not have been able to muddle through this on my own, as I am not proficient enough in writing my own code.

As for you observation, that the AUC is very very optimistic.... well it is. Very much so. I think I can explain that though with a few clinical observations. Basically this was a high risk population and the clinical variables predict the outcome fairly well. So basically we have a good way to predict the outcome with clinical characteristics and when we add the various biomarkers in study it gets a little bit better. The biomarkers on their own perform in a much more modest manner. So is the addition of the biomarkers worthwhile given that clinical variables perform so well... I still think there is value but we shall see if any journal editors agree with us.
Comment

Announcement

Calculating optimism adjusted AUC

Comment

Comment