Hello All,
I require some assistance with a reviewer comment to calculate the optimism adjusted AUC.
Many may be familiar with this concept (I was not) but in brief the description of the methodology involves bootstrapping and calculating the AUC in the original and bootstrapped datasets, calculating the difference, and the average of that difference across all datasets is the optimism. A likely more complete and accurate explanation can be found below:
from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4108045/ for anyone who wants the original source.
In brief, I can easily calculate the AUC and the bootstrapping, but I fall short in figuring out how I can save the replicated datasets and average out the AUC across all of them.
The data is there
Calculating the AUC:
The bootstrapping is I believe easily accomplished by the rocreg command
But at this point, I have reached the limits of my knowledge. I do not know how to save each of the bootstrapped datasets and apply the prediction model to derive an average "optimism." I had thought the user written command -cvauroc- would get me close to the answer
but it seems to divide up the dataset rather than replicate a dataset of the same size so it does not quite seem to provide an answer to the problem.
The reviewer feedback made it seem as if this was a common technique, and perhaps it is, but I cannot find a straightforward way to accomplish this and I confess to not having heard of it before I started reading into it.
Any assistance with this problem would be greatly appreciated.
Sincerely,
Christopher
I require some assistance with a reviewer comment to calculate the optimism adjusted AUC.
Many may be familiar with this concept (I was not) but in brief the description of the methodology involves bootstrapping and calculating the AUC in the original and bootstrapped datasets, calculating the difference, and the average of that difference across all datasets is the optimism. A likely more complete and accurate explanation can be found below:
Bootstrapping was performed as described by Harrell et al. The small data set was repeatedly resampled to produce b replicated data sets, each the same size as the original. We used b = 200. The predictive model was fitted to each of the b replicated data sets in turn. Each fitted model was then applied both to the resampled data set from which it was generated and to the original data set; the C statistic was calculated for both, and the difference between these 2 statistics was calculated. The b differences were then averaged to give an estimate of the optimism. The optimism-corrected estimate of the C statistic was then calculated as the naïve C statistic minus the estimated optimism.]
In brief, I can easily calculate the AUC and the bootstrapping, but I fall short in figuring out how I can save the replicated datasets and average out the AUC across all of them.
The data is there
Code:
* Example generated by -dataex-. For more info, type help dataex clear input int id byte PET double PAPPA2ngmL byte age double bmi_pre_preg byte race_bin double clinicMAP 192 1 303.36 35 31.47391933 0 110.33333333333333 116 1 35.62 45 27.76709812 1 101 17 0 35.22 40 27.00513097 1 99.33333333333333 167 1 606.16 34 28.84153181 1 97.66666666666667 176 0 70.03999999999999 43 29.99671269 1 96.33333333333333 74 0 76.04 25 21.58226077 0 103.66666666666666 174 1 116.44 30 58.51490993 0 102.16666666666666 7 1 297.28 41 35.15625 1 95.33333333333333 132 1 277.36 30 26.57103476 0 102.91666666666667 58 1 293.76 42 24.40729546 1 92.8888889 236 1 336.48 48 23.0583271 0 99 2 0 19.96 39 54.55568942 0 98 217 0 157.1 36 19.65983459 0 99 1 1 229.36 29 25.15589569 0 98.75 140 1 56.31999999999999 36 22.86659805 1 91.16666666666667 125 0 79.58 42 28.28282828 0 97.33333333333333 9 1 449.52 39 33.25488986 0 97 89 1 117.42 35 27.21730295 1 89.33333333333333 27 0 37.59999999999999 39 30.50508507 1 88.83333333333333 233 0 68.25999999999999 38 23.87511478 1 88.66666666666667 187 0 35.92 41 23.26333168 0 96 69 0 71.28 44 29.58981476 0 95.66666666666667 59 0 34.7 39 26.7755102 0 95.55555554666665 160 0 222.56 31 22.81949008 0 95.66666666666666 38 0 113.52 31 30.83653053 0 95.16666666666667 228 0 80.5 40 31.08109659 1 87.66666666666667 222 0 67.46000000000001 28 35.11682865 0 94 141 0 74.96000000000001 32 24.27618286 1 86.66666666666666 154 0 52.46000000000001 42 39.05273621 0 93.44444445333333 214 0 24.56 42 24.1671624 1 86 45 0 47.92 39 24.79963244 0 93.33333333333333 79 0 47.38 38 38.70975484 0 92.66666666666667 56 0 52.56 33 40.2381133 0 92 19 0 40.06 40 27.88518739 0 92.33333333333333 61 0 47.92 31 20.80087514 0 92.33333333333333 177 0 28.5 40 24.91349481 0 92.16666666666667 210 0 92.34 41 25.2640542 1 84.66666666666667 29 0 55.46 37 23.63281255 1 84.66666666666667 181 0 227.52 29 27.43507725 0 91.66666666666667 199 0 83.84 26 32.69537789 0 91.33333333333333 34 0 148.54 33 25.99591237 0 90.66666666666667 235 0 113.2 32 23.73866213 0 90.66666666666667 54 0 .7 38 22.96176738 0 90.58333333333333 15 0 52.68 34 32.10720958 0 90 203 0 83.22 32 21.11268513 0 90.33333333333333 101 0 83.88 39 26.61666922 0 90 91 0 230.8 42 24.7480492 0 90 84 0 89.58 35 25.23692448 0 90 183 0 80.38 44 30.24199597 0 89.66666666666667 130 0 81.22 38 27.54469581 1 82.16666666666667 end label values race_bin race_bin label def race_bin 0 "White or other", modify label def race_bin 1 "Black", modify
Code:
qui logit PET PAPPA2ngmL age bmi_pre_preg race_bin clinicMAP predict pr roctab pr
Code:
rocreg PET PAPPA2ngmL, ctrlcov(age bmi_pre_preg race_bin clinicMAP) ctrlmodel(linear) nodots
Code:
cvauroc PET PAPPA2ngmL age race_bin clinicMAP, k(8)
The reviewer feedback made it seem as if this was a common technique, and perhaps it is, but I cannot find a straightforward way to accomplish this and I confess to not having heard of it before I started reading into it.
Any assistance with this problem would be greatly appreciated.
Sincerely,
Christopher
Comment