Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating optimism adjusted AUC

    Hello All,

    I require some assistance with a reviewer comment to calculate the optimism adjusted AUC.

    Many may be familiar with this concept (I was not) but in brief the description of the methodology involves bootstrapping and calculating the AUC in the original and bootstrapped datasets, calculating the difference, and the average of that difference across all datasets is the optimism. A likely more complete and accurate explanation can be found below:

    Bootstrapping was performed as described by Harrell et al. The small data set was repeatedly resampled to produce b replicated data sets, each the same size as the original. We used b = 200. The predictive model was fitted to each of the b replicated data sets in turn. Each fitted model was then applied both to the resampled data set from which it was generated and to the original data set; the C statistic was calculated for both, and the difference between these 2 statistics was calculated. The b differences were then averaged to give an estimate of the optimism. The optimism-corrected estimate of the C statistic was then calculated as the naïve C statistic minus the estimated optimism.]
    from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4108045/ for anyone who wants the original source.

    In brief, I can easily calculate the AUC and the bootstrapping, but I fall short in figuring out how I can save the replicated datasets and average out the AUC across all of them.

    The data is there

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int id byte PET double PAPPA2ngmL byte age double bmi_pre_preg byte race_bin double clinicMAP
    192 1            303.36 35 31.47391933 0 110.33333333333333
    116 1             35.62 45 27.76709812 1                101
     17 0             35.22 40 27.00513097 1  99.33333333333333
    167 1            606.16 34 28.84153181 1  97.66666666666667
    176 0 70.03999999999999 43 29.99671269 1  96.33333333333333
     74 0             76.04 25 21.58226077 0 103.66666666666666
    174 1            116.44 30 58.51490993 0 102.16666666666666
      7 1            297.28 41    35.15625 1  95.33333333333333
    132 1            277.36 30 26.57103476 0 102.91666666666667
     58 1            293.76 42 24.40729546 1         92.8888889
    236 1            336.48 48  23.0583271 0                 99
      2 0             19.96 39 54.55568942 0                 98
    217 0             157.1 36 19.65983459 0                 99
      1 1            229.36 29 25.15589569 0              98.75
    140 1 56.31999999999999 36 22.86659805 1  91.16666666666667
    125 0             79.58 42 28.28282828 0  97.33333333333333
      9 1            449.52 39 33.25488986 0                 97
     89 1            117.42 35 27.21730295 1  89.33333333333333
     27 0 37.59999999999999 39 30.50508507 1  88.83333333333333
    233 0 68.25999999999999 38 23.87511478 1  88.66666666666667
    187 0             35.92 41 23.26333168 0                 96
     69 0             71.28 44 29.58981476 0  95.66666666666667
     59 0              34.7 39  26.7755102 0  95.55555554666665
    160 0            222.56 31 22.81949008 0  95.66666666666666
     38 0            113.52 31 30.83653053 0  95.16666666666667
    228 0              80.5 40 31.08109659 1  87.66666666666667
    222 0 67.46000000000001 28 35.11682865 0                 94
    141 0 74.96000000000001 32 24.27618286 1  86.66666666666666
    154 0 52.46000000000001 42 39.05273621 0  93.44444445333333
    214 0             24.56 42  24.1671624 1                 86
     45 0             47.92 39 24.79963244 0  93.33333333333333
     79 0             47.38 38 38.70975484 0  92.66666666666667
     56 0             52.56 33  40.2381133 0                 92
     19 0             40.06 40 27.88518739 0  92.33333333333333
     61 0             47.92 31 20.80087514 0  92.33333333333333
    177 0              28.5 40 24.91349481 0  92.16666666666667
    210 0             92.34 41  25.2640542 1  84.66666666666667
     29 0             55.46 37 23.63281255 1  84.66666666666667
    181 0            227.52 29 27.43507725 0  91.66666666666667
    199 0             83.84 26 32.69537789 0  91.33333333333333
     34 0            148.54 33 25.99591237 0  90.66666666666667
    235 0             113.2 32 23.73866213 0  90.66666666666667
     54 0                .7 38 22.96176738 0  90.58333333333333
     15 0             52.68 34 32.10720958 0                 90
    203 0             83.22 32 21.11268513 0  90.33333333333333
    101 0             83.88 39 26.61666922 0                 90
     91 0             230.8 42  24.7480492 0                 90
     84 0             89.58 35 25.23692448 0                 90
    183 0             80.38 44 30.24199597 0  89.66666666666667
    130 0             81.22 38 27.54469581 1  82.16666666666667
    end
    label values race_bin race_bin
    label def race_bin 0 "White or other", modify
    label def race_bin 1 "Black", modify
    Calculating the AUC:

    Code:
    qui logit PET PAPPA2ngmL age bmi_pre_preg race_bin clinicMAP
    predict pr 
    roctab pr
    The bootstrapping is I believe easily accomplished by the rocreg command
    Code:
    rocreg PET PAPPA2ngmL, ctrlcov(age bmi_pre_preg race_bin clinicMAP) ctrlmodel(linear) nodots
    But at this point, I have reached the limits of my knowledge. I do not know how to save each of the bootstrapped datasets and apply the prediction model to derive an average "optimism." I had thought the user written command -cvauroc- would get me close to the answer

    Code:
    cvauroc PET PAPPA2ngmL age race_bin clinicMAP, k(8)
    but it seems to divide up the dataset rather than replicate a dataset of the same size so it does not quite seem to provide an answer to the problem.

    The reviewer feedback made it seem as if this was a common technique, and perhaps it is, but I cannot find a straightforward way to accomplish this and I confess to not having heard of it before I started reading into it.

    Any assistance with this problem would be greatly appreciated.

    Sincerely,

    Christopher

  • #2
    Maybe try something along the following lines.

    .ÿ
    .ÿversionÿ17.0

    .ÿ
    .ÿclearÿ*

    .ÿ
    .ÿquietlyÿinputÿintÿidÿbyteÿPETÿdoubleÿPAPPA2ngmLÿbyteÿageÿ///
    >ÿÿÿÿÿÿÿÿÿdoubleÿbmi_pre_pregÿbyteÿrace_binÿdoubleÿclinicMAP

    .ÿ
    .ÿ*
    .ÿ*ÿBeginÿhere
    .ÿ*
    .ÿ//ÿPutÿthisÿnextÿlineÿwayÿupÿtop,ÿjustÿafterÿ-clearÿ*-
    .ÿsetÿseedÿ152103232

    .ÿ
    .ÿframeÿcreateÿAUCsÿdouble(bsaÿora)ÿintÿerr

    .ÿ
    .ÿtempnameÿbsa

    .ÿ
    .ÿforvaluesÿiÿ=ÿ1/200ÿ{
    ÿÿ2.ÿ
    .ÿÿÿÿÿÿÿÿÿpreserve
    ÿÿ3.ÿÿÿÿÿÿÿÿÿbsample
    ÿÿ4.ÿ
    .ÿÿÿÿÿÿÿÿÿcaptureÿlogitÿPETÿc.(PAPPA2ngmLÿageÿbmi_pre_pregÿclinicMAP)ÿi.race_bin
    ÿÿ5.ÿ
    .ÿÿÿÿÿÿÿÿÿifÿ!_rcÿ{
    ÿÿ6.ÿ
    .ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿquietlyÿlrocÿ,ÿnograph
    ÿÿ7.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿscalarÿdefineÿ`bsa'ÿ=ÿr(area)
    ÿÿ8.ÿ
    .ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿrestore
    ÿÿ9.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿpredictÿdoubleÿxb,ÿxb
    ÿ10.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿquietlyÿroctabÿPETÿxb
    ÿ11.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿframeÿpostÿAUCsÿ(`bsa')ÿ(r(area))ÿ(0)
    ÿ12.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿdropÿxb
    ÿ13.ÿÿÿÿÿÿÿÿÿ}
    ÿ14.ÿÿÿÿÿÿÿÿÿelseÿ{
    ÿ15.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿframeÿpostÿAUCsÿ(.)ÿ(.)ÿ(_rc)
    ÿ16.ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿrestore
    ÿ17.ÿÿÿÿÿÿÿÿÿ}
    ÿ18.ÿ}
    (16ÿmissingÿvaluesÿgenerated)

    .ÿ
    .ÿcwfÿAUCs

    .ÿtabulateÿerr,ÿmissingÿ

    ÿÿÿÿÿÿÿÿerrÿ|ÿÿÿÿÿÿFreq.ÿÿÿÿÿPercentÿÿÿÿÿÿÿÿCum.
    ------------+-----------------------------------
    ÿÿÿÿÿÿÿÿÿÿ0ÿ|ÿÿÿÿÿÿÿÿ196ÿÿÿÿÿÿÿ98.00ÿÿÿÿÿÿÿ98.00
    ÿÿÿÿÿÿÿÿ430ÿ|ÿÿÿÿÿÿÿÿÿÿ1ÿÿÿÿÿÿÿÿ0.50ÿÿÿÿÿÿÿ98.50
    ÿÿÿÿÿÿÿ2000ÿ|ÿÿÿÿÿÿÿÿÿÿ3ÿÿÿÿÿÿÿÿ1.50ÿÿÿÿÿÿ100.00
    ------------+-----------------------------------
    ÿÿÿÿÿÿTotalÿ|ÿÿÿÿÿÿÿÿ200ÿÿÿÿÿÿ100.00

    .ÿ
    .ÿquietlyÿgenerateÿdoubleÿdeltaÿ=ÿbsaÿ-ÿora

    .ÿsummarizeÿdelta,ÿmeanonly

    .ÿtempnameÿoptimism

    .ÿscalarÿdefineÿ`optimism'ÿ=ÿr(mean)

    .ÿ
    .ÿcwfÿdefault

    .ÿquietlyÿlogitÿPETÿc.(PAPPA2ngmLÿageÿbmi_pre_pregÿclinicMAP)ÿi.race_bin

    .ÿlrocÿ,ÿnograph

    LogisticÿmodelÿforÿPET

    Numberÿofÿobservationsÿ=ÿÿÿÿÿÿÿ50
    AreaÿunderÿROCÿcurveÿÿÿ=ÿÿÿ0.9846

    .ÿ
    .ÿdisplayÿinÿsmclÿasÿtextÿ"Optimism-adjustedÿAUCÿ=ÿ"ÿasÿresultÿ%04.2fÿr(area)ÿ-ÿ`optimism'
    Optimism-adjustedÿAUCÿ=ÿ0.93

    .ÿ
    .ÿexit

    endÿofÿdo-file


    .


    You can post the N for the -roctab- step to the cumulator frame in order to see which bootstrap sample gave rise to those sixteen missing values. (You have the pseudorandom number generator seed and so you can readily recreate the offending datasets for both that case and the complete-separation / failure-to-converge cases.) I've attached the DO-file if you want to take it further or just look for bugs etc.

    From my experience, your optimism-adjusted ROC AUC still seems pretty optimistic.
    Attached Files

    Comment


    • #3
      Thank you Joseph. It works like a dream. Apologies for the delay in thanking you. I was busy with my clinical duties and didn't have time to get back to this. But thank you again. I can see I would not have been able to muddle through this on my own, as I am not proficient enough in writing my own code.

      As for you observation, that the AUC is very very optimistic.... well it is. Very much so. I think I can explain that though with a few clinical observations. Basically this was a high risk population and the clinical variables predict the outcome fairly well. So basically we have a good way to predict the outcome with clinical characteristics and when we add the various biomarkers in study it gets a little bit better. The biomarkers on their own perform in a much more modest manner. So is the addition of the biomarkers worthwhile given that clinical variables perform so well... I still think there is value but we shall see if any journal editors agree with us.

      Comment

      Working...
      X