  • ROC curve with multiple curves

    Hi,

    I've been making ROC curves in Stata using a gold standard (true_value), and several predictors (pred_1, pred_2, pred_3, AI_pred) - for this, I've been using the following code, which works perfectly:
    Code:
     roccomp true_value pred_1 pred_2 pred_3 AI_pred, graph summary
    However, I now want to make the same graph, but by the predictors' experience (value from 1-3), which I'm having a lot of trouble with.
    I've tried focusing on just one of the predictors, using the by() option:
    Code:
     roccomp true_value pred_1, by(pred_1_exp) graph summary
    But I get the following error:
    insufficient observations
    I've also tried generating new variables like this:
    Code:
    g pred1_exp1 = pred_1 if pred_1_exp==1
    g pred1_exp2 = pred_1 if pred_1_exp==2
    g pred1_exp3 = pred_1 if pred_1_exp==3
    And then using the following code for roccomp:
    Code:
     roccomp true_value pred1_exp1 pred1_exp2 pred1_exp3 AI_pred, graph summary
    But I then get the following error:
    variable pred_1_exp1 does not vary


    However, even if I could get that to work, it is still not exactly what I want, since I would like a graph with a curve for:
    1) pred_1 experience=1
    2) pred_1 experience=2
    3) pred_1 experience=3
    4) pred_2 experience=1
    ...
    9) pred_3 experience=3
    10) AI_pred



    Example of data:

    Code:
    id               true_value  pred_1  pred_2  pred_3  pred_1_exp  pred_2_exp  pred_3_exp  AI_pred
    1                0           0       0       0       1           1           3           0
    2                1           0       1       0       3           1           2           0
    3                1           0       1       0       2           1           2           1
    ...
    150000           0           1       0       0       3           2           1           1


    I can't work out whether this can be done with different code, whether I have to rearrange my data, or whether it can be done at all. I really hope someone is able to help.

  • #2
    Originally posted by Sara Hansen
    However, I now want to make the same graph, but by the predictors' experience (value from 1-3), which I'm having a lot of trouble with.
    I've tried focusing on just one of the predictors, using the by() option:
    Code:
     roccomp true_value pred_1, by(pred_1_exp) graph summary
    But I get the following error:
    That exact syntax (verbatim) works perfectly for me. See output below (begin at the "Begin here" comment; the stuff above is just to create a dataset that mimics the structure of yours.)

    You might want to re-check things.
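
    A couple of quick checks might help narrow down an "insufficient observations" error. This is only a sketch using the variable names from your post, and the actual cause isn't established in this thread. It shows whether every experience level contains both outcomes, and how many observations have all three variables non-missing.
    Code:
     tabulate pred_1_exp true_value, missing
     count if !missing(true_value, pred_1, pred_1_exp)
    If a level of pred_1_exp has only one value of true_value, or the count is much smaller than expected, that would be a place to look.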

    .
    . version 17.0

    .
    . clear *

    .
    . // seedem
    . set seed 491458654

    .
    . tempfile dataset

    . quietly save `dataset', emptyok

    .
    . tempname Corr

    . forvalues experience = 1/3 {
      2.
    .         local corr = 50 + `experience' * 10
      3.         matrix define `Corr' = J(5, 5, `corr' / 100) + I(5) * (100 - `corr') / 100
      4.
    .         drawnorm true_value pred_1 pred_2 pred_3 AI_pred, double ///
    >                 corr(`Corr') n(`=150000/3') clear
      5.
    .         foreach var of varlist _all {
      6.                 quietly replace `var' = `var' > 0
      7.         }
      8.
    .         foreach var of newlist pred_1_exp pred_2_exp pred_3_exp {
      9.                 generate byte `var' = `experience'
     10.         }
     11.
    .         append using `dataset'
     12.         quietly save `dataset', replace
     13. }
    (obs 50,000)
    (obs 50,000)
    (obs 50,000)

    .
    . generate long id = _n

    .
    . *
    . * Begin here
    . *
    . list id pred_? pred_?_exp AI_pred if inlist(id, 1, 2, 3, 150000), noobs ///
    >         separator(3) abbreviate(20)

      +------------------------------------------------------------------------------------+
      |     id   pred_1   pred_2   pred_3   pred_1_exp   pred_2_exp   pred_3_exp   AI_pred |
      |------------------------------------------------------------------------------------|
      |      1        0        0        0            3            3            3         0 |
      |      2        1        0        0            3            3            3         1 |
      |      3        1        0        1            3            3            3         1 |
      |------------------------------------------------------------------------------------|
      | 150000        1        0        0            1            1            1         1 |
      +------------------------------------------------------------------------------------+

    .
    . roccomp true_value pred_1, by(pred_1_exp) graph summary // <= Here

                                  ROC                     Asymptotic normal
    pred_1_exp         Obs       area     Std. err.      [95% conf. interval]
    -------------------------------------------------------------------------
    1                50000     0.7067       0.0020        0.70275     0.71073
    2                50000     0.7472       0.0019        0.74339     0.75101
    3                50000     0.7952       0.0018        0.79171     0.79878
    -------------------------------------------------------------------------
    H0: area(1) = area(2) = area(3)
        chi2(2) =  1073.04       Prob>chi2 =   0.0000

    .
    . quietly graph export pred_1.png

    .
    . exit

    end of do-file


    .


    You'll want to tweak the graph quite a bit for appearance.

    [Image: pred_1.png]



    • #3
      Thank you, Joseph Coveney! I'm not sure where my error was, but I see that it works now!
      Do you know how I can get all the preds by pred_exp in one graph? I.e. I would like the true_value and pred_1 by(pred_1_exp) as well as pred_2 by(pred_2_exp) etc.

      When I try to add for example AI_pred to the graph, like this:
      Code:
       roccomp true_value pred_1 AI_pred, by(pred_1_exp)
      I get the following error:
      by () option not allowed with correlated samples
      perhaps you meant to use separate



      • #4
        You'd have to compute the area under the receiver operating characteristic curve separately for your AI predictions and then add that plot to your graph. Unfortunately, roccomp is not one of those commands that allows an addplot() option, and so you'll need to sneak the plot in using an undocumented Stata command: graph addplot. I show how below.

        Again, begin at the "Begin here" comment; the stuff above is to create a dataset whose structure mimics yours. (For brevity in this case, I've limited the predictors to just the two that you're trying to plot.)

        To be frank, though, with only binary predictors, the Tufte data-ink ratio is pretty low, and you might want to consider putting the values, test statistics etc. in a table instead.
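
        As a rough sketch of the table route: roctab leaves the area and its confidence limits in r(area), r(lb) and r(ub), so a loop can print one line per predictor and experience level. The variable names are taken from the first post, and the output layout is only illustrative.
        Code:
         * Sketch only: one ROC area per predictor and experience level
         foreach p in pred_1 pred_2 pred_3 {
             forvalues e = 1/3 {
                 quietly roctab true_value `p' if `p'_exp == `e'
                 display "`p', experience `e': AUC = " %5.3f r(area) ///
                     " (95% CI " %5.3f r(lb) ", " %5.3f r(ub) ")"
             }
         }
         * The AI predictions have no experience level, so use all observations
         quietly roctab true_value AI_pred
         display "AI_pred: AUC = " %5.3f r(area) ///
             " (95% CI " %5.3f r(lb) ", " %5.3f r(ub) ")"
        The same returned results could be posted to a file or a frame instead of displayed, if you want the table as a dataset.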

        .
        . version 17.0

        .
        . clear *

        .
        . // seedem
        . set seed 366679765

        .
        . tempfile dataset

        . quietly save `dataset', emptyok

        .
        . tempname Corr

        . forvalues experience = 1/3 {
          2.
        .         local corr = 0.5 + `experience' * 0.1
          3.         matrix define `Corr' = 1, `corr', 0.9 \ `corr', 1, `corr' \ 0.9, `corr', 1
          4.
        .         quietly drawnorm true_value pred_1 AI_pred, double ///
        >                 corr(`Corr') n(`=150000/3') clear
          5.
        .         foreach var of varlist _all {
          6.                 quietly replace `var' = `var' > 0
          7.         }
          8.
        .         generate byte pred_1_exp = `experience'
          9.
        .         append using `dataset'
         10.         quietly save `dataset', replace
         11. }

        .
        . generate long id = _n

        .
        . *
        . * Begin here
        . *
        . // Begin getting AI predictions' ROC AUC
        . quietly logit true_value AI_pred

        .
        . lsens , gensens(sen) genspec(osp) nograph

        . quietly replace osp = 1 - osp

        .
        . lroc , nograph

        Logistic model for true_value

        Number of observations =   150000
        Area under ROC curve   =   0.8581

        . local AIlabel : display "AI ROC area: " %05.3f r(area)

        . label variable sen "`AIlabel'"

        . // End getting AI predictions' ROC AUC
        .
        . roccomp true_value pred_1, by(pred_1_exp) graph summary

                                      ROC                     Asymptotic normal
        pred_1_exp         Obs       area     Std. err.      [95% conf. interval]
        -------------------------------------------------------------------------
        1                50000     0.7056       0.0020        0.70156     0.70956
        2                50000     0.7462       0.0019        0.74238     0.75001
        3                50000     0.7927       0.0018        0.78913     0.79623
        -------------------------------------------------------------------------
        H0: area(1) = area(2) = area(3)
            chi2(2) =  1031.25       Prob>chi2 =   0.0000

        . graph addplot connected sen osp, sort // <= Here

        .
        . quietly graph export overlay.png

        .
        . exit

        end of do-file


        .


        Again, with Stata's default settings, these graphs are pretty ugly.

        [Image: overlay.png]
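
        If you do want all ten curves (each predictor at each experience level, plus the AI) in a single graph, one possible route is to stack the predictors in long format and let by() split on a combined grouping variable. This is an untested sketch, not something shown elsewhere in this thread. The names pred, experience, rater and curve are made up for illustration, and it assumes id uniquely identifies observations.
        Code:
         * Sketch only: stack the predictors so that by() can draw one curve per group
         preserve
         keep id true_value pred_1 pred_2 pred_3 pred_1_exp pred_2_exp pred_3_exp AI_pred
         rename (pred_1 pred_2 pred_3 AI_pred) (pred1 pred2 pred3 pred4)
         rename (pred_1_exp pred_2_exp pred_3_exp) (experience1 experience2 experience3)
         generate byte experience4 = 0    // placeholder level for the AI
         reshape long pred experience, i(id) j(rater)
         egen curve = group(rater experience), label
         roccomp true_value pred, by(curve) graph summary
         restore
        Because by() treats the groups as independent samples and here the same cases appear in several curves, the chi-squared test in the output should be disregarded; the sketch is meant only for drawing the curves.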
