  • AUC in cutpt vs roctab

    Good morning,
    I use . cutpt to calculate cut points (reference ranges) for laboratory diagnostic methods. My classification variable (classvar) is continuous (concentration in pg/mL). In one case I am getting very discrepant AUC values from . cutpt and . roctab. Here they are:

    Code:
    . cutpt c1 ov
    
    Empirical cutpoint estimation
    Method: Liu
    Reference variable: c1 (0=neg, 1=pos)
    Classification variable: ov
    Empirical optimal cutpoint: 571.5
    Sensitivity at cutpoint: 0.94
    Specificity at cutpoint: 0.78
    Area under ROC curve at cutpoint: 0.86
    and:

    Code:
    . roctab c1 ov
    
                          ROC                    -Asymptotic Normal--
               Obs       Area     Std. Err.      [95% Conf. Interval]
         ------------------------------------------------------------
               133     0.9282       0.0210        0.88708     0.96929
    For all other outcomes (five or so) the difference is negligible (say, 0.01).

    (1) Is there any explanation for this discrepancy?
    (2) What does "Area under ROC curve at cutpoint" actually mean? To my understanding there is no such thing as an AUC at a cutpoint; the AUC is one and the same for all points.

    Thank you in advance for commenting.
    Best,
    Piotr Lewczuk

  • #2
    This forum is so extremely helpful... time to migrate to SAS.

    • #3
      Snide remark aside, the results are not in fact discrepant.

      What the cutpoint represents is one kind of "optimal" location at which to dichotomize your original variable. I have my own criticisms of these cutpoint algorithms, but that's a separate discussion. Once the variable has been dichotomized, the cutpt command calculates the AUC of the dichotomized variable. Geometrically, your ROC curve now looks like a triangle because there is just one non-trivial point at which sensitivity/specificity change: the selected cut point. The amount of information lost is the area above the dichotomized curve but below the ROC curve from the original (raw scale) variable. In your example, that information loss is appreciable, which is entirely expected because dichotomizing a variable throws away useful information.
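
      As a quick arithmetic check of this reading, the sketch below applies the trapezoid rule to the three-point ROC polyline implied by a single cutpoint. It uses only the rounded sensitivity and specificity printed by cutpt in #1; the scalar names are made up for illustration, and this is not necessarily how cutpt computes the value internally.

      ```stata
      * rounded values taken from the cutpt output in #1
      scalar sens = 0.94
      scalar spec = 0.78

      * ROC polyline in the (1 - specificity, sensitivity) plane:
      * (0,0) -> (1 - spec, sens) -> (1,1)
      * area of the left segment (width 1 - spec, heights 0 and sens)
      scalar area_left  = (1 - spec) * (0 + sens) / 2
      * area of the right segment (width spec, heights sens and 1)
      scalar area_right = spec * (sens + 1) / 2

      scalar auc_cut = area_left + area_right
      display "AUC of the dichotomized variable: " %4.2f auc_cut
      ```

      The result agrees with the "Area under ROC curve at cutpoint" of 0.86 reported by cutpt, which is consistent with that figure being the AUC of the dichotomized variable rather than of the original one.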

      • #4
        If you can get your work done better and quicker with SAS, sure, why not migrate?

        While I do not know cutpt, I just want to point out that it is a user-written ado and not an official part of Stata. It might be a good idea to contact the author of the command for more information as to what he means by this. Maybe he uses a different algorithm or a different definition? Maybe it's a bug (maybe it's also a bug in Stata)? Maybe you can compare your results to what SAS gives, if available.
        Best wishes

        (Stata 16.1 MP)

        • #5
          Here's an example to show the visual interpretation of what I said in #3; begin at "Start here". Note that in this example the cut-off is determined as 3.5, which is a silly choice by this program since I know that this value is not possible. Since the threshold is (>= 3.5), it should be rounded up to (>= 4), which yields the same results. With your concentration data you probably won't have this issue, but I do here because my artificial dataset has only a few distinct levels.

          Code:
          clear *
          cls
          
          set seed 17
          
          set obs 200
          gen x = runiformint(1,5)
          gen y = rbinomial(1, invlogit(-1.2 + 0.6*x))
          fre y
          
          // Start here
          
          * original variable X used to make ROC curve
          qui logit y x
          lroc, title(Original X) name(roc1, replace)
          roctab y x, detail
          
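          * find cut point of X and then make ROC curve
          * (cutpt is a user-written command; it reports 3.5 for these data)
          cutpt y x
          * x is integer-valued, so >= 4 is equivalent to >= 3.5
          gen byte x_cut = x >= 4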
          qui logit y x_cut
          lroc, title(Dichotomized X) name(roc2, replace)
          roctab y x_cut, detail

          • #6
            Oh, I have contacted Dr. Clayton, of course; I have been waiting for his answer for two months now.

            • #7
              Originally posted by Piotr Lewczuk
              Oh, I have contacted Dr. Clayton, of course; I have been waiting for his answer for two months now.
              Did you inspect the code and results in #5?

              • #8
                Originally posted by Piotr Lewczuk
                This forum is so extremely helpful... time to migrate to SAS.
                At risk of piling on: People here are not Stata employees. We aren't obliged to help you, or anyone else. We help because it's a service to the community.

                The folks who write add-on Stata packages (or R packages, or whatever the SAS equivalent is, if there is one) also aren't contractually obliged to help you, for better or worse. Academic jobs don't provide support for this sort of extra-curricular activity, again for better or worse.

                If nobody responds to a query, then it's possible it simply got missed amidst all the queries on the forums. You're allowed to bump your question a couple of times, within reason. Another possible reason, especially if the question involves a niche specialty, is that nobody knows how to answer it.

                Being rude to people on the forum makes it less likely that you receive help.
                Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

                • #9
                  @Leonardo,
                  What do you mean by "geometrically, your ROC curve now looks like a triangle"? What triangle do you mean? Where are its vertices on the Sensitivity/(1-Specificity) plane?
                  You also write (if I understand correctly) that the larger the AUC, the more information we are losing. Is that what you mean? It is actually the reverse: the larger the AUC, the less information is "thrown away" and the more is preserved. Consider the boundary case with AUC = 1: when you are able to dichotomize your continuous variable in such a way that AUC = 1, you do not lose any information at all, because it does not matter which of the two variables (continuous or the "new" binary one) you use to correctly classify your outcome.
                  Do you have a reference (a textbook or a paper) for what you have written or are those your own ideas?

                  • #10
                    Originally posted by Piotr Lewczuk
                    @Leonardo,
                    What do you mean by "geometrically, your ROC curve now looks like a triangle"? What triangle do you mean? Where are its vertices on the Sensitivity/(1-Specificity) plane?
                    You also write (if I understand correctly) that the larger the AUC, the more information we are losing. Is that what you mean? It is actually the reverse: the larger the AUC, the less information is "thrown away" and the more is preserved. Consider the boundary case with AUC = 1: when you are able to dichotomize your continuous variable in such a way that AUC = 1, you do not lose any information at all, because it does not matter which of the two variables (continuous or the "new" binary one) you use to correctly classify your outcome.
                    Do you have a reference (a textbook or a paper) for what you have written or are those your own ideas?
                    The triangle I speak of is defined by the two trivial points of (sens, spec) being (1, 0) and (0, 1) forming the line of no information (or chance line), and the only other remaining point is the one defined at the cutpoint. (Okay, I guess it can be called a trapezoid if one includes the area under this line.) The attached graphic should make it easier to see.
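
                    For readers who want the geometry spelled out, here is a short derivation under the usual convention that the ROC plane has 1 - specificity on the horizontal axis and sensitivity on the vertical axis. The symbols Se and Sp (sensitivity and specificity at the cutpoint) are notation introduced here, not taken from the cutpt help file.

                    ```latex
                    % Vertices of the dichotomized ROC polyline in the (1 - Sp, Se) plane:
                    %   (0, 0), \quad (1 - \mathrm{Sp},\ \mathrm{Se}), \quad (1, 1)
                    \begin{align*}
                    \mathrm{AUC}_{\text{cut}}
                      &= \tfrac{1}{2}(1 - \mathrm{Sp})\,\mathrm{Se}
                       + \tfrac{1}{2}\,\mathrm{Sp}\,(\mathrm{Se} + 1) \\
                      &= \tfrac{1}{2}\left(\mathrm{Se} - \mathrm{Se}\,\mathrm{Sp}
                       + \mathrm{Se}\,\mathrm{Sp} + \mathrm{Sp}\right) \\
                      &= \frac{\mathrm{Se} + \mathrm{Sp}}{2}, \\[4pt]
                    \text{triangle above the chance line}
                      &= \mathrm{AUC}_{\text{cut}} - \tfrac{1}{2}
                       = \frac{\mathrm{Se} + \mathrm{Sp} - 1}{2}.
                    \end{align*}
                    ```

                    The first term is the area under the segment from (0, 0) to the cutpoint, the second the area under the segment from the cutpoint to (1, 1); the triangle proper is what remains after subtracting the area of 1/2 under the chance line.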

                    You are correct that a larger AUC contains more information relative to a smaller AUC. But I think you misunderstood the point. Typically the ROC curve drawn using the original variable X will dominate (that is, at every point it will be at least as high as) the ROC curve drawn from the dichotomized version of X. There is then an area between these two curves, which corresponds to the absolute difference in AUC and represents the information loss following dichotomization. That is, anything which shifts the ROC curve towards the 45° line loses information about the status of the outcome.
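
                    A minimal sketch of how that difference could be computed directly, assuming the same simulated data as in #5 and taking the cut at x >= 4 from that example rather than re-running cutpt. The scalar names are illustrative only.

                    ```stata
                    clear
                    set seed 17
                    set obs 200
                    gen x = runiformint(1,5)
                    gen y = rbinomial(1, invlogit(-1.2 + 0.6*x))
                    * cut taken from the example in #5 (an assumption here, not re-estimated)
                    gen byte x_cut = x >= 4

                    * AUC from the original variable
                    quietly roctab y x
                    scalar auc_orig = r(area)

                    * AUC from the dichotomized variable
                    quietly roctab y x_cut
                    scalar auc_cut = r(area)

                    * area between the two curves, as a difference in AUC
                    scalar auc_diff = auc_orig - auc_cut
                    display "Original AUC:     " %6.4f auc_orig
                    display "Dichotomized AUC: " %6.4f auc_cut
                    display "Difference:       " %6.4f auc_diff
                    ```

                    With the figures posted in #1, the same subtraction gives 0.9282 - 0.86 = 0.0682.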

                    For references, these are simply the mathematics of the ROC curve. For the cut point algorithm used in -cutpt-, the references are given at the bottom of -help cutpt-. I have only used the ideas of how to draw a ROC curve given a single classification variable and outcome variable, and then the area under that curve. A good intro to the ROC curve is given in the Hanley and McNeil article below. A detailed treatment of classification and prediction in medical contexts is given in Margaret Pepe's book "The Statistical Evaluation of Medical Tests for Classification and Prediction", published by Oxford University Press. A discussion of the consequences (information loss) of dichotomizing a variable is found in Fedorov et al.

                    Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36. https://doi.org/10.1148/radiology.143.1.7063747

                    Fedorov, V., Mannino, F., & Zhang, R. (2009). Consequences of dichotomization. Pharmaceutical Statistics, 8(1), 50–61. https://doi.org/10.1002/pst.331

                    Code to produce the graphic:

                    Code:
                    clear *
                    cls
                    
                    set seed 17
                    
                    set obs 200
                    gen x = runiformint(1,5)
                    gen y = rbinomial(1, invlogit(-1.2 + 0.6*x))
                    fre y
                    
                    // Start here
                    
                    * original variable X used to make ROC curve
                    roctab y x, detail
                    local orig_auc = strofreal(`r(area)'*100, "%5.1f") + "%"
                    mat defin Orig = r(detail)[., 1..3]
                    svmat double Orig
                    rename (Orig1-Orig3) (orig_cut orig_se orig_sp)
                    gen double orig_1msp = 100 - orig_sp
                    
                    * find cut point of X and then make ROC curve
                    cutpt y x
                    gen byte x_cut = x >= 4
                    roctab y x_cut, detail
                    local new_auc = strofreal(`r(area)'*100, "%5.1f") + "%"
                    mat defin New = r(detail)[., 1..3]
                    svmat double New
                    rename (New1-New3) (new_cut new_se new_sp)
                    gen double new_1msp = 100 - new_sp
                    
                    twoway sc orig_se orig_1msp, c(l) mcol(blue) msize(small) lcol(blue) || ///
                           sc new_se new_1msp, c(l) mcol(red) msize(small) lcol(red) || ///
                           function y = x , range(0 100) lpatt(dash) lcol(black) ///
                           , xti("1 - Specificity") yti("Sensitivity") ///
                           legend(label(1 "Original ROC, AUC=`orig_auc'") ///
                                  label(2 "Dichotomous ROC, AUC=`new_auc'") ///
                                  size(small) rows(1) order(1 2) )

                    • #11
                      Thanks, Leonardo.
                      Okay, I guess it can be called a trapezoid if one includes the area under this line.
                      That was the point where I did not get what you meant. If your "triangle" is the red trapezoid, the rest is of course obvious (which is NOT to say that it is an uninteresting post).

                      What precludes this interpretation of the mysterious "AUC at the cutpoint" is that in the majority of cases in which I have used . cutpt (five years or so) I have obtained excellent agreement with the "Original ROC", as you call it (obtained with the . roctab command). According to your line of argumentation this should not be the case, for the simple reason that the area under a trapezoid is very different from the area under the "original ROC". All in all, this does not answer my question; nevertheless, I thank you for the interesting discussion.
