AUC in cutpt vs roctab

Piotr Lewczuk

Join Date: Apr 2016

Posts: 59
#1


07 Jul 2021, 01:14

Good morning,
I use . cutpt to calculate cut points (reference ranges) for laboratory diagnostic methods. My outcome (classvar) is continuous (concentration in pg/mL). In one case I am getting very discrepant results of the AUC after . cutpt and . roctab. They are here:

Code:

. cutpt c1 ov

Empirical cutpoint estimation
Method:                              Liu
Reference variable:                  c1 (0=neg, 1=pos)
Classification variable:             ov
Empirical optimal cutpoint:          571.5
Sensitivity at cutpoint:             0.94
Specificity at cutpoint:             0.78
Area under ROC curve at cutpoint:    0.86

and:

Code:

. roctab c1 ov

                  ROC                    -Asymptotic Normal--
       Obs       Area     Std. Err.      [95% Conf. Interval]
------------------------------------------------------------
       133     0.9282       0.0210        0.88708     0.96929

For all other outcomes (five or so) the difference is negligible (say, 0.01).

(1) Is there any explanation for this discrepancy?
(2) What does "Area under ROC curve at cutpoint" actually mean? To my understanding there is no such thing as an AUC at a cutpoint; the AUC is one and the same for all points.

Thank you in advance for commenting.
Best,
Piotr Lewczuk
Tags: None
Piotr Lewczuk

Join Date: Apr 2016

Posts: 59
#2

21 Sep 2021, 01:51

This forum is so extremely helpful... time to migrate to SAS.
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#3

21 Sep 2021, 07:20

Snide remark aside, the results are not in fact discrepant.

What the cut represents is one kind of "optimal" location at which to dichotomize your original variable. I have my own criticisms of these cutpoint algorithms, but that's a separate discussion. Once you have dichotomized your variable, the cutpt command calculates the AUC of the dichotomized variable. Geometrically, your ROC curve now looks like a triangle because you have just one non-trivial point at which sensitivity/specificity change: the selected cut point. The amount of information loss is the area above the dichotomized curve but below the ROC curve from the original (raw scale) variable. In your example, that information loss is appreciably large and is perfectly expected, because dichotomizing a variable throws away useful information.
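To make the geometry concrete, here is a short derivation. It assumes the dichotomized ROC curve is drawn with straight segments through (0, 0), (1 - Sp, Se) and (1, 1), which is the usual empirical (trapezoidal) convention. Whether cutpt computes its reported figure exactly this way is an assumption, although the rounded numbers in #1 are consistent with it.

```latex
\begin{aligned}
\mathrm{AUC}_{\text{cut}}
  &= \underbrace{\tfrac{1}{2}\,(1-\mathrm{Sp})\,\mathrm{Se}}_{\text{triangle left of the cut point}}
   + \underbrace{\tfrac{1}{2}\,\mathrm{Sp}\,(\mathrm{Se}+1)}_{\text{trapezoid right of the cut point}} \\
  &= \tfrac{1}{2}\,(\mathrm{Se}+\mathrm{Sp}) \\[4pt]
\text{With } \mathrm{Se}=0.94,\ \mathrm{Sp}=0.78:\quad
\mathrm{AUC}_{\text{cut}} &= \tfrac{1}{2}\,(0.94+0.78) = 0.86
\end{aligned}
```

Under this convention the "AUC at cutpoint" is simply the average of sensitivity and specificity at the chosen cut, so it depends on the cut point, unlike the 0.9282 that roctab reports for the raw variable.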
Comment
Felix Bittmann

Join Date: Aug 2018

Posts: 699
#4

21 Sep 2021, 07:22

If you can get your work done better and quicker with SAS, sure, why not migrate?

While I do not know cutpt, I just want to point out that this is a user-written ado and not an official part of Stata. It might be a good idea to contact the author of this command and ask what he means by this. Maybe he uses a different algorithm or a different definition? Maybe it's a bug (or maybe it's a bug in Stata)? Maybe you can compare your results to what SAS gives, if available.

Best wishes

Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#5

21 Sep 2021, 07:55

Here's an example to show you the visual interpretation of what I said in #3; begin at the "Start here" comment. Note that in this example the cut-off is determined as 3.5, which is a silly choice by this program, since I know that value is not possible. Since the threshold is (>= 3.5), this should be rounded up to (>= 4), which yields the same results. With your concentration data you probably won't have this issue, but I do here because my artificial dataset has only a few distinct levels.

Code:

clear *
cls
set seed 17
set obs 200
gen x = runiformint(1,5)
gen y = rbinomial(1, invlogit(-1.2 + 0.6*x))
* fre and cutpt are user-written commands (e.g. ssc install fre, ssc install cutpt)
fre y

// Start here
* original variable X used to make ROC curve
qui logit y x
lroc, title(Original X) name(roc1, replace)
roctab y x, detail

* find cut point of X and then make ROC curve
cutpt y x
gen byte x_cut = x >= 4
qui logit y x_cut
lroc, title(Dichotomized X) name(roc2, replace)
roctab y x_cut, detail
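As a rough check of the same idea without cutpt installed, the sketch below recomputes the AUC of the dichotomized variable by hand and compares it with what roctab reports. It reuses the simulated data and the cut at x >= 4 from the code above; the scalar names (auc_roctab, auc_hand, sens, spec) are illustrative only, and it assumes roctab's nonparametric area for a binary classifier equals the average of sensitivity and specificity.

```stata
* Sketch: for a binary classifier, roctab's AUC should equal (sens + spec)/2
clear
set seed 17
set obs 200
gen x = runiformint(1,5)
gen y = rbinomial(1, invlogit(-1.2 + 0.6*x))
gen byte x_cut = x >= 4          // cut taken from the example above

* AUC as reported by roctab for the dichotomized variable
quietly roctab y x_cut
scalar auc_roctab = r(area)

* sensitivity and specificity at the cut, counted by hand
quietly count if y == 1
scalar n_pos = r(N)
quietly count if y == 1 & x_cut == 1
scalar sens = r(N) / n_pos
quietly count if y == 0
scalar n_neg = r(N)
quietly count if y == 0 & x_cut == 0
scalar spec = r(N) / n_neg

scalar auc_hand = (sens + spec) / 2
display "roctab AUC        = " %6.4f auc_roctab
display "(sens + spec) / 2 = " %6.4f auc_hand

* the two should agree up to floating-point error
assert reldif(auc_roctab, auc_hand) < 1e-6
```

If the assert passes, the two-segment curve accounts for the whole of the dichotomized AUC, which is the sense in which the figure belongs to a particular cut point.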
Comment
Piotr Lewczuk

Join Date: Apr 2016

Posts: 59
#6

22 Sep 2021, 06:16

Oh, I have contacted Dr. Clayton, of course; I have been waiting for his answer two months now.
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#7

22 Sep 2021, 07:18

Originally posted by Piotr Lewczuk

Oh, I have contacted Dr. Clayton, of course; I have been waiting for his answer two months now.

Did you inspect the code and results in #5?
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#8

22 Sep 2021, 07:55

Originally posted by Piotr Lewczuk

This forum is so extremely helpful... time to migrate to SAS.

At risk of piling on: People here are not Stata employees. We aren't obliged to help you, or anyone else. We help because it's a service to the community.

The folks who write add-on Stata packages (or R packages, or whatever the SAS equivalent if there is one) also aren't contractually obliged to help you, for better or worse. Academic jobs don't provide support for this sort of extra-curricular activity - again, for better or worse.

If nobody responds to a query, then it's possible it simply got missed amidst all the queries on the forums. You're allowed to bump your question a couple of times, within reason. Another possible reason, especially if the question involves a niche specialty, is that nobody knows how to answer it.

Being rude to people on the forum makes it less likely that you receive help.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.
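A minimal sketch of what that looks like, using Stata's shipped auto dataset as a stand-in because the poster's data are not available here. For the question in this thread the equivalent call would presumably be something like dataex c1 ov, which is an assumption about the variable names based on #1.

```stata
* Sketch: produce a small, copy-pasteable data extract with dataex
sysuse auto, clear

* list a few variables for the first five observations in dataex format
dataex price mpg foreign in 1/5
```

The output is a block of input commands that others can paste into Stata to recreate exactly those observations.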

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Piotr Lewczuk

Join Date: Apr 2016

Posts: 59
#9

23 Sep 2021, 07:53

@Leonardo,
What do you mean by "geometrically, your ROC curve now looks like a triangle"? What triangle do you mean? Where are its vertices on the Sensitivity/(1-Specificity) plane?
You also write (if I understand) that the larger the AUC, the more information we are losing. Is that what you mean? It is actually the other way round: the larger the AUC, the less information is "thrown away" and the more is preserved. Consider a boundary case with AUC = 1: when you are able to dichotomize your continuous variable in such a way that your AUC = 1, you do not lose any information at all, because it does not matter which of the two variables (continuous or "new" binomial) you use to correctly classify your outcome.
Do you have a reference (a textbook or a paper) for what you have written or are those your own ideas?
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#10

23 Sep 2021, 12:29

Originally posted by Piotr Lewczuk

@Leonardo,
What do you mean by "geometrically, your ROC curve now looks like a triangle"? What triangle do you mean? Where are its vertices on the Sensitivity/(1-Specificity) plane?
You also write (if I understand) that the larger the AUC, the more information we are losing. Is that what you mean? It is actually the other way round: the larger the AUC, the less information is "thrown away" and the more is preserved. Consider a boundary case with AUC = 1: when you are able to dichotomize your continuous variable in such a way that your AUC = 1, you do not lose any information at all, because it does not matter which of the two variables (continuous or "new" binomial) you use to correctly classify your outcome.
Do you have a reference (a textbook or a paper) for what you have written or are those your own ideas?

The triangle I speak of is defined by the two trivial points of (sens, spec) being (1, 0) and (0, 1) forming the line of no information (or chance line), and the only other remaining point is the one defined at the cutpoint. (Okay, I guess it can be called a trapezoid if one includes the area under this line.) The attached graphic should make it easier to see.

You are correct that a larger AUC contains more information relative to a smaller AUC. But I think you misunderstood the point. Typically the ROC curve drawn using the original variable X will dominate (that is, at every point be at least as high as) the ROC curve drawn from the dichotomized version of X. There is then an area between these two curves, which corresponds to the absolute difference in AUC and represents the information loss following dichotomization. That is, anything which shifts the ROC curve towards the 45° line loses information about the status of the outcome.
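As a small numerical companion to this point, the sketch below computes the two AUCs and their difference for the simulated data from #5. The cut at x >= 4 is carried over from that example, and the scalar names (auc_orig, auc_cut, auc_loss) are illustrative only. No particular sign or size of the difference is asserted, since the original curve only typically dominates the dichotomized one.

```stata
* Sketch: AUC of the original variable vs. its dichotomized version
clear
set seed 17
set obs 200
gen x = runiformint(1,5)
gen y = rbinomial(1, invlogit(-1.2 + 0.6*x))
gen byte x_cut = x >= 4          // cut taken from the example in #5

* AUC using all levels of x
quietly roctab y x
scalar auc_orig = r(area)

* AUC using only the dichotomized variable
quietly roctab y x_cut
scalar auc_cut = r(area)

* area between the two curves, read here as the information lost
scalar auc_loss = auc_orig - auc_cut

display "AUC, original x     = " %6.4f auc_orig
display "AUC, dichotomized x = " %6.4f auc_cut
display "difference          = " %6.4f auc_loss
```

For a formal comparison of the two areas on the same sample, roccomp y x x_cut would be one option to look at.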

For references, these are simply the mathematics of the ROC curve. For the cut point algorithm used in -cutpt-, those references are given at the bottom of -help cutpt-. I have only used the ideas of how to draw a ROC curve given a single classification variable and outcome variable, and then the area under that curve. A good intro to the ROC curve is given in the Hanley and McNeil article below. A detailed treatment of classification and prediction in medical contexts is given in Margaret Pepe's book "The Statistical Evaluation of Medical Tests for Classification and Prediction", published by Oxford University Press. A discussion of the consequences (information loss) of dichotomizing a variable is found in Fedorov et al.

Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143(1), 29–36. https://doi.org/10.1148/radiology.143.1.7063747

Fedorov, V., Mannino, F., & Zhang, R. (2009). Consequences of dichotomization. Pharmaceutical Statistics, 8(1), 50–61. https://doi.org/10.1002/pst.331

Code to produce the graphic:

Code:

clear *
cls
set seed 17
set obs 200
gen x = runiformint(1,5)
gen y = rbinomial(1, invlogit(-1.2 + 0.6*x))
* fre and cutpt are user-written commands (e.g. ssc install fre, ssc install cutpt)
fre y

// Start here
* original variable X used to make ROC curve
roctab y x, detail
local orig_auc = strofreal(`r(area)'*100, "%5.1f") + "%"
mat defin Orig = r(detail)[., 1..3]
svmat double Orig
rename (Orig1-Orig3) (orig_cut orig_se orig_sp)
gen double orig_1msp = 100 - orig_sp

* find cut point of X and then make ROC curve
cutpt y x
gen byte x_cut = x >= 4
roctab y x_cut, detail
local new_auc = strofreal(`r(area)'*100, "%5.1f") + "%"
mat defin New = r(detail)[., 1..3]
svmat double New
rename (New1-New3) (new_cut new_se new_sp)
gen double new_1msp = 100 - new_sp

* overlay both ROC curves and the chance line
twoway sc orig_se orig_1msp, c(l) mcol(blue) msize(small) lcol(blue) || ///
       sc new_se new_1msp, c(l) mcol(red) msize(small) lcol(red) || ///
       function y = x , range(0 100) lpatt(dash) lcol(black) ///
       , xti("1 - Specificity") yti("Sensitivity") ///
       legend(label(1 "Original ROC, AUC=`orig_auc'") ///
              label(2 "Dichotomous ROC, AUC=`new_auc'") ///
              size(small) rows(1) order(1 2) )

Attached Files
Comment
Piotr Lewczuk

Join Date: Apr 2016

Posts: 59
#11

24 Sep 2021, 00:11

Thanks, Leonardo.

Okay, I guess it can be called a trapezoid if one includes the area under this line.

That was the point where I did not get what you meant. If your "triangle" is the red trapezoid, the rest is of course obvious (which is NOT to say that it is an uninteresting post).

What precludes this interpretation of the mysterious "AUC at the cutpoint" is that in the majority of cases where I have used . cutpt (five years or so) I have obtained excellent agreement with the "Original ROC", as you call it (obtained with the . roctab command). According to your line of argumentation this should not be the case, for the simple reason that the area under a trapezoid is very different from the area under the "original ROC". All in all this does not answer my question; nevertheless I thank you for the interesting discussion.
Comment
