I need help with multiclass in ROC

minh dust

Join Date: Nov 2021

Posts: 5
#1

03 Nov 2021, 10:30

Hi everyone,
I have a problem calculating multiclass ROC statistics. I have two variables: "gpb", the pathology result (the gold standard), and "iota", the result of the IOTA ADNEX model. Both are coded 0=benign, 1=borderline, 2=primary invasive stage I, 3=primary invasive stage II–V and 4=metastatic cancer.
I need to calculate the AUC for the basic discrimination between benign and malignant tumours using the total risk of malignancy (i.e., the sum of the estimated risks of the four malignant subtypes: borderline, primary invasive stage I, primary invasive stage II–V and metastatic cancer).
As you can see in the example table below, I don't know how to calculate the pairwise AUCs in Stata.

Last edited by minh dust; 03 Nov 2021, 10:33.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

03 Nov 2021, 10:59

I believe this code will do it:

Code:

label define classification 0 "benign" ///
                            1 "borderline" ///
                            2 "stage I" ///
                            3 "stage II-V" ///
                            4 "metastatic"

frame create table_of_rocs str48 contrast float auc

forvalues c1 = 0/3 {
    forvalues c2 = `=`c1'+1'/4 {
        gen gpb_pos = (gpb == `c2') if inlist(gpb, `c1', `c2')
        gen iota_pos = (iota == `c2') if inlist(iota, `c1', `c2')
        roctab gpb_pos iota_pos
        frame post table_of_rocs (`"`:label classification `c1'' vs `:label classification `c2''"') (r(area))
        drop gpb_pos iota_pos
    }
}

frame table_of_rocs {
    list, noobs clean
}

Note: As no example data was supplied, this code is untested. It may contain typos or other errors.

In the future, when asking for help with code, show example data. And when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
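
As an illustration of that advice, a minimal untested sketch, assuming the two variables are named gpb and iota as in #1 and that 20 observations are enough to show the problem:

```stata
* Install -dataex- only if it is not already available (older Stata versions)
capture which dataex
if _rc ssc install dataex

* Produce a copy-and-paste example of the first 20 observations
dataex gpb iota in 1/20
```

The output is pasted into the post between code delimiters, so that others can recreate the data by running it.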

Added: All of that said, it isn't quite clear to me why all of these fine-grained partial comparisons are useful. I would be more inclined to use a measure of ordinal concordance that reflects the agreement of iota and gpb across the full range of diagnoses, such as Somers' D, or its close relative Harrell's C (which can be understood as similar to an ROC area). If you are interested in pursuing that, get -somersd.ado-, by Roger Newson, from SSC.
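
A minimal sketch of that alternative, untested on these data. It assumes the variable names gpb and iota from #1 and relies on -somersd- treating the first variable as the predictor X, so the statistic reported is Somers' D of iota with respect to gpb:

```stata
* One-time install of Roger Newson's package from SSC
ssc install somersd

* Somers' D of iota with respect to gpb, with its confidence interval
somersd gpb iota

* The same comparison on the Harrell's C scale, c = (D + 1)/2
somersd gpb iota, transf(c)
```

With a binary first variable, the Harrell's C version corresponds to an ROC area, which is why it is a natural single-number summary here.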

Last edited by Clyde Schechter; 03 Nov 2021, 11:06.
minh dust

Join Date: Nov 2021

Posts: 5
#3

08 Nov 2021, 05:29

Thank you for replying to me.
I followed your code, but it does not seem to work. I also need the 95% CI for each AUC. Here is my data; I would really appreciate it if you could test it.
Thank you so much!
Attached Files

data.xlsx (9.0 KB, 1 view)
minh dust

Join Date: Nov 2021

Posts: 5
#4

08 Nov 2021, 06:24

I am using Stata 15, and it does not recognize the frame command. I can't find it in the help either.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#5

08 Nov 2021, 09:46

I don't see anything in your original post that suggests you wanted the confidence intervals as well. Had you said that, I would have given you code that does that, but like most people, my mind-reading skills are mediocre.

Frames were introduced to Stata in version 16. The current Stata version is 17. The Forum FAQ states clearly that if you are not using the current version of Stata you should say what version you are using in your post. That way, people would respond to you with code that is compatible with the version you are actually using. So, by not following the guidance of the FAQ you have wasted both your time and mine. Before posting again, please be sure to read the Forum FAQ in its entirety and follow the guidance there in future posts--they are not rules for the sake of rules, they are guidelines to enhance the usefulness of the Forum to everybody.

Another thing the Forum FAQ makes clear is that attachments are discouraged in general, attachments of spreadsheets are particularly deprecated, and that the appropriate way to show example data is with the -dataex- command. Even if you didn't read the FAQ, did you read what I wrote about using -dataex- in #2? If so, why did you ignore it? I am among the many here who will not risk downloading attachments from strangers. Moreover, spreadsheets do not contain the metadata often needed to write Stata code correctly. If you have not yet imported this data to Stata, it is premature to be asking for help with any code at all (other than help with code to import the data.) If you have, there is no reason not to use -dataex- to helpfully provide test data.

So this code, adapted to version 15, is untested:

Code:

label define classification 0 "benign" ///
                            1 "borderline" ///
                            2 "stage I" ///
                            3 "stage II-V" ///
                            4 "metastatic"

capture postutil clear
tempfile table_of_rocs
postfile handle str48 contrast float(auc lb ub) using `table_of_rocs'

forvalues c1 = 0/3 {
    forvalues c2 = `=`c1'+1'/4 {
        gen gpb_pos = (gpb == `c2') if inlist(gpb, `c1', `c2')
        gen iota_pos = (iota == `c2') if inlist(iota, `c1', `c2')
        roctab gpb_pos iota_pos
        post handle (`"`:label classification `c1'' vs `:label classification `c2''"') (r(area)) ///
            (r(lb)) (r(ub))
        drop gpb_pos iota_pos
    }
}
postclose handle

use `table_of_rocs', clear
list, noobs clean

Last edited by Clyde Schechter; 08 Nov 2021, 09:48.
minh dust

Join Date: Nov 2021

Posts: 5
#6

08 Nov 2021, 20:40

Sorry, sir, for my mistake. Here is my dataex output. In my data, neither variable takes the value "4=metastatic cancer".
. dataex gpb iota

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(gpb iota)
0 1
0 0
1 1
3 1
0 0
0 0
0 0
0 0
0 0
0 0
2 1
1 2
0 0
3 3
2 3
0 1
0 0
0 3
0 0
0 0
0 0
0 0
3 3
0 3
0 0
0 0
0 0
0 0
0 0
0 1
2 0
1 1
2 3
2 3
0 0
0 0
1 1
2 1
0 0
0 0
2 0
0 0
0 1
0 0
0 3
3 3
3 3
1 3
0 0
0 1
0 0
0 1
0 0
1 1
0 2
0 0
0 0
2 1
3 3
3 3
3 3
3 3
3 3
3 3
2 0
0 1
1 0
0 3
0 0
0 0
0 0
0 0
0 0
0 0
0 0
3 0
2 1
3 1
3 3
end

I updated my Stata to version 16, and the code seems to produce an error:

. label define classification 0 "benign" ///
> 1 "borderline" ///
> 2 "stage I" ///
> 3 "stage II-V" ///
> 4 "metastatic"

.
. frame create table_of_rocs str48 contrast float auc

.
. forvalues c1 = 0/3 {
2. forvalues c2 = `=`c1'+1'/4 {
3. gen gpb_pos = (gpb == `c2') if inlist(gpb, `c1', `c2')
4. gen iota_pos = (iota == `c2') if inlist(iota, `c1', `c2')
5. roctab gpb_pos iota_pos
6. frame post table_of_rocs (`"`:label classification `c1'' vs `:label classif
> ication `c2''"') (r(area))
7. drop gpb_pos iota_pos
8. }
9. }
(24 missing values generated)
(21 missing values generated)

                  ROC                 -Asymptotic Normal--
       Obs       Area    Std. Err.    [95% Conf. Interval]
------------------------------------------------------------
        48     0.8186      0.1040     0.61481      1.00000
(21 missing values generated)
(36 missing values generated)

                  ROC                 -Asymptotic Normal--
       Obs       Area    Std. Err.    [95% Conf. Interval]
------------------------------------------------------------
        40     0.4865      0.0135     0.46000      0.51297
(17 missing values generated)
(19 missing values generated)

                  ROC                 -Asymptotic Normal--
       Obs       Area    Std. Err.    [95% Conf. Interval]
------------------------------------------------------------
        52     0.9083      0.0481     0.81407      1.00000
(31 missing values generated)
(38 missing values generated)
outcome does not vary
r(2000);

end of do-file

r(2000);

Clyde Schechter

Join Date: Apr 2014
Posts: 30117

08 Nov 2021, 21:56

OK. So, removing the uninstantiated metastatic classification, and cleaning up some typos, and providing a nicer output layout:

Code:

label define classification 0   "benign"    ///
                            1   "borderline"    ///
                            2   "stage I"   ///
                            3   "stage II-V"    

capture postutil clear
tempfile table_of_rocs
postfile handle str48 contrast float(auc lb ub) using `table_of_rocs'
                            
forvalues c1 = 0/2 {
    forvalues c2 = `=`c1'+1'/3 {
        gen gpb_pos = (gpb == `c2') if inlist(gpb, `c1', `c2')
        gen iota_pos = (iota == `c2') if inlist(iota, `c1', `c2')
        roctab gpb_pos iota_pos
        post handle (`"`:label classification `c1'' vs `:label classification `c2''"') (r(area)) ///
            (r(lb)) (r(ub))
        drop gpb_pos iota_pos
    }
}
postclose handle

use `table_of_rocs', clear
format auc lb ub %03.2f
list, noobs clean
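
The benign-versus-malignant AUC requested in #1 can be sketched along the same lines. This is untested and rests on assumptions that do not come from the thread: the variable name malignant is made up for illustration, and, because the posted data contain only the predicted class and not the ADNEX risk estimates, the ordered iota category stands in for the total risk of malignancy. The name total_risk in the last comment is likewise hypothetical.

```stata
* Run on the original data, i.e. before -use `table_of_rocs', clear- replaces it
* Reference standard: 0 = benign, 1 = any malignant subtype (gpb of 1 or more)
gen byte malignant = (gpb >= 1) if !missing(gpb)

* ROC analysis using the ordered iota category as the rating;
* -roctab- reports the area with its standard error and 95% CI
roctab malignant iota

* If the summed predicted risk is available as a variable, use it instead:
* roctab malignant total_risk
```

With only a handful of distinct rating values the ROC curve has few points, so the area from the summed risk would generally be the more informative figure.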

Last edited by Clyde Schechter; 08 Nov 2021, 22:00.
