  • I need help with multiclass ROC analysis

    Hi everyone,
    I have a problem calculating a multiclass ROC analysis. I have two variables: "gpb", the pathology result (the gold standard), and "iota", the result of the IOTA ADNEX model. Both are coded as 0 = benign, 1 = borderline, 2 = primary invasive stage I, 3 = primary invasive stage II–V and 4 = metastatic cancer.
    I need to calculate the AUC for the basic discrimination between benign and malignant tumours using the total risk of malignancy (i.e., the sum of the estimated risks of the four malignant subtypes: borderline, primary invasive stage I, primary invasive stage II–V and metastatic cancer).
    I don't know how to calculate the pairwise AUCs shown in the example table below in Stata.
    [Attached image: example table (examble.jpg)]
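A minimal sketch of the basic benign-versus-malignant AUC, assuming gpb and iota are in memory and coded 0-4 as described above: collapse the four malignant subtypes of gpb into one binary reference variable (the name malignant is hypothetical) and give -roctab- the ordinal iota category as the rating. The total risk of malignancy itself is not among the variables described here; if it were stored as a continuous variable, it could take iota's place as the rating in the same command.

```stata
* Reference standard: 1 = any malignant subtype (gpb 1-4), 0 = benign.
* The -if- condition keeps missing gpb missing; without it,
* missing >= 1 would evaluate to true in Stata.
generate byte malignant = (gpb >= 1) if !missing(gpb)

* Use the ordinal ADNEX category (0-4) as the rating:
* higher categories are treated as more suggestive of malignancy
roctab malignant iota

* roctab leaves the area and its asymptotic normal 95% CI in r()
display "AUC = " %5.3f r(area) ", 95% CI " %5.3f r(lb) " to " %5.3f r(ub)
```

The r(lb) and r(ub) results are the same ones the later -postfile- code in this thread stores next to r(area).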

    Last edited by minh dust; 03 Nov 2021, 10:33.

  • #2
    I believe this code will do it:
    Code:
    * Value labels used to name each pairwise contrast
    label define classification 0   "benign"    ///
                                1   "borderline"    ///
                                2   "stage I"   ///
                                3   "stage II-V"    ///
                                4   "metastatic"
    
    * Results frame: one row per pairwise contrast
    frame create table_of_rocs str48 contrast float auc
    
    forvalues c1 = 0/3 {
        forvalues c2 = `=`c1'+1'/4 {
            * 1 = category c2, 0 = category c1, missing = any other category
            gen gpb_pos = (gpb == `c2') if inlist(gpb, `c1', `c2')
            gen iota_pos = (iota == `c2') if inlist(iota, `c1', `c2')
            roctab gpb_pos iota_pos
            frame post table_of_rocs (`"`:label classification `c1'' vs `:label classification `c2''"') (r(area))
            drop gpb_pos iota_pos
        }
    }
    
    * Display the collected AUCs
    frame table_of_rocs {
        list, noobs clean
    }
    Note: As no example data was supplied, this code is untested. It may contain typos or other errors.

    In the future, when asking for help with code, show example data. And when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
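The -dataex- steps described above, as a minimal sketch using the two variables from this thread (gpb and iota are assumed to be in memory):

```stata
* dataex is part of Stata 16/17 and fully updated 14.2/15.1;
* install it from SSC only if it is not found
capture which dataex
if _rc {
    ssc install dataex
}

* Print a copy-and-paste-ready listing of the example data
dataex gpb iota
```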

    Added: All of that said, it isn't quite clear to me why all of these fine-grained partial comparisons are useful. I would be more inclined to use a measure of ordinal concordance that reflects the agreement of iota and gpb across the full range of diagnoses, such as Somers' D, or its close relative Harrell's C (which can be understood as similar to an ROC area). If you are interested in pursuing that, get -somersd.ado-, by Roger Newson, from SSC.
    Last edited by Clyde Schechter; 03 Nov 2021, 11:06.
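For the ordinal-concordance alternative suggested in #2, a minimal sketch with Roger Newson's -somersd- from SSC, assuming gpb and iota are in memory. -somersd- treats the first variable in its list as the one the others are compared against; -help somersd- gives the exact definitions. The transf(c) option reports the result on the Harrell's c scale, (D + 1)/2, which is comparable to an ROC area.

```stata
* Install Roger Newson's package once, if it is not already present
capture which somersd
if _rc {
    ssc install somersd
}

* Somers' D for iota with respect to gpb, using all ordered categories at once
somersd gpb iota

* The same concordance on the Harrell's c scale, (D + 1)/2
somersd gpb iota, transf(c)
```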



    • #3
      Thank you for replying to me.
      I followed your code, but it does not seem to work. I also need the 95% CI of the AUC. Here is my data; I would really appreciate it if you could test it.
      Thank you so much!
      Attached Files



      • #4
        I am using Stata 15, but it cannot find the frame command, and I can't find it in -help- either.



        • #5
          I don't see anything in your original post that suggests you wanted the confidence intervals as well. Had you said that, I would have given you code that does that, but like most people, my mind-reading skills are mediocre.

          Frames were introduced to Stata in version 16. The current Stata version is 17. The Forum FAQ states clearly that if you are not using the current version of Stata you should say what version you are using in your post. That way, people would respond to you with code that is compatible with the version you are actually using. So, by not following the guidance of the FAQ you have wasted both your time and mine. Before posting again, please be sure to read the forum FAQ in their entirety and follow the guidance there in future posts--they are not rules for the sake of rules, they are guidelines to enhance the usefulness of the Forum to everybody.

          Another thing the Forum FAQ makes clear is that attachments are discouraged in general, attachments of spreadsheets are particularly deprecated, and that the appropriate way to show example data is with the -dataex- command. Even if you didn't read the FAQ, did you read what I wrote about using -dataex- in #2? If so, why did you ignore it? I am among the many here who will not risk downloading attachments from strangers. Moreover, spreadsheets do not contain the metadata often needed to write Stata code correctly. If you have not yet imported this data to Stata, it is premature to be asking for help with any code at all (other than help with code to import the data.) If you have, there is no reason not to use -dataex- to helpfully provide test data.

          So this code, adapted to version 15, is untested:

          Code:
          label define classification 0   "benign"    ///
                                      1   "borderline"    ///
                                      2   "stage I"   ///
                                      3   "stage II-V"    ///
                                      4   "metastatic"
          
          capture postutil clear
          tempfile table_of_rocs
          postfile handle str48 contrast float(auc lb ub) using `table_of_rocs'
                                      
          forvalues c1 = 0/3 {
              forvalues c2 = `=`c1'+1'/4 {
                  gen gpb_pos = (gpb == `c2') if inlist(gpb, `c1', `c2')
                  gen iota_pos = (iota == `c2') if inlist(iota, `c1', `c2')
                  roctab gpb_pos iota_pos
                  post handle (`"`:label classification `c1'' vs `:label classification `c2''"') (r(area)) ///
                      (r(lb)) (r(ub))
                  drop gpb_pos iota_pos
              }
          }
          postclose handle
          
          use `table_of_rocs', clear
          list, noobs clean
          Last edited by Clyde Schechter; 08 Nov 2021, 09:48.



          • #6
            Sorry for my mistake. Here is my -dataex- output. In my data, the value 4 (metastatic cancer) does not occur in either variable.
            . dataex gpb iota

            ----------------------- copy starting from the next line -----------------------
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input byte(gpb iota)
            0 1
            0 0
            1 1
            3 1
            0 0
            0 0
            0 0
            0 0
            0 0
            0 0
            2 1
            1 2
            0 0
            3 3
            2 3
            0 1
            0 0
            0 3
            0 0
            0 0
            0 0
            0 0
            3 3
            0 3
            0 0
            0 0
            0 0
            0 0
            0 0
            0 1
            2 0
            1 1
            2 3
            2 3
            0 0
            0 0
            1 1
            2 1
            0 0
            0 0
            2 0
            0 0
            0 1
            0 0
            0 3
            3 3
            3 3
            1 3
            0 0
            0 1
            0 0
            0 1
            0 0
            1 1
            0 2
            0 0
            0 0
            2 1
            3 3
            3 3
            3 3
            3 3
            3 3
            3 3
            2 0
            0 1
            1 0
            0 3
            0 0
            0 0
            0 0
            0 0
            0 0
            0 0
            0 0
            3 0
            2 1
            3 1
            3 3
            end
            I updated my Stata to version 16, but the code gives an error:

            . . label define classification 0 "benign" ///
            > 1 "borderline" ///
            > 2 "stage I" ///
            > 3 "stage II-V" ///
            > 4 "metastatic"

            .
            . frame create table_of_rocs str48 contrast float auc

            .
            . forvalues c1 = 0/3 {
            2. forvalues c2 = `=`c1'+1'/4 {
            3. gen gpb_pos = (gpb == `c2') if inlist(gpb, `c1', `c2')
            4. gen iota_pos = (iota == `c2') if inlist(iota, `c1', `c2')
            5. roctab gpb_pos iota_pos
            6. frame post table_of_rocs (`"`:label classification `c1'' vs `:label classif
            > ication `c2''"') (r(area))
            7. drop gpb_pos iota_pos
            8. }
            9. }
            (24 missing values generated)
            (21 missing values generated)

                                  ROC                    -Asymptotic Normal--
                       Obs       Area     Std. Err.      [95% Conf. Interval]
                 ------------------------------------------------------------
                        48     0.8186        0.1040       0.61481     1.00000
            (21 missing values generated)
            (36 missing values generated)

                                  ROC                    -Asymptotic Normal--
                       Obs       Area     Std. Err.      [95% Conf. Interval]
                 ------------------------------------------------------------
                        40     0.4865        0.0135       0.46000     0.51297
            (17 missing values generated)
            (19 missing values generated)

                                  ROC                    -Asymptotic Normal--
                       Obs       Area     Std. Err.      [95% Conf. Interval]
                 ------------------------------------------------------------
                        52     0.9083        0.0481       0.81407     1.00000
            (31 missing values generated)
            (38 missing values generated)
            outcome does not vary
            r(2000);

            end of do-file

            r(2000);



            • #7
              OK. So, dropping the metastatic classification (which does not occur in your data), cleaning up some typos, and providing a nicer output layout:

              Code:
              label define classification 0   "benign"    ///
                                          1   "borderline"    ///
                                          2   "stage I"   ///
                                          3   "stage II-V"    
              
              * Collect results with -postfile- (works in Stata 15; no frames needed)
              capture postutil clear
              tempfile table_of_rocs
              postfile handle str48 contrast float(auc lb ub) using `table_of_rocs'
              
              forvalues c1 = 0/2 {
                  forvalues c2 = `=`c1'+1'/3 {
                      * 1 = category c2, 0 = category c1, missing = any other category
                      gen gpb_pos = (gpb == `c2') if inlist(gpb, `c1', `c2')
                      gen iota_pos = (iota == `c2') if inlist(iota, `c1', `c2')
                      roctab gpb_pos iota_pos
                      * Store the contrast label, the AUC and its 95% CI bounds
                      post handle (`"`:label classification `c1'' vs `:label classification `c2''"') (r(area)) ///
                          (r(lb)) (r(ub))
                      drop gpb_pos iota_pos
                  }
              }
              postclose handle
              
              * Note: this replaces the gpb/iota data in memory
              use `table_of_rocs', clear
              format auc lb ub %03.2f
              list, noobs clean
              Last edited by Clyde Schechter; 08 Nov 2021, 22:00.
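The r(2000) "outcome does not vary" error in #6 comes from the benign vs metastatic pair: no observation has gpb == 4, so gpb_pos is 0 for every observation kept. A variant of the code in #7 that loops only over the gpb categories actually present, and skips (rather than stops on) any pair -roctab- cannot handle, is sketched here. It assumes gpb and iota are in memory and that the value label classification from #7 has been defined; the loop indices i and j are illustrative names. Like the original, it ends with -use ..., clear-, which replaces the data in memory.

```stata
* Categories of gpb that actually occur in the data (e.g. "0 1 2 3")
levelsof gpb, local(levels)
local k : word count `levels'

capture postutil clear
tempfile table_of_rocs
postfile handle str48 contrast float(auc lb ub) using `table_of_rocs'

forvalues i = 1/`=`k'-1' {
    forvalues j = `=`i'+1'/`k' {
        local c1 : word `i' of `levels'
        local c2 : word `j' of `levels'
        * 1 = category c2, 0 = category c1, missing = any other category
        quietly generate gpb_pos = (gpb == `c2') if inlist(gpb, `c1', `c2')
        quietly generate iota_pos = (iota == `c2') if inlist(iota, `c1', `c2')
        * Show roctab output, but keep going if this pair fails
        capture noisily roctab gpb_pos iota_pos
        if _rc == 0 {
            post handle (`"`:label classification `c1'' vs `:label classification `c2''"') ///
                (r(area)) (r(lb)) (r(ub))
        }
        drop gpb_pos iota_pos
    }
}
postclose handle

use `table_of_rocs', clear
format auc lb ub %4.2f
list, noobs clean
```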
