  • n by k contingency table for two categorical variables (sensitivity, specificity, PPV, NPV, accuracy) calculations

    Hi,

    How can I calculate measures of validity for a new "diagnostic test" compared with the "true/gold standard test" in Stata when the contingency table is n by k?

    I would like to calculate the following: sensitivity, specificity, PPV, NPV, and accuracy.
    In addition, can I compute the total numbers of true positives, true negatives, false positives, and false negatives using Stata code?

    I know that for a 2 by 2 table we can use the "diagtest" command, but it does not work for larger tables. I would be grateful if someone could guide me through this.


    My contingency tables are around 4 by 4 in size.

    thank you

  • #2
    Could you elaborate on what your data look like, perhaps with an example (using dataex from SSC)? At first glance it seems you want to analyze a confusion matrix. If you do not need standard errors, the calculations seem rather simple.

    Best
    Daniel
    Last edited by daniel klein; 22 Jan 2017, 06:39.


    • #3

      Hi Daniel Klein,
      Thank you for your time and response. I attached one of my 4 by 3 tables. I am interested in evaluating:
      [sensitivity, specificity, PPV, NPV, accuracy] and, if possible, the [total true positives, true negatives, false positives, and false negatives] when the participants' results are weak (coded 2) and when they are paralysis (coded 3).
      I am also not sure how to handle the participants whose lesions the "new diagnostic test" could not see and therefore could not evaluate or diagnose (unseen results occurred only with the "new test" and are coded 4).

      The study can be considered "cross-sectional": each participant was evaluated by the "new test" first and then by the "gold standard test".

      Learning Stata code or a time-saving method would be of great benefit, as I have many other tests to compare with the "gold standard" that yield similar contingency tables.

      Thank you; your guidance and support are highly appreciated.

      Best Regards

      • #4
        Well, the terminology sensitivity, specificity, NPV, PPV is really designed around a dichotomous test and a dichotomous reference criterion ("gold standard.") The analogous conditional probabilities don't really have names.

        But finding those conditional probabilities is really easy: just run -tab _new_test gold_standard, row col-. The row percentages will be the analogs of the predictive values, and the column percentages will be the analogs of sensitivity.


        • #5
          As Clyde points out, there is a conceptual question here about how these measures are defined in your case; I cannot tell for sure. The more technical part of the question has been answered by Clyde. Here is a worked example for the data you give, omitting the missing ratings:

          Code:
          tabi 225 10 2 \ 0 2 1 \ 2 1 4 , row column
          The row percentages on the main diagonal can be interpreted as positive predictive values (PPV) for each category. The column percentages on the main diagonal are the respective sensitivity values. I do not think there is a direct way of getting the specificity and negative predictive values, but this should be easy to program.
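
          For a single category the missing pieces can be worked out by hand. The lines below are only a sketch for category 2, assuming the rows of the table are the new test and the columns the gold standard, and lumping categories 1 and 3 together as "not 2". The scalar names are made up for the illustration.

          Code:
          * one-vs-rest counts for category 2, read off the 3 by 3 table above
          scalar tp = 2                  // new test 2, gold standard 2
          scalar fn = 10 + 1             // gold standard 2, new test 1 or 3
          scalar fp = 0 + 1              // new test 2, gold standard 1 or 3
          scalar tn = 225 + 2 + 2 + 4    // neither new test nor gold standard is 2
          display "specificity = " %6.4f tn/(tn + fp)
          display "NPV         = " %6.4f tn/(tn + fn)
          This displays 0.9957 for the specificity and 0.9549 for the negative predictive value of category 2. The same steps would have to be repeated for each category.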

          The accuracy is basically the percent agreement that you can get from the kap command, or from kappaetc (SSC). Here is how

          Code:
          tabi 225 10 2 \ 0 2 1 \ 2 1 4 \ 473 28 7 , replace
          kap row col [fweight=pop]
          I specified the replace option to replace the data in memory with the data from your table. Alternatively you could type

          Code:
          kappaetci 225 10 2 \ 0 2 1 \ 2 1 4 \ 473 28 7 , categories(1 2 3 .) tab
          Note that the missing values are coded as such here. This does not affect percent agreement, but it does affect some of the other agreement statistics.

          Best
          Daniel


          • #6
            Please ignore the part about accuracy in my previous post. It applies only to the two-by-two table case.
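
            To see why, here is a small sketch with the counts from the 3 by 3 table, assuming that accuracy for a single category is taken as (TP + TN)/N with the other categories lumped together. That definition is an assumption on my part, not something the kap command reports.

            Code:
            * overall percent agreement: main diagonal over the total
            display %6.4f (225 + 2 + 4)/247
            * one-vs-rest accuracy for category 1: (TP + TN)/N
            display %6.4f (225 + (2 + 1 + 1 + 4))/247
            The first line shows 0.9352 and the second 0.9433, so the two quantities coincide only when there are just two categories.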

            Best
            Daniel


            • #7
              I have put something together. Here is the program

              Code:
              *! version 1.0.0 23jan2017 daniel klein
              program conftab
                  version 12.1
                  
                  syntax varlist(numeric min = 2 max = 2) [ if ] [ in ] [ fweight/ ]
                  
                  marksample touse
                  quietly count if (`touse')
                  if (!r(N)) {
                      error 2000
                  }
                  
                  foreach var of local varlist {
                      capture assert `var' == int(`var') , fast
                      if (_rc) {
                          display as err "`var' contains non-integer values"
                          exit 459
                      }
                  }
                  
                  if ("`weight'" != "") {
                      capture assert `exp' == int(`exp')
                      if (_rc) {
                          error 401
                      }
                      local wgtexp [`weight' = `exp']
                  }
                  
                  tempname X
                  tabulate `varlist' if `touse' `wgtexp' , row column nokey matcell(`X')
                  
                  if (rowsof(`X') !=  colsof(`X')) {
                      display as err "matrix not square"
                      exit 503
                  }
                  
                  mata : conftab_ado("`X'")
                  
                  local nfmt : display _dup(`= colsof(r(P))') "%8.4f & "
                  matlist r(P) , cspec(& %24s | `nfmt') rspec(--&&&-&--)
              end
              
              version 12.1
              
              mata :
              
              void conftab_ado(string scalar matname)
              {
                  real matrix X, P
                  real colvector p1, p0, n1, n0 
                  
                  X = st_matrix(matname)
                  
                  p1 = diagonal(X)
                  p0 = rowsum(X)-diagonal(X)
                  n1 = colsum(X)'-diagonal(X)
                  n0 = J(rows(X), 1, .)
                  for (i = 1; i <= rows(X); ++i) {
                      n0[i] = sum(!e(i, rows(X)):*X:*!e(i, rows(X))')
                  }
                  
                  P = ( ///
                      p1:/(p1+p0),         /// sensitivity
                      n0:/(n1+n0),         /// specificity
                      p1:/(p1+n1),         /// positive predictive value
                      n0:/(n0+p0),         /// negative predictive value
                      n1:/(n1+n0),         /// false positive rate
                      p0:/(p0+p1),         /// false negative rate
                      (p1+n0):/(sum(X))     /// accuracy
                      )
                      
                  // two-by-two table: keep the results for the first category only
                  if (rows(X) == 2) {
                      p1     = p1[1, 1]
                      p0     = p0[1, 1]
                      n1     = n1[1, 1]
                      n0     = n0[1, 1] 
                      P     = P[1, .]
                  }
                  
                  st_rclear()
                  
                  st_matrix("r(p1)", p1)
                  st_matrix("r(n0)", n0)
                  st_matrix("r(n1)", n1)
                  st_matrix("r(p0)", p0)
                  
                  
                  st_matrix("r(P)", P')
                  st_matrixrowstripe("r(P)",         ///
                  (J(cols(P), 1, ""),             ///
                  ("Sensitivity"                 \     ///
                  "Specificity"                 \     ///
                  "Postive predictive rate"     \     ///
                  "Negative predictive rate"     \     ///
                  "False postive rate"             \     ///
                  "False negative rate"           \     ///
                  "Accuracy")))
              }
              
              end
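              The one line that may look cryptic is the computation of the true negatives, n0. Here is a minimal sketch of what it does for category 2, with the 3 by 3 counts typed in by hand (rows are the reference categories). The name keep is only for illustration and does not appear in the program.

              Code:
              mata:
              X = (225, 0, 2 \ 10, 2, 1 \ 2, 1, 4)
              i = 2
              // row vector of ones with a zero in position i
              keep = !e(i, rows(X))
              // zero out column i and row i, then add up what is left
              sum(keep :* X :* keep')
              end
              The result, 233, is the number of ratings that fall in neither row 2 nor column 2, that is, the true negatives for category 2.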
              The syntax is

              Code:
              conftab refvar classvar [ if ] [ in ] [ fweight ]
              Applied to the data given above (and ignoring missing ratings)

              Code:
              . tabi 225 10 2 \ 0 2 1 \ 2 1 4 , replace
              
                         |               col
                     row |         1          2          3 |     Total
              -----------+---------------------------------+----------
                       1 |       225         10          2 |       237 
                       2 |         0          2          1 |         3 
                       3 |         2          1          4 |         7 
              -----------+---------------------------------+----------
                   Total |       227         13          7 |       247 
              
                        Pearson chi2(4) = 115.1244   Pr = 0.000
              
              . conftab col row [fweight=pop]
              
                         |               row
                     col |         1          2          3 |     Total
              -----------+---------------------------------+----------
                       1 |       225          0          2 |       227 
                         |     99.12       0.00       0.88 |    100.00 
                         |     94.94       0.00      28.57 |     91.90 
              -----------+---------------------------------+----------
                       2 |        10          2          1 |        13 
                         |     76.92      15.38       7.69 |    100.00 
                         |      4.22      66.67      14.29 |      5.26 
              -----------+---------------------------------+----------
                       3 |         2          1          4 |         7 
                         |     28.57      14.29      57.14 |    100.00 
                         |      0.84      33.33      57.14 |      2.83 
              -----------+---------------------------------+----------
                   Total |       237          3          7 |       247 
                         |     95.95       1.21       2.83 |    100.00 
                         |    100.00     100.00     100.00 |    100.00 
              
              
              -------------------------------------------------------
                                       |       c1        c2        c3
              -------------------------+-----------------------------
                           Sensitivity |   0.9912    0.1538    0.5714
                           Specificity |   0.4000    0.9957    0.9875
               Postive predictive rate |   0.9494    0.6667    0.5714
              Negative predictive rate |   0.8000    0.9549    0.9875
              -------------------------+-----------------------------
                    False postive rate |   0.6000    0.0043    0.0125
                   False negative rate |   0.0088    0.8462    0.4286
              -------------------------+-----------------------------
                              Accuracy |   0.9433    0.9514    0.9757
              -------------------------------------------------------
              The required frequencies to calculate the conditional probabilities are returned in r(). r(p1) for example holds the true positive ratings for each of the three categories (or classes).

              Code:
              . return list
              
              matrices:
                                r(P) :  7 x 3
                               r(p0) :  3 x 1
                               r(n1) :  3 x 1
                               r(n0) :  3 x 1
                               r(p1) :  3 x 1
              
              . matrix list r(p1)
              
              r(p1)[3,1]
                   c1
              r1  225
              r2    2
              r3    4
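              To collect all four counts in one table, something along the following lines should work. This is only a sketch: it assumes conftab is installed and that the reference variable is named first, in which case, reading the program, r(p0) holds the false negatives and r(n1) the false positives. The matrix names TP, FN, FP, TN, and counts are my own choice.

              Code:
              conftab col row [fweight=pop]
              * copy the returned matrices before another command overwrites r()
              matrix TP = r(p1)
              matrix FN = r(p0)
              matrix FP = r(n1)
              matrix TN = r(n0)
              * one row per category, one column per count
              matrix counts = TP, FN, FP, TN
              matrix colnames counts = TP FN FP TN
              matrix list counts
              Each row of counts should add up to the total number of ratings, which is a quick check that the four pieces were picked up correctly.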
              Best
              Daniel


              • #8
                Thank you so much Clyde & Daniel for all of your help and guidance.

                Daniel, if I may ask, how should I edit the code you kindly provided in your last post if my definitions are the following?

                - A truly positive case is one where the "gold standard" test result is either weak (2) or paralysis (3).
                - A truly negative case is one where the "gold standard" test result is normal (1) only.

                When I run the "return list" example you kindly provided, r(p1) holds 225, 2, and 4 for the three categories. It counts the 225 normal cases as "true positives", which they are not; they should be "true negatives" [corresponding to cell (d) of a 2 by 2 table].
                So the definitions in my data are:

                - Sensitivity: the ability of the new test to detect the diseased patients (diagnosed as either weak = 2 or paralysis = 3) when they truly have this diagnosis and are diseased according to the gold standard test.

                - Specificity: the ability of the new test to detect the healthy patients (diagnosed as normal = 1) when they are truly healthy and not diseased (normal) according to the gold standard test.


                - PPV and NPV are based on the same positive and negative case definitions.

                Thank you so much for all of your effort, time, and help. Your support is highly appreciated.

                Best Regards,



                • #9
                  As a beginner in Stata, I apologize in advance for my next question: how can I install the "conftab" command in Stata, please?

                  Thank you


                  • #10
                    Let me answer the last question first. You can download the files attached to this post and store them where Stata can find them. Usually this would be the folder c:/ado/plus/c on a Windows machine. But type in Stata

                    Code:
                    adopath
                    to find out where Stata looks for ado-files on your machine. Put both files into the respective "plus" folder. You can then use conftab just as any other command. Type

                    Code:
                    help conftab
                    to view the help file.
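
                    If you are unsure whether the files ended up in the right place, the following two lines may help. This is just a suggestion: the first shows the "plus" folder Stata is using, and the second reports where Stata finds conftab, or an error if it does not find it.

                    Code:
                    display c(sysdir_plus)
                    which conftab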

                    Concerning the question about what is counted as a true positive or negative, I think you should recode your (specific) variables to reflect your ideas rather than change the (general) code. If you do not want to distinguish between codes 2 and 3, then why keep them separate? From what you write, it sounds like you really want only two categories, "normal" or "not normal". Of course, you will then no longer need conftab, as you can apply any tool that is designed to deal with only two outcomes (see the end of the help file for two such commands).
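
                    As a rough sketch of such a recoding, assuming as in the earlier posts that row holds the new test and col the gold standard, and that code 1 means "normal". The variable names test_pos and gold_pos are made up for the illustration.

                    Code:
                    tabi 225 10 2 \ 0 2 1 \ 2 1 4 , replace
                    * 1 = not normal (codes 2 and 3), 0 = normal (code 1)
                    generate byte test_pos = (row > 1)
                    generate byte gold_pos = (col > 1)
                    tabulate test_pos gold_pos [fweight=pop] , row column
                    With these data the four cells come out as 8 (both positive), 2 (test positive, gold standard negative), 12 (test negative, gold standard positive), and 225 (both negative).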

                    Let me add that the same logic applies to the cells of the initial table. If you want the 225 to be in another cell, then enter your data in a different way rather than changing the code for the computations. Try

                    Code:
                    tabi 8 2 \ 12 225 , replace
                    conftab row col [fweight=pop]
                    which will give

                    Code:
                    . tabi 8 2 \ 12 225 , replace
                    
                               |          col
                           row |         1          2 |     Total
                    -----------+----------------------+----------
                             1 |         8          2 |        10 
                             2 |        12        225 |       237 
                    -----------+----------------------+----------
                         Total |        20        227 |       247 
                    
                               Fisher's exact =                 0.000
                       1-sided Fisher's exact =                 0.000
                    
                    . conftab row col [fweight=pop]
                    
                               |          col
                           row |         1          2 |     Total
                    -----------+----------------------+----------
                             1 |         8          2 |        10 
                               |     80.00      20.00 |    100.00 
                               |     40.00       0.88 |      4.05 
                    -----------+----------------------+----------
                             2 |        12        225 |       237 
                               |      5.06      94.94 |    100.00 
                               |     60.00      99.12 |     95.95 
                    -----------+----------------------+----------
                         Total |        20        227 |       247 
                               |      8.10      91.90 |    100.00 
                               |    100.00     100.00 |    100.00 
                    
                    
                    ------------------------------------
                                              |       c1
                    --------------------------+---------
                                  Sensitivity |   0.8000
                                  Specificity |   0.9494
                    Positive predictive value |   0.4000
                    Negative predictive value |   0.9912
                    --------------------------+---------
                          False positive rate |   0.0506
                          False negative rate |   0.2000
                    --------------------------+---------
                                     Accuracy |   0.9433
                    ------------------------------------
                    Best
                    Daniel
                    Last edited by daniel klein; 29 Jan 2017, 23:08.
