
  • "relative improvement over chance" - modified Kappa

    Just wondering if anyone has, or knows of, a program to calculate the "relative improvement over chance" (RIOC) version of Kappa. It is not in official Stata or in -kappaetc-. My immediate situation is an N of about 100, 2 raters, and 2 binary ratings per subject. Here are some citations:

    Loeber, R and Dishion, T (1983), "Early predictors of male delinquency", Psychological Bulletin, 94: 68-99

    Cairney, J and Streiner, DL (2011), "Using relative improvement over chance (RIOC) to examine agreement between tests", Research in Developmental Disabilities, 32: 87-92

    Streiner, DL, Norman, GR, Cairney, J (2015), Health Measurement Scales, fifth edition, Oxford University Press; discussed on pp. 177-8 (with formula)

  • #2
    I am not aware of any program to calculate this.

    Having skimmed Farrington and Loeber (1989), I find that the underlying formulas appear trivial. The authors note and show that RIOC is equivalent to a corrected Phi coefficient. Rich Goldstein implemented the Phi coefficient some time ago (STB-3). Perhaps extending that code is the quickest way to get what you want.
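To see why the formulas are trivial, here is the algebra in the notation of Copas and Loeber (1990), which the program in #10 further down also uses: a and d are the true positives and true negatives, b and c the two off-diagonal cells, e = a + b and f = a + c the row and column totals of the positives, and n the total, with rows and columns arranged so that e >= f. Reading "corrected Phi" as Phi divided by the largest value it can attain given the margins (an interpretation assumed here), the equivalence is a short calculation; that maximum is reached when c = 0, i.e., a = f.

```latex
\mathrm{RIOC} = \frac{na - ef}{f\,(n-e)} = \frac{ad - bc}{f\,(n-e)},
\qquad\text{because}\quad na - ef = (a+b+c+d)\,a - (a+b)(a+c) = ad - bc

\phi = \frac{ad - bc}{\sqrt{e\,(n-e)\,f\,(n-f)}},
\qquad
\phi_{\max} = \frac{f\,(n-e)}{\sqrt{e\,(n-e)\,f\,(n-f)}} = \sqrt{\frac{f\,(n-e)}{e\,(n-f)}} \quad (e \ge f)

\frac{\phi}{\phi_{\max}} = \frac{ad - bc}{f\,(n-e)} = \mathrm{RIOC}
```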

    I am sure that Rich knows exactly what he wants to do. For others, I would like to point out that the motivation that Farrington and Loeber (1989) give for RIOC (over Kappa but also in general) is based on a situation where you know the "true" values/categories of the subjects that are rated. In this situation, the concepts of "false positives" and "false negatives" are meaningful, and it might indeed be justified to take the marginals as fixed (as Cohen's Kappa does), and then calculate the "maximum possible correct" from those fixed marginals. In my view, it is not clear at all whether the RIOC (or Kappa) is justified in a setting where the "true" value is unknown (as is often the case in interrater agreement studies) and, hence, the "maximum possible correct" cannot be known either. More generally (but probably off-topic), these points relate to questions of whether and in which situation "chance" should be defined as "statistical independence" and whether raters (and, therefore, marginals) should be treated as fixed.
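The "chance" and "maximum possible correct" quantities can be written out explicitly in the notation of Copas and Loeber (1990): a and d are the two agreeing cells, e and f the row and column totals of the positives, and n the total. Chance (CC) is the number correct expected under statistical independence given the margins, the maximum (MC) is the largest number correct that the fixed margins permit, and OC is the observed number correct:

```latex
\mathrm{OC} = a + d, \qquad
\mathrm{CC} = \frac{e f + (n-e)(n-f)}{n}, \qquad
\mathrm{MC} = \min(e, f) + \min(n-e,\; n-f)

\mathrm{RIOC} = \frac{\mathrm{OC} - \mathrm{CC}}{\mathrm{MC} - \mathrm{CC}}
```

For e >= f, MC = f + (n - e) and the ratio reduces to (na - ef)/(f(n - e)). Both CC and MC depend only on the margins, which is exactly where the assumption of fixed marginals enters.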


    Farrington, D. P., Loeber, R. 1989. Relative Improvement Over Chance (RIOC) and Phi as Measures of Predictive Efficiency and Strength of Association in 2 x 2 Tables. Journal of Quantitative Criminology, 5(3): pp. 201--213.



    • #3
      daniel klein, thank you very much. I have calculated the values I need by hand, both for the measure itself and for its variance (formula in Cairney & Streiner, cited in #1). I (1) did not realize that RIOC was equivalent to a corrected Phi (thanks also for the citation) and (2) had completely forgotten about that old program!
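A hand calculation of this kind might look like the sketch below. It follows equations (4) and (11) of Copas and Loeber (1990) as they are coded in the program posted in #10, uses the counts from Table 3 of Cairney and Streiner (2011), and the local macro names are arbitrary choices for this sketch. The formulas assume e >= f; if that does not hold, swap b and c first.

```stata
* cell counts for Table 3 of Cairney and Streiner (2011)
* a = true positives, d = true negatives, b and c = off-diagonal cells,
* arranged so that e = a + b is at least f = a + c
local a = 39
local b = 245
local c = 4
local d = 276

local n = `a' + `b' + `c' + `d'
local e = `a' + `b'
local f = `a' + `c'

* point estimate, Copas and Loeber (1990), equation (4)
local rioc = (`n'*`a' - `e'*`f') / (`f'*(`n' - `e'))

* standard error, Copas and Loeber (1990), equation (11)
local se = sqrt(`n'*`c'*(`n'*`f'*(`n' - `e')                ///
           + `c'*(`n'*`e' + `e'*`f' - 2*`n'*`f' - `n'^2)   ///
           + 2*`n'*`c'^2) / ((`n' - `e')^3 * `f'^3))

display "RIOC = " `rioc' "   SE = " `se'
```

The two numbers agree with the RIOC and standard error that the program in #10 reports for Table 3.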



      • #4
        Forgot to include thanks to Mike Lacy, who supplied me with a Stata translation of an old program of his for rioc that was originally written in Pascal.



        • #5
          Would it be feasible to provide the rioc code as a package to the community of Stata users (ssc install)?
          http://publicationslist.org/eric.melse



          • #6
            If you are asking me: I did the calculations "by hand" (i.e., using the -display- command), which is a long way from a program. Further, this issue is rare in my work and I do not intend to write such a program.

            If your request is aimed at Mike Lacy, then I leave it to him to respond.



            • #7
              If I may add my thoughts to Rich Goldstein's comments: if the calculations are so trivial that they can easily be done interactively in a single display command, it is probably not worth the effort to implement the necessary overhead of syntax parsing, error checking, etc. in a program and to write a help file. Of course, if you just enjoy programming and/or want to learn to program in Stata, this might be a nice little project to start with. From what I am aware of, this could fit well as an addition to the diagt package (SJ 4-4, SSC) by Paul Seed.



              • #8
                The code I gave Rich Goldstein is barely one step up from a hand calculator, and I think daniel klein is right about the overhead not being worth it as a standalone item. All that said, my code snippet appears below.
                Code:
                cap prog drop rioc
                prog rioc, rclass
                // Loeber, R and Dishion, T (1983), "Early predictors of male delinquency",
                // Psychological Bulletin, 94: 68-99
                di _newline "Enter 0/1 variables, predicted then actual"
                syntax varlist
                local predicted = word("`varlist'", 1)
                local actual = word("`varlist'", 2)
                tempname f
                tab `predicted' `actual', matcell(`f')
                // mat list `f'
                local truepos =  `f'[2,2]
                local falsepos = `f'[1,2]
                local trueneg = `f'[1,1]
                local falseneg = `f'[2,1]
                local N = `truepos' + `falsepos' + `trueneg' + `falseneg'
                local ObsvCorrect = `truepos' + `trueneg'
                // figure number correct by chance, i.e. expected frequencies of
                //  truepos and trueneg under a model of independence
                local ChanceTruePos = (`truepos' + `falsepos') * (`truepos' + `falseneg')/`N'
                local ChanceTrueNeg = (`trueneg' + `falseneg') * (`falsepos' + `trueneg')/`N'    
                local ChanceCorrect = `ChanceTruePos' + `ChanceTrueNeg'
                // Figure max trueneg and truepos, based on all positives being true,
                //  other cells constrained by marginals
                local MaxTruePos = `truepos' + `falseneg'
                local MaxTrueNeg = `falseneg' + `trueneg'
                local MaxCorrect = `MaxTruePos' + `MaxTrueNeg'
                local RIOC = ( (`ObsvCorrect'/`N') - (`ChanceCorrect')/`N') /    ///
                             (`MaxCorrect'/`N' - `ChanceCorrect'/`N')
                di "Observed % Correct = " 100* `ObsvCorrect'/`N'
                di "Chance % Correct = " 100* `ChanceCorrect'/`N'
                di "Max % Correct = " 100* `MaxCorrect'/`N'
                di "RIOC = " 100* `RIOC'
                return scalar rioc = `RIOC'
                end
                //
                // example
                clear
                sysuse auto
                logit foreign length
                predict p, pr
                local break = 0.6
                gen byte predfor = p > `break'
                rioc predfor foreign



                • #9
                  Most helpful Mike, thank you!



                  • #10
                    Originally posted by daniel klein:
                    if you just enjoy programming and/or want to learn to program in Stata, this might be a nice little project
                    I have decided that both reasons apply to me, so I gave this a shot. I have implemented RIOC according to the formulas in Copas and Loeber (1990) and Farrington and Loeber (1989). In doing so, I noticed that Mike's code in #8 appears to depend on the order in which the variables are specified, when it should depend only on the variables' values. The order in which the variables are specified determines the table cells into which the "false positives" and the "false negatives" fall (and I think Mike has these two the wrong way around). While the orientation of the table should be mathematically irrelevant, Copas and Loeber (1990) (and, I assume, Loeber and Dishion [1983], to which Mike refers) chose to present their formulas under the assumption that the row total of the positives is larger than or equal to the column total of the positives; "if not, the rows and columns are interchanged" (Copas and Loeber, 1990:303). Mike's code produces wrong results if the row and column totals are not ordered the way the formulas require. Here is Mike's example, slightly modified so that a positive outcome is predicted at a cutoff of .4 (instead of .6):

                    Code:
                    . sysuse auto
                    (1978 Automobile Data)
                    
                    . logit foreign length
                    (output omitted)
                    
                    . predict p, pr
                    
                    . local break = 0.4
                    
                    . gen byte predfor = p > `break'
                    
                    . rioc predfor foreign
                    
                    Enter 0/1 variables, predicted then actual
                    
                               |       Car type
                       predfor |  Domestic    Foreign |     Total
                    -----------+----------------------+----------
                             0 |        42          5 |        47
                             1 |        10         17 |        27
                    -----------+----------------------+----------
                         Total |        52         22 |        74
                    Observed % Correct = 79.72973
                    Chance % Correct = 55.478451
                    Max % Correct = 106.75676
                    RIOC = 47.293447
                    Note that Max % Correct exceeds 100%, which is impossible. We get the correct result if we interchange predfor and foreign, but that would yield wrong results in the original example with a cutoff of .6.
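A minimal sketch of an order-independent fix for the calculation in #8: take the maximum number correct as the smaller of the two positive margins plus the smaller of the two negative margins, rather than summing cells. Because min() is symmetric, it no longer matters which variable forms the rows or which off-diagonal cell is labelled the false positive. The counts are typed in from the cutoff-.4 table above, and the matrix and local names are arbitrary choices for this sketch.

```stata
* cell counts from the table above: rows predfor = 0/1, columns foreign = 0/1
matrix F = (42, 5 \ 10, 17)

local N       = F[1,1] + F[1,2] + F[2,1] + F[2,2]
local correct = F[1,1] + F[2,2]            // observed agreement
local predpos = F[2,1] + F[2,2]            // row total of positives
local actpos  = F[1,2] + F[2,2]            // column total of positives

* expected number correct under independence, given the margins
local chance = (`predpos'*`actpos' + (`N' - `predpos')*(`N' - `actpos')) / `N'

* maximum number correct that the fixed margins allow (order-independent)
local maxcorrect = min(`predpos', `actpos') + min(`N' - `predpos', `N' - `actpos')

local rioc = (`correct' - `chance') / (`maxcorrect' - `chance')
display "Max % Correct = " 100*`maxcorrect'/`N'
display "RIOC = " 100*`rioc'
```

Here the maximum is 22 + 47 = 69 of 74 cases (about 93.2%), and RIOC comes out at about 64.2 on the percentage scale used in #8, i.e. .642, which agrees with the .4-cutoff result of the program below.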


                    Below, you find my first draft for estimating RIOC, its standard error, and confidence intervals (and some overhead). The code does not yet work for zero cell frequencies and might benefit from subroutines (and additional options); an immediate form of the command is still missing and there is not yet a help file. Anyway, here are the results for Mike's example with cutoff values of .6 and .4, respectively:

                    Code:
                    . rioc predfor foreign
                    
                    Relative improvement over chance                      Number of obs. =      74
                    ------------------------------------------------------------------------------
                                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                            RIOC |    .525641   .1597467     3.29   0.001     .2125432    .8387389
                    ------------------------------------------------------------------------------
                    
                    (output omitted)
                    
                    . rioc predfor foreign
                    
                    Relative improvement over chance                      Number of obs. =      74
                    ------------------------------------------------------------------------------
                                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                            RIOC |   .6421663   .1287339     4.99   0.000     .3898526    .8944801
                    ------------------------------------------------------------------------------

                    Here are the results for Tables 1, 2, and 3 in Cairney and Streiner (2011):

                    Code:
                    . // table 1
                    . rioc prediction outcome , tab
                    
                               |        outcome
                    prediction |      True      False |     Total
                    -----------+----------------------+----------
                          True |        22         27 |        49
                         False |        54        219 |       273
                    -----------+----------------------+----------
                         Total |        76        246 |       322
                    
                    Relative improvement over chance                      Number of obs. =     322
                    ------------------------------------------------------------------------------
                                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                            RIOC |   .2787456    .085151     3.27   0.001     .1118527    .4456386
                    ------------------------------------------------------------------------------
                    
                    . // table 2
                    . rioc prediction outcome , tab
                    
                               |        outcome
                    prediction |      True      False |     Total
                    -----------+----------------------+----------
                          True |        65         22 |        87
                         False |         5          5 |        10
                    -----------+----------------------+----------
                         Total |        70         27 |        97
                    
                    Relative improvement over chance                      Number of obs. =      97
                    ------------------------------------------------------------------------------
                                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                            RIOC |   .3071429   .2074997     1.48   0.139    -.0995491    .7138348
                    ------------------------------------------------------------------------------
                    
                    . // table 3
                    . rioc prediction outcome , tab
                    
                               |        outcome
                    prediction |      True      False |     Total
                    -----------+----------------------+----------
                          True |        39        245 |       284
                         False |         4        276 |       280
                    -----------+----------------------+----------
                         Total |        43        521 |       564
                    
                    Relative improvement over chance                      Number of obs. =     564
                    ------------------------------------------------------------------------------
                                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                            RIOC |   .8126246   .0882982     9.20   0.000     .6395633    .9856859
                    ------------------------------------------------------------------------------
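The three estimates can be reproduced from the cell counts alone with an orientation-free form of the Copas and Loeber (1990) formula: interchanging rows and columns swaps e and f, and the numerator na - ef is symmetric in them, so the assumption e >= f can be replaced by writing the denominator as min(e, f)*(n - max(e, f)). rioc_cells is a hypothetical helper name chosen for this sketch; it is not part of any package.

```stata
capture program drop rioc_cells
program rioc_cells, rclass
    // hypothetical helper: RIOC from the four cells of a 2 x 2 table
    // a = true positives, b = false positives, c = false negatives, d = true negatives
    args a b c d
    tempname n e f
    scalar `n' = `a' + `b' + `c' + `d'
    scalar `e' = `a' + `b'                 // predicted positives (row total)
    scalar `f' = `a' + `c'                 // actual positives (column total)
    return scalar rioc = (`n'*`a' - `e'*`f') / (min(`e', `f')*(`n' - max(`e', `f')))
end

rioc_cells 22 27 54 219      // Table 1
display r(rioc)
rioc_cells 65 22 5 5         // Table 2
display r(rioc)
rioc_cells 39 245 4 276      // Table 3
display r(rioc)
```

The displayed values agree with the estimates above (.2787456, .3071429, and .8126246). In Table 1 the predicted positives (49) are fewer than the actual positives (76), which is the case in which the rows and columns would otherwise have to be interchanged.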

                    And, finally, here is the program code (choosing the same name, rioc, as Mike has done):

                    Code:
                    *! version 0.0.9 23jan2021 daniel klein
                    program rioc , rclass
                        version 11.2
                        
                        syntax varlist(numeric min=2 max=2) ///
                        [ if ] [ in ] [ fweight ]           ///
                        [ ,                                 ///
                            SMALLsample                     ///
                            ASR /* synonym */ CHI2          ///
                            Level(cilevel)                  ///
                            Tab                             ///
                        ]
                        
                        marksample touse
                        
                        /*
                            the notation below follows Copas and Loeber (1990)
                            
                            we create 0/1 indicator variables from caller's varlist
                            note: we flip the coding so that zero indicates 'true'
                                  and nonzero and nonmissing indicates 'false' to
                                  create the same table as Copas and Loeber (1990)
                        */
                        
                        tempvar  prediction outcome
                        foreach varname in `prediction' `outcome' {
                            gettoken `varname' varlist : varlist
                            quietly generate `varname' = (``varname'' == 0) if `touse'
                            if ( mi("`tab'") ) continue
                            local varlabel : variable label ``varname''
                            if ( mi(`"`varlabel'"') ) local varlabel ``varname''
                            label variable `varname' `"`varlabel'"'
                        }    
                        
                        if ("`tab'" == "tab") {
                            tempname valuelabel
                            label define `valuelabel' 0 "True" 1 "False"
                            label values `prediction' `outcome' `valuelabel'
                        }
                        else local quietly quietly
                        
                        
                        tempname F
                        `quietly' tabulate `prediction' `outcome' ///
                            if `touse' [`weight' `exp'] , matcell(`F')
                        
                        if ( (r(r)!=2) | (r(c)!=2) ) {
                            display as err "cell frequency may not be zero"
                            exit 459
                        }
                        
                            /*
                                Copas and Loeber (1990) present the formulas assuming
                                that e >= f; we interchange rows and columns if necessary
                            */
                        tempname a b c d e f n
                        scalar `a' = `F'[1, 1]
                        scalar `b' = max(`F'[1, 2], `F'[2, 1])
                        scalar `c' = min(`F'[2, 1], `F'[1, 2])
                        scalar `d' = `F'[2, 2]
                        
                        scalar `e' = `a' + `b'
                        scalar `f' = `a' + `c'
                        scalar `n' = r(N)
                        
                        
                        tempname rioc crit sr srstar ll ul z p
                        
                            // Copas and Loeber (1990) (4)
                        scalar `rioc' = (`n'*`a' - `e'*`f') / ( `f'*(`n'-`e') )
                        
                        scalar `crit' = -invnormal((1-`level'/100)/2)
                        
                        if ( mi("`smallsample'") ) {
                                // Copas and Loeber (1990) (11)
                            scalar `sr' = sqrt( `n'*`c'*( `n'*`f'*(`n'-`e')            ///
                                        + `c'*(`n'*`e' + `e'*`f' -  2*`n'*`f' - `n'^2) ///
                                        + 2*`n'*`c'^2 ) / ( (`n'-`e')^3*`f'^3 ) )
                            
                            scalar `ll' = `rioc' - `crit'*`sr'
                            scalar `ul' = `rioc' + `crit'*`sr'
                        }    
                        else {
                            tempname alpha beta delta
                            
                            scalar `alpha' = `e'/`n'
                            scalar `beta'  = `f'/`n'
                            
                                // Copas and Loeber (1990) (20)
                            scalar `delta'  =  ln( ((`a'+.5)*(`d'+.5)) / ((`b'+.5)*(`c'+.5)) )
                            
                                // Copas and Loeber (1990) (21)
                            scalar `sr' = sqrt( ((`e'+1)*(`e'+2)) / (`e'*(`a'+1)*(`b'+1)) ///
                                        + ((`n'+1-`e')*(`n'+2-`e')) / ((`n'-`e')*(`c'+1)*(`d'+1)) )
                            
                            scalar `ll' = exp(`delta' - `crit'*`sr')
                            scalar `ul' = exp(`delta' + `crit'*`sr')
                            
                            foreach phi in ll ul {
                                scalar ``phi'' = ( 1+(``phi''-1)*(`alpha'+`beta'-2*`alpha'*`beta') ///
                                               - sqrt( (1+(`alpha'+`beta')*(``phi''-1))^2          ///
                                                      - 4*`alpha'*`beta'*``phi''*(``phi''-1) ) )   ///
                                               / ( 2*(``phi''-1)*`beta'*(1-`alpha') )
                            }
                            scalar `sr' = .
                        }
                        
                        if (("`asr'"=="asr") | ("`chi2'"=="chi2")) {
                                // Copas and Loeber (1990) (13)
                            scalar `srstar' = sqrt( (`e'*(`n'-`f')) / (`n'*`f'*(`n'-`e')) )
                        }
                        else scalar `srstar' = `sr'
                        
                        scalar `z' = `rioc'/`srstar'
                        scalar `p' = 2*normal(-abs(`z'))
                        
                        
                        tempname table
                        matrix `table' = `rioc'\ `srstar'\ `z'\ `p'\ `ll'\ `ul'\ `crit'\ .\ 0
                        matrix rownames `table' = b se z pvalue ll ul crit df eform
                        matrix colnames `table' = RIOC // Copas and Loeber (1990) call it R
                        
                        
                        local cf %9.0g
                        local pf %5.3f
                        local sf %8.2f    
                        
                        local cspec & %12s | ///
                            w10 `cf' & w9 `cf' o0& w8 `sf' & w6 `pf' & w11 `cf' & w10 `cf' &
                        
                        display as txt _newline "Relative improvement over chance" ///
                                           %36s "Number of obs." " = " as res %7.0g `n'
                        
                        display as txt "{hline 13}{c TT}{hline 64}"
                        if ( ("`asr'"=="asr") | ("`chi2'"=="chi2") ) ///
                        display as txt _col(14) "{c |}" _col(32) "ASR"
                        display as txt _col(14) "{c |}"     ///
                                       _col(21) "Coef."     ///
                                       _col(29) "Std. Err." ///
                                       _col(44) "z"         ///
                                       _col(49) "P>|z|"     ///
                                       _col(`= 61-strlen("`level'")') "[`level'% Conf. Interval]"
                        display as txt "{hline 13}{c +}{hline 64}" _continue
                        matlist `table'[1..6, 1]' , cspec(`cspec') rspec(&&) names(row)
                        display as txt "{hline 13}{c BT}{hline 64}"
                        
                        return scalar rioc  = `rioc'
                        return matrix table = `table'
                    end
                    exit


                    Cairney, J., & Streiner, D. L. 2011. Using relative improvement over chance (RIOC) to examine agreement between tests: Three case examples using studies of developmental coordination disorder (DCD) in children. Research in Developmental Disabilities, 32, 87--92

                    Copas, J. B., & Loeber, R. 1990. Relative improvement over chance (RIOC) for 2 x 2 tables. British Journal of Mathematical and Statistical Psychology, 43, 293--307.

                    Farrington, D.P., & Loeber, R. 1989. Relative improvement over chance (RIOC) and phi as measures of predictive efficiency and strength of association in 2 x 2 tables. Journal of Quantitative Criminology, 5, 201--213.
                    Last edited by daniel klein; 23 Jan 2021, 13:42. Reason: spelling, formatting



                    • #11
                      Thanks for fixing this, daniel klein. I haven't looked at the original article since 1998 (and what I posted was a mechanical translation of my old Pascal code). I wish I could say that I'm sure that I would not have made the same mistakes had I re-read the article and written the code from scratch. :-}



                      • #12
                        Originally posted by Mike Lacy:
                        I wish I could say that I'm sure that I would not have made the same mistakes had I re-read the article and written the code from scratch. :-}
                        Well, I would blame the authors. Seriously, this really should be more explicit in the articles.



                        • #13
                          Thanks to Kit Baum, a revised version of the program code in #10 is now available as rioc from the SSC. There is an accompanying immediate command, rioci, and a help file, which documents the details.

                          Here is a demonstration of the added features.

                          We will start by replicating the last example in #10 with the immediate command rioci:

                          Code:
                          . rioci 39 245 \ 4 276
                          
                                     |        Outcome
                          Prediction |      True      False |     Total
                          -----------+----------------------+----------
                            True (+) |        39        245 |       284
                           False (-) |         4        276 |       280
                          -----------+----------------------+----------
                               Total |        43        521 |       564
                          
                          Relative improvement over chance                    Number of obs. =       564
                          ------------------------------------------------------------------------------
                                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                  RIOC |   .8126246   .0882982     9.20   0.000     .6395633    .9856859
                          ------------------------------------------------------------------------------

                          Let us replace the data in memory with the example data (also see tabi)

                          Code:
                          . rioci 39 245 \ 4 276 , replace
                          (output omitted)

                          and report additional details

                          Code:
                          . rioc prediction outcome [fweight = pop] , detail
                          
                          Relative improvement over chance                    Number of obs. =       564
                          ------------------------------------------------------------------------------
                                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                       |                                                                
                                  RIOC |   .8126246   .0882982     9.20   0.000     .6395633    .9856859
                          -------------+----------------------------------------------------------------
                          Correct      |                                                                
                                 Total |   .5585106   .0209091                      .5164267    .5999786
                                Chance |   .4969946   .0008865                      .4952556    .4987337
                               Maximum |    .572695   .0208301                      .5306906    .6139341
                          ------------------------------------------------------------------------------
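The three "Correct" proportions can be reproduced from the cell counts, assuming they are observed agreement, agreement expected under independence, and the maximum agreement the margins allow, each divided by the number of observations (the point estimates match under this reading). Their ratio of differences then gives RIOC again. The local names are arbitrary choices for this sketch.

```stata
* cell counts from the rioci example: TP, FP, FN, TN
local tp = 39
local fp = 245
local fn = 4
local tn = 276
local n  = `tp' + `fp' + `fn' + `tn'
local predpos = `tp' + `fp'     // row total of positives (predicted)
local actpos  = `tp' + `fn'     // column total of positives (actual)

* observed, chance (independence), and maximum proportions correct
local total   = (`tp' + `tn') / `n'
local chance  = (`predpos'*`actpos' + (`n' - `predpos')*(`n' - `actpos')) / `n'^2
local maximum = (min(`predpos', `actpos') + min(`n' - `predpos', `n' - `actpos')) / `n'

display "Total = " `total' "   Chance = " `chance' "   Maximum = " `maximum'
display "RIOC  = " (`total' - `chance') / (`maximum' - `chance')
```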

                          Now we add some common statistics: sensitivity (the true positive rate) and specificity (the true negative rate). Note that these require the variables to be specified in the correct order, prediction first and outcome second.

                          Code:
                          . rioc prediction outcome [fweight = pop] , detail stats(TPR TNR)
                          
                          Relative improvement over chance                    Number of obs. =       564
                          ------------------------------------------------------------------------------
                                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                       |                                                                
                                  RIOC |   .8126246   .0882982     9.20   0.000     .6395633    .9856859
                          -------------+----------------------------------------------------------------
                          Correct      |                                                                
                                 Total |   .5585106   .0209091                      .5164267    .5999786
                                Chance |   .4969946   .0008865                      .4952556    .4987337
                               Maximum |    .572695   .0208301                      .5306906    .6139341
                                       |                                                                
                                   TPR |   .9069767   .0442955                      .7786466    .9740687
                                   TNR |   .5297505   .0218666                       .485868    .5732937
                          ------------------------------------------------------------------------------
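For reference, TPR = TP/(TP + FN) = 39/43 and TNR = TN/(TN + FP) = 276/521. Post #15 reads the help file as saying that the intervals come from cii called without options, i.e. the exact binomial interval; assuming Stata 14 or later for the cii proportions syntax, they can be recomputed directly:

```stata
* point estimates: TPR = TP/(TP + FN), TNR = TN/(TN + FP)
display "TPR = " 39/43 "   TNR = " 276/521

* exact (Clopper-Pearson) binomial intervals, cii's default method
cii proportions 43 39      // 43 actual positives, 39 of them predicted positive
cii proportions 521 276    // 521 actual negatives, 276 of them predicted negative
```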

                          We can also add our own statistics. Let us add prevalence, which we define as the proportion of positive cases.

                          Code:
                          . rioc prediction outcome [fweight=pop] , stats(Prevalence: (TP+FN)/N)
                          
                          Relative improvement over chance                    Number of obs. =       564
                          ------------------------------------------------------------------------------
                                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                  RIOC |   .8126246   .0882982     9.20   0.000     .6395633    .9856859
                          -------------+----------------------------------------------------------------
                            Prevalence |   .0762411   .0111747                      .0557228    .1013245
                          ------------------------------------------------------------------------------
                          Last edited by daniel klein; 27 Jan 2021, 02:48.



                          • #14
                            daniel klein thank you!



                            • #15
                              Very nice, Daniel. Thanks.

                              Regarding confidence intervals, the help file says this:

                              stats(stat) requests additional proportions. Confidence intervals are calculated using Stata's cii;
                              It appears that -cii- is called without any options, and so gives the default exact binomial CI. Is it possible to pass one of the other options to cii? E.g., I often like to use the wilson option. No big deal if not, but I thought I should ask. ;-)

                              Cheers,
                              Bruce

                              --
                              Bruce Weaver
                              Version: Stata/MP 18.5 (Windows)
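Whether or not rioc passes options on to cii, a Wilson interval for any of the added proportions can be obtained by calling cii proportions directly (Stata 14 or later syntax) with the denominator and the count. For the TPR in #13, that is 43 actual positives, 39 of whom were predicted positive:

```stata
* Wilson score interval for the TPR in #13 (39 of 43 actual positives)
cii proportions 43 39, wilson

* for comparison: the default exact interval and the Jeffreys interval
cii proportions 43 39
cii proportions 43 39, jeffreys
```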
