  • kappaetc available from SSC

    Thanks as always to Kit Baum, a new program, kappaetc, is now available from SSC.

    The name is parsed "kappa, etc."; the naming scheme is borrowed from the recently released entropyetc, with kind permission of Nick Cox.

    Stata’s built-in capabilities for assessing interrater agreement are pretty much limited to two versions of the kappa statistic: Cohen’s (weighted) kappa for two raters and Fleiss’ kappa (which is actually a generalization of Scott’s pi) for three or more raters. Although these statistics have long been criticized and alternatives have been developed, they appear to remain the measures most often used. One reason seems to be the limited availability of alternatives in statistical software.

    kappaetc is intended to fill this gap in Stata. The program implements methods and formulas discussed in Gwet (2014). It calculates six agreement coefficients along with their standard errors and confidence intervals for any number of raters and any number of categories. It handles varying numbers of raters per subject and provides seven prerecorded weights, suitable for any level of measurement.

    Here is an example that borrows from [R] kappa Example 9:

    Code:
    . webuse rvary2
    
    . kappaetc rater1-rater5
    
    Interrater agreement                              Number of subjects =      10
                                                        Number of raters =       5
    
                                                Ratings per subject: min =       3
                                                                     avg =     4.7
                                                                     max =       5
    
    -------------------------------------------------------------------------------
                         |   Coef.  Std. Err.     t    P>|t|   [95% Conf. Interval]
    ---------------------+---------------------------------------------------------
    Percent Agreement    |  0.5833    0.0759    7.69   0.000      0.4117    0.7550
    Brennan and Prediger |  0.3750    0.1138    3.29   0.009      0.1175    0.6325
    Cohen/Conger's Kappa |  0.3853    0.1049    3.67   0.005      0.1481    0.6225
    Fleiss' Kappa        |  0.3586    0.1207    2.97   0.016      0.0856    0.6316
    Gwet's AC            |  0.3829    0.1145    3.34   0.009      0.1238    0.6420
    Krippendorff's alpha |  0.3897    0.1462    2.66   0.026      0.0589    0.7205
    -------------------------------------------------------------------------------
    I hope some of you will find this useful.

    Best
    Daniel


    Gwet, K. L. (2014). Handbook of Inter-Rater Reliability. Gaithersburg, MD: Advanced Analytics, LLC.

  • #2
    Naturally, no one needs my permission to use etc as part of a program name, but it was most courteous of Daniel to ask.



    • #3
      Thanks as always to Kit Baum, an updated version of kappaetc is now available from SSC. In Stata, use the adoupdate command to update an existing installation, or the ssc command to install the program for the first time.
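
      For example (assuming kappaetc was originally installed from SSC), either of the following should fetch the new version:

      Code:
      * update an existing SSC installation of kappaetc
      adoupdate kappaetc, update

      * or (re)install the latest version directly from SSC
      ssc install kappaetc, replace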

      The update fixes some bugs and enhances the capabilities of the software. This concerns the following points.
      • The estimate of Cohen's/Conger's kappa was incorrect when the number of raters varied across subjects or in the presence of missing ratings. This has been fixed.
      • Jackknife standard errors (and therefore unconditional standard errors) could be incorrect when missing ratings led to a different number of subjects in the jackknife replications. This has been fixed. The number of subjects is now reset for every jackknife replication.
      • Extended missing values were erroneously treated as valid rating categories. They are now treated like system missing values.
      • Ratio weights were incorrect and produced wrong results when 0 was used as a rating category. This has been fixed and internal checks have been added to prevent invalid weighting matrices from being used.
      • The lower bound of the confidence interval is now truncated to -1 so the maximum range for the interval is [-1, 1].
      • A new method for benchmarking agreement coefficients has been implemented. See the help file for more information.
      • Much like Stata’s kappa command, kappaetc can now work with data where variables record rating frequencies.
      • The new immediate command kappaetci calculates interrater agreement for two raters from a contingency table (see the sketch below).
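
      As a rough sketch of the immediate command, assuming that kappaetci accepts the cell frequencies of the contingency table row by row, with rows separated by backslashes in the style of tabi (please see the help file for the exact syntax), a hypothetical 2 x 2 table might be analyzed as:

      Code:
      * hypothetical 2 x 2 table of rating frequencies for two raters;
      * table rows separated by "\" (tabi-style syntax assumed)
      kappaetci 20 5 \ 10 15
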
      I hope some of you find this useful.

      Best
      Daniel



      • #4
        Thanks as usual to Kit Baum, another update for the kappaetc command is available from the SSC archives.

        The update has the following fixes and enhancements:
        • A slightly revised formula for the variance of Krippendorff's alpha has been implemented. For reasonable numbers of subjects the differences should not be large.
        • Jackknife standard errors (and therefore unconditional standard errors) could be incorrect when one or more agreement coefficients could not be estimated in the jackknife replicates. In this case, standard errors were based only on those replicates in which all six coefficients could be estimated. Standard errors are now based on all available replicates for a specific coefficient. Nevertheless, jackknife standard errors should be interpreted with caution when one or more coefficients could not be estimated.
        • wgt_option indices() is no longer allowed and will, in most cases, exit with an informative error message. The option was intended for situations in which not all possible rating categories were observed in the data. It could then be used to extract from a full weighting matrix the rows and columns that correspond to the observed ratings. This results in valid (weighted) agreement coefficients only when the expected proportion of agreement does not depend on the number of rating categories, e.g., Cohen's kappa. However, results are questionable otherwise (e.g., for Brennan and Prediger's coefficient and Gwet's AC), even in the unweighted case.
        • kappaetci did not work correctly with the categories() option. The maximum rating was assumed to be the maximum number of rows and columns of the contingency table. It is now obtained from the values underlying the table.
        • Option df() was ignored. This has been fixed.
        • wgt_options kapwgt and matrix evoked an error despite legal syntax. This has been fixed.
        • Gwet's (2016) approach for testing the difference between correlated agreement coefficients has been implemented. This includes new options and additional returned results.
        • Option categories() for specifying predetermined rating categories, which was available for kappaetci, is now available for kappaetc, too. It should be used if not all possible rating categories were observed in the data (see the help file for more, and the example after this list). The option may also be used with the frequency option to specify rating categories that are different from the default integer sequence 1, 2, ..., q.
        • The output has been slightly modified.
        • The help files have been revised and extended.
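
        A minimal sketch of the categories() option, assuming it accepts a numlist of the predetermined rating categories (check the help file for the exact syntax) and assuming, for illustration, hypothetical rating variables on a 1 to 5 scale of which not all categories were observed:

        Code:
        * declare the full 1-5 rating scale even if some categories
        * were never assigned by the (hypothetical) raters
        kappaetc rater1 rater2 rater3, categories(1/5)
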
        Best
        Daniel


        Gwet, K. L. (2016). Testing the Difference of Correlated Agreement Coefficients for Statistical Significance. Educational and Psychological Measurement, 76, 609-637.



        • #5
          Thanks as always to Kit Baum, another update for the kappaetc command is available from the SSC archives. The update has the following fixes and enhancements:
          • The recently revised formula for the variance of Krippendorff's alpha produced incorrect results when missing ratings occurred. This has been fixed.
          • Option nsubjects() for finite sample correction did not work correctly in the previous update. This has been fixed.
          • Option ttest did not work with weighted data. It now does.
          • Alternative weights for circular data, suggested by Warrens and Pratiwi (2016), have been implemented.
          • A generalization of linear, quadratic, and radical weights, discussed in Warrens (2014), has been implemented.
          • New option testvalue() tests whether agreement coefficients are equal to a specified value other than 0 (the default); see the sketch below. One-sided tests are supported.
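
          A minimal sketch of testvalue(), assuming the option simply takes the hypothesized value (see the help file for details and for one-sided tests), using the rvary2 data from above:

          Code:
          * test whether the agreement coefficients equal 0.4
          * instead of the default null value of 0
          webuse rvary2
          kappaetc rater1-rater5, testvalue(0.4)
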
          Best
          Daniel

          Warrens, M. J. (2014). Power Weighted Versions of Bennett, Alpert, and Goldstein's S. Journal of Mathematics, 1-9.
          Warrens, M. J., Pratiwi, B. C. (2016). Kappa Coefficients for Circular Classifications. Journal of Classification, 33, 507-522.



          • #6
            Thanks as usual to Kit Baum, another updated version of kappaetc is now available from SSC.

            The update includes the following:
            • A couple of bugs have been fixed. This mostly concerns rare situations in which kappaetc issued an error message when it should not have (e.g., when an if condition had parentheses) or did not issue an error message when it should have (e.g., when option se() was specified with Krippendorff's ordinal weights).
            • The syntax for some of the options has changed; old syntax continues to work.
            • The output has been slightly modified (e.g., new coefficient names; right-aligned).
            • The help files have been revised and extended.
            • New option nociclipped returns and reports confidence intervals as estimated, i.e., confidence intervals are not clipped to fall into the range [-1, 1]. When the option is not specified, kappaetc now issues a note below the coefficient table that the CIs have been clipped.
            • New command kappaetcssi estimates the approximate sample size for agreement coefficients using a rule-of-thumb-like method described in Gwet (2014, p. 159).
            • New option loa produces basic Bland-Altman (1986) plots for two raters/methods (see the sketch after this list).
            • New option icc() estimates intraclass correlation coefficients (for repeated measurements per subject and rater) as a measure of inter- and intrarater reliability.
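
            A sketch of the loa option for two raters/methods; the variable names below are hypothetical placeholders for two continuous measurement variables:

            Code:
            * Bland-Altman (limits-of-agreement) plot comparing two methods;
            * method1 and method2 are hypothetical measurement variables
            kappaetc method1 method2, loa
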
            Best
            Daniel

            Bland, J. M., Altman, D. G. (1986). Statistical Methods for Assessing Agreement between Two Methods of Clinical Measurement. Lancet, 1, 307-310.
            Gwet, K. L. (2014). Handbook of Inter-Rater Reliability. Gaithersburg, MD: Advanced Analytics, LLC.



            • #7
              Dear Daniel,
              I'm a new Stata user trying to use the kappaetc command to calculate the inter-rater reliability of a few nominal, ordinal, and ratio-scale variables. I ran three commands to calculate the IRR of an ordinal variable: first without weights, and then with linear and quadratic weights. I wonder what the output without weights represents, and which one to use in my case. I would much appreciate it if you could help me differentiate the three outputs.

              Code:
              . kappaetc Q_116_Financial_Status_N Q_116_Recode if RA_Code==1, listwise
              
              Interrater agreement                             Number of subjects =      14
                                                              Ratings per subject =       2
                                                      Number of rating categories =       3
              ------------------------------------------------------------------------------
                                   |   Coef.   Std. Err.   t    P>|t|   [95% Conf. Interval]
              ---------------------+--------------------------------------------------------
                 Percent Agreement |  0.6429    0.1329   4.84   0.000     0.3558     0.9300
              Brennan and Prediger |  0.4643    0.1993   2.33   0.037     0.0336     0.8949
              Cohen/Conger's Kappa |  0.4531    0.2024   2.24   0.043     0.0158     0.8905
                  Scott/Fleiss' Pi |  0.4466    0.2091   2.14   0.052    -0.0051     0.8984
                         Gwet's AC |  0.4727    0.1963   2.41   0.032     0.0486     0.8968
              Krippendorff's Alpha |  0.4664    0.2091   2.23   0.044     0.0146     0.9182
              ------------------------------------------------------------------------------

              Code:
              kappaetc Q_116_Financial_Status_N Q_116_Recode if RA_Code==1, wgt(linear) listwise
              
              Interrater agreement                             Number of subjects =      14
              (weighted analysis)                             Ratings per subject =       2
                                                      Number of rating categories =       3
              ------------------------------------------------------------------------------
                                   |   Coef.   Std. Err.   t    P>|t|   [95% Conf. Interval]
              ---------------------+--------------------------------------------------------
                 Percent Agreement |  0.8214    0.0664  12.36   0.000     0.6779     0.9650
              Brennan and Prediger |  0.5982    0.1495   4.00   0.002     0.2752     0.9212
              Cohen/Conger's Kappa |  0.5679    0.1703   3.34   0.005     0.2000     0.9358
                  Scott/Fleiss' Pi |  0.5611    0.1775   3.16   0.008     0.1777     0.9446
                         Gwet's AC |  0.6136    0.1452   4.23   0.001     0.2999     0.9274
              Krippendorff's Alpha |  0.5768    0.1775   3.25   0.006     0.1934     0.9603
              ------------------------------------------------------------------------------

              Code:
              . kappaetc Q_116_Financial_Status_N Q_116_Recode if RA_Code==1, wgt(quadratic) listwise
              
              Interrater agreement                             Number of subjects =      14
              (weighted analysis)                             Ratings per subject =       2
                                                      Number of rating categories =       3
              ------------------------------------------------------------------------------
                                   |   Coef.   Std. Err.   t    P>|t|   [95% Conf. Interval]
              ---------------------+--------------------------------------------------------
                 Percent Agreement |  0.9107    0.0332  27.41   0.000     0.8389     0.9825
              Brennan and Prediger |  0.7321    0.0997   7.35   0.000     0.5168     0.9475
              Cohen/Conger's Kappa |  0.6957    0.1320   5.27   0.000     0.4104     0.9809
                  Scott/Fleiss' Pi |  0.6896    0.1388   4.97   0.000     0.3896     0.9895
                         Gwet's AC |  0.7482    0.0964   7.76   0.000     0.5399     0.9565
              Krippendorff's Alpha |  0.7007    0.1388   5.05   0.000     0.4007     1.0000
              ------------------------------------------------------------------------------
              Confidence intervals are clipped at the upper limit.
              Thank you.

              Deepani



              • #8
                Deepani

                Thanks for your interest in kappaetc. I am not quite sure I fully understand your problem, though.

                The first output shows results from the unweighted analysis. In the unweighted analysis, you differentiate between (full) agreement and no agreement; the two raters (observers, judges, methods, ...) agree if they assign the exact same category (out of the three possible categories) to a given subject (unit, case, ...); they disagree otherwise. This approach is usually appropriate for categories on the nominal scale. Think of colors, for example, and suppose we have 1 = red, 2 = yellow, and 3 = green. You could argue that the difference between 1 and 2 implies the same degree of (dis)agreement as the difference between 1 and 3. After all, red is no more different from yellow than it is from green.

                Now suppose we are talking about traffic lights. Here, we might argue that red differs more from green than it does from yellow, which implies a certain order of the colors. We could also think from the perspective of a person suffering from red-green color blindness, in which case red would be very similar to green and both would be different from yellow. Either way, whenever differences between categories imply differences in the degree of (dis)agreement, you want your agreement coefficient to reflect these differences. The way you do this is by assigning a set of weights to the differences between categories. The closer the weights are to 1, the less serious the disagreement/difference between the respective categories.

                I cannot tell you which weights to apply; that is a substantive question you need to answer, given your data and research question. Different sources recommend different weights; the suggestions from Gwet (2014) and Krippendorff (2013) are summarized in

                Code:
                help kappaetc choosing
                You can display the weighting matrix that is used with option showweights.
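
                For example, with the rvary2 data used above, something along these lines should display the linear weighting matrix alongside the weighted results:

                Code:
                * show the weighting matrix used for the linear weights
                webuse rvary2
                kappaetc rater1 rater2, wgt(linear) showweights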

                Best
                Daniel

                Gwet, K. L. (2014). Handbook of Inter-Rater Reliability. Gaithersburg, MD: Advanced Analytics, LLC.
                Krippendorff, K. (2013). Computing Krippendorff's Alpha-Reliability. (2011.1.25, Literature updated 2013.9.13)



                • #9
                  Dear Daniel,
                  Thank you so much for the clear, prompt explanation. It has answered my question.

                  Best regards,
                  Deepani



                  • #10
                    Dear Daniel,
                    A quick query: does the kappaetc, icc() command calculate absolute agreement? Is there any way to calculate consistency? I went through the help file but couldn't find the answer.

                    Thank you.

                    Best regards,
                    Deepani



                    • #11
                      The literature that I have used does not really go into the distinction between absolute agreement and consistency. However, since both estimators are implemented, you may obtain results for both types of agreement.

                      For the random-effects model, ICC(2), kappaetc reports absolute agreement, while for mixed-effects models, ICC(3), consistency is reported; this is just like Stata's icc command. Now the estimate for the mixed-effects model equals the consistency that would be obtained for the random-effects model. Likewise, the estimate for the random-effects model equals the absolute agreement that would be obtained for the mixed-effects model. All this is to say that the estimators are the same; it is just the interpretation that differs. Consult the pdf documentation for Stata's icc command to confirm the claim and for more details.

                      Best
                      Daniel



                      • #12
                        Dear Daniel,
                        Thanks for the explanation. I truly appreciate your input. I'm a bit confused, though. Does it mean that if I need absolute agreement for the mixed-effects model, ICC(3), I need to run the random-effects model, ICC(2)? I went through Stata's icc command, but there we can specify whether we need absolute agreement or consistency.

                        Thanks.

                        Best regards,
                        Deepani



                        • #13
                          if I need absolute agreement for the mixed-effects model, ICC(3), I need to run the random-effects model, ICC(2)?
                          Yes.

                          I went through Stata's icc command, but there we can specify whether we need absolute agreement or consistency.
                          True, but

                          Code:
                          icc rating target judge , mixed absolute
                          gives the exact same result as

                          Code:
                          icc rating target judge
                          Best
                          Daniel



                          • #14
                            Cheers! Thank you so much.

                            Best regards,
                            Deepani



                            • #15
                              Dear Daniel,
                              Just a quick question again. I have started to compute agreement coefficients with the kappaetc program. On some occasions I get the message 'ratings do not vary', and Stata does not compute the agreement coefficients. How should I report my results in this case? Is it okay to consider all the agreement coefficients to be 1?

                              I would much appreciate your input and look forward to hearing from you.

                              Thank you.
                              BW
                              Deepani
