  • kappaetc available from SSC

    Thanks as always to Kit Baum, a new program, kappaetc, is now available from SSC.

    The name is parsed "kappa, etc."; the naming scheme is borrowed from the recently released entropyetc, with kind permission of Nick Cox.

    Stata’s built-in capabilities for assessing interrater agreement are pretty much limited to two versions of the kappa statistic: Cohen’s (weighted) kappa for two raters and Fleiss’ kappa (which is actually a generalization of Scott’s pi) for three or more raters. Although these statistics have long been criticized and alternatives have been developed, they appear to remain the measures most often used. One reason seems to be the limited availability of alternatives in statistical software.

    kappaetc is intended to fill this gap in Stata. The program implements methods and formulas discussed in Gwet (2014). It calculates six agreement coefficients along with their standard errors and confidence intervals for any number of raters and any number of categories. It handles varying numbers of raters per subject and provides seven prerecorded weights, suitable for any level of measurement.

    Here is an example that borrows from [R] kappa Example 9:

    Code:
    . webuse rvary2
    
    . kappaetc rater1-rater5
    
    Interrater agreement                              Number of subjects =      10
                                                        Number of raters =       5
    
                                                Ratings per subject: min =       3
                                                                     avg =     4.7
                                                                     max =       5
    
    -------------------------------------------------------------------------------
                         |   Coef.  Std. Err.     t    P>|t|   [95% Conf. Interval]
    ---------------------+---------------------------------------------------------
    Percent Agreement    |  0.5833    0.0759    7.69   0.000      0.4117    0.7550
    Brennan and Prediger |  0.3750    0.1138    3.29   0.009      0.1175    0.6325
    Cohen/Conger's Kappa |  0.3853    0.1049    3.67   0.005      0.1481    0.6225
    Fleiss' Kappa        |  0.3586    0.1207    2.97   0.016      0.0856    0.6316
    Gwet's AC            |  0.3829    0.1145    3.34   0.009      0.1238    0.6420
    Krippendorff's alpha |  0.3897    0.1462    2.66   0.026      0.0589    0.7205
    -------------------------------------------------------------------------------
    I hope some of you will find this useful.

    Best
    Daniel


    Gwet, K. L. (2014). Handbook of Inter-Rater Reliability. Gaithersburg, MD: Advanced Analytics, LLC.

  • #2
    Naturally, no one needs my permission to use etc as part of a program name, but it was most courteous of Daniel to ask.



    • #3
      Thanks as always to Kit Baum, an updated version of kappaetc is now available from SSC. In Stata, use the adoupdate command to update an existing installation, or the ssc command to install the program for the first time.
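
      For example (assuming kappaetc was originally installed from SSC), either of the following should fetch the new version:

      Code:
      * update an existing SSC installation of kappaetc
      adoupdate kappaetc, update

      * or (re)install the latest version directly from SSC
      ssc install kappaetc, replace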

      The update fixes some bugs and enhances the capabilities of the software. This concerns the following points.
      • The estimate of Cohen's/Conger's kappa was incorrect when the number of raters varied across subjects or in the presence of missing ratings. This has been fixed.
      • Jackknife standard errors (and therefore unconditional standard errors) could be incorrect when missing ratings led to a different number of subjects in the jackknife replications. This has been fixed. The number of subjects is now reset for every jackknife replication.
      • Extended missing values were erroneously treated as valid rating categories. They are now treated like system missing values.
      • Ratio weights were incorrect and produced wrong results when 0 was used as a rating category. This has been fixed and internal checks have been added to prevent invalid weighting matrices from being used.
      • The lower bound of the confidence interval is now truncated to -1 so the maximum range for the interval is [-1, 1].
      • A new method for benchmarking agreement coefficients has been implemented. See the help file for more information.
      • Much like Stata’s kappa command, kappaetc can now work with data where variables record rating frequencies.
      • The new immediate command kappaetci calculates interrater agreement for two raters from a contingency table (see the sketch below).
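
      As a rough sketch of the immediate command, assuming that kappaetci accepts the cell frequencies of the contingency table row by row, with rows separated by backslashes in the style of tabi (please see the help file for the exact syntax), a hypothetical 2 x 2 table might be analyzed as:

      Code:
      * hypothetical 2 x 2 table of rating frequencies for two raters;
      * table rows separated by "\" (tabi-style syntax assumed)
      kappaetci 20 5 \ 10 15
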
      I hope some of you find this useful.

      Best
      Daniel



      • #4
        Thanks as usual to Kit Baum, another update for the kappaetc command is available from the SSC archives.

        The update has the following fixes and enhancements:
        • A slightly revised formula for the variance of Krippendorff's alpha has been implemented. For reasonable numbers of subjects the differences should not be large.
        • Jackknife standard errors (and therefore unconditional standard errors) could be incorrect when one or more agreement coefficients could not be estimated in the jackknife replicates. In this case, standard errors were based only on those replicates in which all six coefficients could be estimated. Standard errors are now based on all available replicates for a specific coefficient. Nevertheless, jackknife standard errors should be interpreted with caution when one or more coefficients could not be estimated.
        • wgt_option indices() is no longer allowed and will, in most cases, exit with an informative error message. The option was intended for situations in which not all possible rating categories were observed in the data. It could then be used to extract from a full weighting matrix the rows and columns that correspond to the observed ratings. This results in valid (weighted) agreement coefficients only when the expected proportion of agreement does not depend on the number of rating categories, e.g., Cohen's kappa. However, results are questionable otherwise (e.g., for Brennan and Prediger's coefficient and Gwet's AC), even in the unweighted case.
        • kappaetci did not work correctly with the categories() option. The maximum rating was assumed to be the maximum number of rows and columns of the contingency table. It is now obtained from the values underlying the table.
        • Option df() was ignored. This has been fixed.
        • wgt_options kapwgt and matrix evoked an error despite legal syntax. This has been fixed.
        • Gwet's (2016) approach for testing the difference between correlated agreement coefficients has been implemented. This includes new options and additional returned results.
        • Option categories() for specifying predetermined rating categories, which was available for kappaetci, is now available for kappaetc, too. It should be used if not all possible rating categories were observed in the data (see the help file for more, and the example after this list). The option may also be used with the frequency option to specify rating categories that are different from the default integer sequence 1, 2, ..., q.
        • The output has been slightly modified.
        • The help files have been revised and extended.
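
        A minimal sketch of the categories() option, assuming it accepts a numlist of the predetermined rating categories (check the help file for the exact syntax) and assuming, for illustration, hypothetical rating variables on a 1 to 5 scale of which not all categories were observed:

        Code:
        * declare the full 1-5 rating scale even if some categories
        * were never assigned by the (hypothetical) raters
        kappaetc rater1 rater2 rater3, categories(1/5)
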
        Best
        Daniel


        Gwet, K. L. (2016). Testing the Difference of Correlated Agreement Coefficients for Statistical Significance. Educational and Psychological Measurement, 76, 609-637.



        • #5
          Thanks as always to Kit Baum, another update for the kappaetc command is available from the SSC archives. The update has the following fixes and enhancements:
          • The recently revised formula for the variance of Krippendorff's alpha produced incorrect results when missing ratings occurred. This has been fixed.
          • Option nsubjects() for finite sample correction did not work correctly in the previous update. This has been fixed.
          • Option ttest did not work with weighted data. It now does.
          • Alternative weights for circular data, suggested by Warrens and Pratiwi (2016), have been implemented.
          • A generalization of linear, quadratic, and radical weights, discussed in Warrens (2014), has been implemented.
          • New option testvalue() tests whether agreement coefficients are equal to a specified value other than 0 (the default); see the sketch below. One-sided tests are supported.
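
          A minimal sketch of testvalue(), assuming the option simply takes the hypothesized value (see the help file for details and for one-sided tests), using the rvary2 data from above:

          Code:
          * test whether the agreement coefficients equal 0.4
          * instead of the default null value of 0
          webuse rvary2
          kappaetc rater1-rater5, testvalue(0.4)
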
          Best
          Daniel

          Warrens, M. J. (2014). Power Weighted Versions of Bennett, Alpert, and Goldstein's S. Journal of Mathematics, 1-9.
          Warrens, M. J., Pratiwi, B. C. (2016). Kappa Coefficients for Circular Classifications. Journal of Classification, 33, 507-522.



          • #6
            Thanks as usual to Kit Baum, another updated version of kappaetc is now available from SSC.

            The update includes the following:
            • A couple of bugs have been fixed. This mostly concerns rare situations in which kappaetc issued an error message when it should not have (e.g., when an if condition had parentheses) or did not issue an error message when it should have (e.g., when option se() was specified with Krippendorff's ordinal weights).
            • The syntax for some of the options has changed; old syntax continues to work.
            • The output has been slightly modified (e.g., new coefficient names; right-aligned).
            • The help files have been revised and extended.
            • New option nociclipped returns and reports confidence intervals as estimated, i.e., confidence intervals are not clipped to fall into the range [-1, 1]. When the option is not specified, kappaetc now issues a note below the coefficient table that the CIs have been clipped.
            • New command kappaetcssi estimates the approximate sample size for agreement coefficients using a rule-of-thumb-like method described in Gwet (2014, p. 159).
            • New option loa produces basic Bland-Altman (1986) plots for two raters/methods (see the sketch after this list).
            • New option icc() estimates intraclass correlation coefficients (for repeated measurements per subject and rater) as a measure of inter- and intrarater reliability.
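
            A sketch of the loa option for two raters/methods; the variable names below are hypothetical placeholders for two continuous measurement variables:

            Code:
            * Bland-Altman (limits-of-agreement) plot comparing two methods;
            * method1 and method2 are hypothetical measurement variables
            kappaetc method1 method2, loa
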
            Best
            Daniel

            Bland, J. M., Altman, D. G. (1986). Statistical Methods for Assessing Agreement between Two Methods of Clinical Measurement. Lancet, 1, 307-310.
            Gwet, K. L. (2014). Handbook of Inter-Rater Reliability. Gaithersburg, MD: Advanced Analytics, LLC.



            • #7
              Dear Daniel,
              I'm a new Stata user trying to use the kappaetc command to calculate the inter-rater reliability of a few nominal, ordinal, and ratio-scale variables. I ran three commands to calculate the IRR of an ordinal variable: first without weights, and then with linear and quadratic weights. I wonder what the output without weights represents, and which one to use in my case. I would much appreciate it if you could help me differentiate the three outputs.

              Code:
              . kappaetc Q_116_Financial_Status_N Q_116_Recode if RA_Code==1, listwise
              
              Interrater agreement                             Number of subjects =      14
                                                              Ratings per subject =       2
                                                      Number of rating categories =       3
              ------------------------------------------------------------------------------
                                   |   Coef.   Std. Err.   t    P>|t|   [95% Conf. Interval]
              ---------------------+--------------------------------------------------------
                 Percent Agreement |  0.6429    0.1329   4.84   0.000     0.3558     0.9300
              Brennan and Prediger |  0.4643    0.1993   2.33   0.037     0.0336     0.8949
              Cohen/Conger's Kappa |  0.4531    0.2024   2.24   0.043     0.0158     0.8905
                  Scott/Fleiss' Pi |  0.4466    0.2091   2.14   0.052    -0.0051     0.8984
                         Gwet's AC |  0.4727    0.1963   2.41   0.032     0.0486     0.8968
              Krippendorff's Alpha |  0.4664    0.2091   2.23   0.044     0.0146     0.9182
              ------------------------------------------------------------------------------

              Code:
              kappaetc Q_116_Financial_Status_N Q_116_Recode if RA_Code==1, wgt(linear) listwise
              
              Interrater agreement                             Number of subjects =      14
              (weighted analysis)                             Ratings per subject =       2
                                                      Number of rating categories =       3
              ------------------------------------------------------------------------------
                                   |   Coef.   Std. Err.   t    P>|t|   [95% Conf. Interval]
              ---------------------+--------------------------------------------------------
                 Percent Agreement |  0.8214    0.0664  12.36   0.000     0.6779     0.9650
              Brennan and Prediger |  0.5982    0.1495   4.00   0.002     0.2752     0.9212
              Cohen/Conger's Kappa |  0.5679    0.1703   3.34   0.005     0.2000     0.9358
                  Scott/Fleiss' Pi |  0.5611    0.1775   3.16   0.008     0.1777     0.9446
                         Gwet's AC |  0.6136    0.1452   4.23   0.001     0.2999     0.9274
              Krippendorff's Alpha |  0.5768    0.1775   3.25   0.006     0.1934     0.9603
              ------------------------------------------------------------------------------

              Code:
              . kappaetc Q_116_Financial_Status_N Q_116_Recode if RA_Code==1, wgt(quadratic) listwise
              
              Interrater agreement                             Number of subjects =      14
              (weighted analysis)                             Ratings per subject =       2
                                                      Number of rating categories =       3
              ------------------------------------------------------------------------------
                                   |   Coef.   Std. Err.   t    P>|t|   [95% Conf. Interval]
              ---------------------+--------------------------------------------------------
                 Percent Agreement |  0.9107    0.0332  27.41   0.000     0.8389     0.9825
              Brennan and Prediger |  0.7321    0.0997   7.35   0.000     0.5168     0.9475
              Cohen/Conger's Kappa |  0.6957    0.1320   5.27   0.000     0.4104     0.9809
                  Scott/Fleiss' Pi |  0.6896    0.1388   4.97   0.000     0.3896     0.9895
                         Gwet's AC |  0.7482    0.0964   7.76   0.000     0.5399     0.9565
              Krippendorff's Alpha |  0.7007    0.1388   5.05   0.000     0.4007     1.0000
              ------------------------------------------------------------------------------
              Confidence intervals are clipped at the upper limit.
              Thank you.

              Deepani



              • #8
                Deepani

                Thanks for your interest in kappaetc. I am not quite sure I fully understand your problem, though.

                The first output shows results from the unweighted analysis. In the unweighted analysis, you differentiate between (full) agreement and no agreement; the two raters (observers, judges, methods, ...) agree if they assign the exact same category (out of the three possible categories) to a given subject (unit, case, ...); they disagree otherwise. This approach is usually appropriate for categories on the nominal scale. Think of colors, for example, and suppose we have 1 = red, 2 = yellow, and 3 = green. You could argue that the difference between 1 and 2 implies the same degree of (dis)agreement as the difference between 1 and 3. After all, red is no more different from yellow than it is from green.

                Now suppose we are talking about traffic lights. Here, we might argue that red differs more from green than it does from yellow, which implies a certain order of the colors. We could also think from the perspective of a person suffering from red-green color blindness, in which case red would be very similar to green and both would be different from yellow. Either way, whenever differences between categories imply differences in the degree of (dis)agreement, you want your agreement coefficient to reflect these differences. The way you do this is by assigning a set of weights to the differences between categories. The closer the weights are to 1, the less serious the disagreement/difference between the respective categories.

                I cannot tell you which weights to apply; that is a substantive question you need to answer, given your data and research question. Different sources recommend different weights; the suggestions from Gwet (2014) and Krippendorff (2013) are summarized in

                Code:
                help kappaetc choosing
                You can display the weighting matrix that is used with option showweights.
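
                For example, with the rvary2 data used above, something along these lines should display the linear weighting matrix alongside the weighted results:

                Code:
                * show the weighting matrix used for the linear weights
                webuse rvary2
                kappaetc rater1 rater2, wgt(linear) showweights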

                Best
                Daniel

                Gwet, K. L. (2014). Handbook of Inter-Rater Reliability. Gaithersburg, MD: Advanced Analytics, LLC.
                Krippendorff, K. (2013). Computing Krippendorff's Alpha-Reliability. (2011.1.25, Literature updated 2013.9.13)



                • #9
                  Dear Daniel,
                  Thank you so much for the clear, prompt explanation. It has answered my question.

                  Best regards,
                  Deepani



                  • #10
                    Dear Daniel,
                    A quick query: does the kappaetc, icc() command calculate absolute agreement? Is there any way to calculate consistency? I went through the help file but couldn't find the answer.

                    Thank you.

                    Best regards,
                    Deepani



                    • #11
                      The literature that I have used does not really go into the distinction between absolute agreement and consistency. However, since both estimators are implemented, you may obtain results for both types of agreement.

                      For the random-effects model, ICC(2), kappaetc reports absolute agreement, while for mixed-effects models, ICC(3), consistency is reported; this is just like Stata's icc command. Now the estimate for the mixed-effects model equals the consistency that would be obtained for the random-effects model. Likewise, the estimate for the random-effects model equals the absolute agreement that would be obtained for the mixed-effects model. All this is to say that the estimators are the same; it is just the interpretation that differs. Consult the pdf documentation for Stata's icc command to confirm the claim and for more details.

                      Best
                      Daniel



                      • #12
                        Dear Daniel,
                        Thanks for the explanation. I truly appreciate your input. I'm a bit confused, though. Does it mean that if I need absolute agreement for the mixed-effects model, ICC(3), I need to run the random-effects model, ICC(2)? I went through Stata's icc command, but there we can specify whether we need absolute agreement or consistency.

                        Thanks.

                        Best regards,
                        Deepani



                        • #13
                          if I need absolute agreement for the mixed-effects model, ICC(3), I need to run the random-effects model, ICC(2)?
                          Yes.

                          I went through Stata's icc command, but there we can specify whether we need absolute agreement or consistency.
                          True, but

                          Code:
                          icc rating target judge , mixed absolute
                          gives the exact same result as

                          Code:
                          icc rating target judge
                          Best
                          Daniel



                          • #14
                            Cheers! Thank you so much.

                            Best regards,
                            Deepani



                            • #15
                              Dear Daniel,
                              Just a quick question again. I have started to compute agreement coefficients with the kappaetc program. On some occasions I get the message 'ratings do not vary', and Stata does not compute the agreement coefficients. How should I report my results in this case? Is it okay to consider all the agreement coefficients to be 1?

                              I would much appreciate your input and look forward to hearing from you.

                              Thank you.
                              BW
                              Deepani
