Thanks, as always, to Kit Baum, a new program, kappaetc, is now available from the SSC.
The name is parsed "kappa, etc." and is borrowed, with kind permission of Nick Cox, from his recently released entropyetc.
Stata’s built-in capabilities for assessing interrater agreement are pretty much limited to two versions of the kappa statistic: Cohen’s (weighted) kappa for two raters and Fleiss’ kappa (which is actually a generalization of Scott’s Pi) for three or more raters. Although these statistics have long been criticized and alternatives have been developed, they appear to remain the measures most often used. One reason seems to be the limited availability of alternatives in statistical software.
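For comparison, this is roughly what the built-in commands cover; the two-rater ratings below are made up purely for illustration:

Code:
* made-up ratings from two raters, just to illustrate the built-in command
clear
input rater1 rater2
1 1
1 2
2 2
2 1
1 1
2 2
1 1
2 2
1 2
1 1
end

* Cohen's kappa for two unique raters
kap rater1 rater2

* the built-in -kappa- command instead expects one variable per rating
* category, holding the number of raters who chose that category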
kappaetc is intended to fill this gap in Stata. The program implements methods and formulas discussed in Gwet (2014). It calculates six agreement coefficients along with their standard errors and confidence intervals for any number of raters and any number of categories. It handles varying numbers of raters per subject and provides seven predefined weights, suitable for any level of measurement.
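As a minimal sketch of a weighted call (the weighting keyword below, wgt(quadratic), stands in for whichever of the predefined schemes you want; please see help kappaetc for the exact option names), with three-rater ordinal ratings that are again made up:

Code:
* made-up ordinal ratings from three raters, purely for illustration
clear
input rater1 rater2 rater3
1 1 2
2 2 2
3 2 3
1 1 1
2 3 3
3 3 3
1 2 1
2 2 3
end

* agreement coefficients with quadratic weights for ordinal ratings;
* check -help kappaetc- for the available weighting keywords
kappaetc rater1 rater2 rater3, wgt(quadratic)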
Here is an example that borrows from [R] kappa Example 9:
Code:
. webuse rvary2

. kappaetc rater1-rater5

Interrater agreement                            Number of subjects =      10
                                                Number of raters   =       5
                                        Ratings per subject:   min =       3
                                                               avg =     4.7
                                                               max =       5

--------------------------------------------------------------------------------
                     |     Coef.  Std. Err.       t   P>|t| [95% Conf. Interval]
---------------------+----------------------------------------------------------
   Percent Agreement |    0.5833     0.0759    7.69   0.000     0.4117    0.7550
Brennan and Prediger |    0.3750     0.1138    3.29   0.009     0.1175    0.6325
Cohen/Conger's Kappa |    0.3853     0.1049    3.67   0.005     0.1481    0.6225
       Fleiss' Kappa |    0.3586     0.1207    2.97   0.016     0.0856    0.6316
           Gwet's AC |    0.3829     0.1145    3.34   0.009     0.1238    0.6420
Krippendorff's alpha |    0.3897     0.1462    2.66   0.026     0.0589    0.7205
--------------------------------------------------------------------------------
I hope some of you will find this useful.

Best
Daniel
Gwet, K. L. (2014). Handbook of Inter-Rater Reliability. Gaithersburg, MD: Advanced Analytics, LLC.