Kappa statistics homogeneity in STATA: how to fix?

Gianfranco Di Gennaro

Join Date: Oct 2020

Posts: 144
#1


29 Jan 2023, 15:01

Dear all,
I don't know how to fix this.

I am doing an inter-rater agreement study in which 2 operators evaluate a certain number of patients and classify them with a dichotomous outcome (positive/negative).

This takes place in three different hospitals (2 operators, different for each hospital, classify a certain number of patients belonging to that hospital).

I don't know whether a single Kappa statistic can be calculated accounting for this multicentric design.

But I know that there are methods to evaluate the homogeneity of the three Kappa statistics.

Do you have any idea how this is possible in STATA?

Thank you so much in advance!

Gianfranco
Tags: agreement, kappa, multicentric
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2458
#2

29 Jan 2023, 15:18

These designs assume that raters are exchangeable. If that holds, you should not need to adjust for center effects, because all of the variability is accounted for by raters and subjects. You could compute the kappa for all raters combined and compare it, qualitatively, with the kappas calculated for each hospital alone. If they are roughly similar, that supports the assumption that there are no important site-specific effects.
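A minimal sketch of that comparison, written in Python only so the arithmetic can be checked outside Stata (in Stata you would run kappaetc on each hospital's pair of raters and on all raters). It uses Scott's pi, which is Fleiss' kappa for two ratings per subject, because pi does not depend on which rater is stored first, so pooling hospitals with different raters is well defined. The 2x2 counts are invented for illustration.

```python
def expand(n11, n10, n01, n00):
    """Turn 2x2 counts (first rater, second rater) into a list of 0/1 rating pairs."""
    return [(1, 1)] * n11 + [(1, 0)] * n10 + [(0, 1)] * n01 + [(0, 0)] * n00


def scott_pi(pairs):
    """Scott's pi (= Fleiss' kappa for two ratings per subject) for binary 0/1 ratings."""
    n = len(pairs)
    p_obs = sum(a == b for a, b in pairs) / n
    p_pos = sum(a + b for a, b in pairs) / (2 * n)  # pooled prevalence over both ratings
    p_exp = p_pos ** 2 + (1 - p_pos) ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical counts per hospital:
# (both positive, only first positive, only second positive, both negative)
hospitals = {
    "hospital 1": expand(8, 2, 1, 9),
    "hospital 2": expand(5, 1, 2, 7),
    "hospital 3": expand(4, 3, 3, 10),
}

for name, pairs in hospitals.items():
    print(f"{name}: n = {len(pairs)}, pi = {scott_pi(pairs):.3f}")

# "All raters": every subject still contributes its own pair of ratings.
all_pairs = [pair for pairs in hospitals.values() for pair in pairs]
print(f"all hospitals: n = {len(all_pairs)}, pi = {scott_pi(all_pairs):.3f}")
```

With these invented counts the hospital values are about 0.70, 0.59, and 0.34, against 0.55 for all hospitals pooled; a spread like that would be worth investigating before treating the sites as interchangeable.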

I might also try to estimate the ICC from a two-way mixed model, once ignoring and once adding a fixed effect of center, to observe whether these two ICC estimates are similar.
Gianfranco Di Gennaro

Join Date: Oct 2020

Posts: 144
#3

30 Jan 2023, 02:03

Thanks a lot, Leonardo Guizzetti.

So, if I have, say, a required sample size of 45 patients, I distribute 15 to each of the 3 hospitals, identify the two raters in each hospital as "rater A" and "rater B", and calculate the global Kappa?
It doesn't matter who is A and who is B, correct?

Regarding ICC in a mixed effect model, I wonder if it is adequate to use ICC for a dichotomous variable.

Thank you again.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2458
#4

30 Jan 2023, 06:48

Originally posted by Gianfranco Di Gennaro

So, if I have, say, a required sample size of 45 patients, I distribute 15 to each of the 3 hospitals, identify the two raters in each hospital as "rater A" and "rater B", and calculate the global Kappa?
It doesn't matter who is A and who is B, correct?

If I understand you correctly, each subject within any hospital is rated by both raters at that same hospital. Then yes, you would compute a global (overall) kappa where raters are identified from 1 to 6 (I would not label them rater A and rater B within each hospital: each rater needs a unique identifier, or else different raters will be wrongly treated as the same person and combined in the modeling). A conditional, or hospital-specific, kappa then uses only the data from that hospital's two raters.

Originally posted by Gianfranco Di Gennaro

Regarding ICC in a mixed effect model, I wonder if it is adequate to use ICC for a dichotomous variable.

Yes, it is: the standard weighted Fleiss' Kappa is known to be (asymptotically) equivalent to the ICC. See the citation below.

Fleiss, J. L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33(3), 613-619.
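A quick way to see the relationship on binary data is to compute both quantities on the same 0/1 ratings. The Python sketch below (invented counts) computes Cohen's kappa and the single-rater, absolute-agreement ICC from two-way ANOVA mean squares, the formula Shrout and Fleiss call ICC(2,1). Like the equivalence result, it treats the 0/1 codes as numeric scores; it is not the latent-scale ICC you would get from a logistic mixed model. In Stata, the official icc command computes ICCs of this kind from long-format data.

```python
def expand(n11, n10, n01, n00):
    """Turn 2x2 counts (rater 1, rater 2) into a list of 0/1 rating pairs."""
    return [(1, 1)] * n11 + [(1, 0)] * n10 + [(0, 1)] * n01 + [(0, 0)] * n00


def cohen_kappa(pairs):
    """Cohen's kappa for two raters with binary 0/1 ratings."""
    n = len(pairs)
    p_obs = sum(a == b for a, b in pairs) / n
    p1 = sum(a for a, _ in pairs) / n  # prevalence, rater 1
    p2 = sum(b for _, b in pairs) / n  # prevalence, rater 2
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)
    return (p_obs - p_exp) / (1 - p_exp)


def icc_absolute_single(rows):
    """ICC(2,1): single-rater, absolute-agreement ICC from two-way ANOVA.

    rows: one sequence per subject, one column per rater.
    """
    n, k = len(rows), len(rows[0])
    grand = sum(map(sum, rows)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_err = ss_total - ss_rows - ss_cols                   # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


pairs = expand(8, 2, 1, 9)  # hypothetical 2x2 counts
print(f"Cohen's kappa = {cohen_kappa(pairs):.3f}")          # 0.700
print(f"ICC(2,1)      = {icc_absolute_single(pairs):.3f}")  # 0.711
```

On these invented counts the two values are 0.700 and 0.711.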
Gianfranco Di Gennaro

Join Date: Oct 2020

Posts: 144
#5

30 Jan 2023, 09:28

Thank you again, Leonardo Guizzetti.

I have just one last question.
Do you think it's appropriate to compute a sample size (number of subjects to be rated) based on only two raters and then split that sample size among the three hospitals?
Thanks again. All the best!
Gianfranco

daniel klein

Join Date: Mar 2014
Posts: 3912

30 Jan 2023, 09:49

Originally posted by Leonardo Guizzetti

If I understand you correctly, each subject within any hospital is rated by both raters at that same hospital. Then yes, you would compute a global (overall) kappa where raters are identified from 1 to 6 (I would not label them rater A and rater B within each hospital: each rater needs a unique identifier, or else different raters will be wrongly treated as the same person and combined in the modeling)

If you are talking about Cohen's Kappa, then yes, each rater must be identified. In Fleiss' Kappa, which you reference later (and which reduces to Scott's Pi in the two-rater case), raters are interchangeable. Here is a quick example using kappaetc (available from the SJ or SSC):

Code:

// setup
// kappaetc is community-contributed; if needed, install it once with: ssc install kappaetc
webuse rate2

// mimic binary ratings
recode rada radb (1/2 = 0) (3/4 = 1)

// mimic hospital 1
generate rater1 = rada in 1/28
generate rater2 = radb in 1/28

// mimic hospital 2
generate rater3 = rada in 29/57
generate rater4 = radb in 29/57

// mimic hospital 3
generate rater5 = rada in 58/L
generate rater6 = radb in 58/L

// overall
kappaetc rada radb

// separated by hospital
kappaetc rater1-rater6

The (relevant) output is

Code:

. // overall
. kappaetc rada radb

Interrater agreement                             Number of subjects =      85
                                                Ratings per subject =       2
                                        Number of rating categories =       2
------------------------------------------------------------------------------
                     |   Coef.  Std. Err.    t    P>|t|   [95% Conf. Interval]
---------------------+--------------------------------------------------------
   Percent Agreement |  0.8471    0.0393  21.57   0.000     0.7690     0.9252
Brennan and Prediger |  0.6941    0.0785   8.84   0.000     0.5379     0.8503
Cohen/Conger's Kappa |  0.6347    0.0888   7.14   0.000     0.4580     0.8114
    Scott/Fleiss' Pi |  0.6273    0.0943   6.65   0.000     0.4397     0.8148
           Gwet's AC |  0.7406    0.0721  10.27   0.000     0.5973     0.8840
Krippendorff's Alpha |  0.6294    0.0943   6.68   0.000     0.4419     0.8170
------------------------------------------------------------------------------

. 
. // separated by hospital
. kappaetc rater1-rater6

Interrater agreement                             Number of subjects =      85
                                                Ratings per subject =       2
                                        Number of rating categories =       2
------------------------------------------------------------------------------
                     |   Coef.  Std. Err.    t    P>|t|   [95% Conf. Interval]
---------------------+--------------------------------------------------------
   Percent Agreement |  0.8471    0.0393  21.57   0.000     0.7690     0.9252
Brennan and Prediger |  0.6941    0.0785   8.84   0.000     0.5379     0.8503
Cohen/Conger's Kappa |  0.6638    0.0922   7.20   0.000     0.4805     0.8471
 Scott/Fleiss' Kappa |  0.6273    0.0943   6.65   0.000     0.4397     0.8148
           Gwet's AC |  0.7406    0.0721  10.27   0.000     0.5973     0.8840
Krippendorff's Alpha |  0.6294    0.0943   6.68   0.000     0.4419     0.8170
------------------------------------------------------------------------------

Note how Cohen's Kappa (Conger's Kappa with more than two raters) changes when we estimate agreement among 6 raters instead of pooling the ratings into two raters. Note also that Fleiss' Kappa and all the other coefficients are the same in both scenarios.
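The mechanism can be reproduced in a few lines. Cohen's kappa computes chance agreement from each rater variable's own marginal distribution, so it depends on which rater is stored in which variable; Scott's pi (Fleiss' kappa) uses the pooled distribution of all ratings and is unaffected. The Python sketch below uses invented counts and a simpler manipulation than the six-rater setup in the Stata example: it swaps the two rating columns within one hospital and recomputes both coefficients on the pooled data.

```python
def expand(n11, n10, n01, n00):
    """2x2 counts (first column, second column) -> list of 0/1 rating pairs."""
    return [(1, 1)] * n11 + [(1, 0)] * n10 + [(0, 1)] * n01 + [(0, 0)] * n00


def cohen_kappa(pairs):
    """Chance agreement from each column's own prevalence."""
    n = len(pairs)
    p_obs = sum(a == b for a, b in pairs) / n
    p1 = sum(a for a, _ in pairs) / n
    p2 = sum(b for _, b in pairs) / n
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)
    return (p_obs - p_exp) / (1 - p_exp)


def scott_pi(pairs):
    """Chance agreement from the pooled prevalence of all ratings."""
    n = len(pairs)
    p_obs = sum(a == b for a, b in pairs) / n
    p_pos = sum(a + b for a, b in pairs) / (2 * n)
    p_exp = p_pos ** 2 + (1 - p_pos) ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical data: in hospital 1 the rater stored first is more lenient,
# in hospital 2 the rater stored second is more lenient.
hosp1 = expand(8, 5, 0, 7)
hosp2 = expand(8, 0, 5, 7)
as_stored = hosp1 + hosp2
swapped = hosp1 + [(b, a) for a, b in hosp2]  # relabel who is "first" in hospital 2

print(f"Cohen: {cohen_kappa(as_stored):.3f} vs {cohen_kappa(swapped):.3f}")  # 0.499 vs 0.528
print(f"Scott: {scott_pi(as_stored):.3f} vs {scott_pi(swapped):.3f}")        # 0.499 vs 0.499
```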

As for the equivalence of Kappa and ICC, see this post. Note however that the equality is shown for quadratically weighted Kappa. With binary ratings, quadratic weights reproduce the unweighted Kappa and the equivalence no longer holds. I am not saying that this is necessarily a problem.
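The point about weights is easy to verify. With c ordered categories, quadratic agreement weights are w_ij = 1 - (i - j)^2 / (c - 1)^2. With c = 2 they are 1 on the diagonal and 0 off it, which is exactly the unweighted kappa. A small Python sketch with an invented 2x2 table:

```python
def weighted_kappa(table, weights):
    """Cohen's weighted kappa from a c x c table of counts (rows: rater 1, columns: rater 2)."""
    c = len(table)
    n = sum(map(sum, table))
    rows = [sum(table[i]) / n for i in range(c)]
    cols = [sum(table[i][j] for i in range(c)) / n for j in range(c)]
    p_obs = sum(weights[i][j] * table[i][j] / n for i in range(c) for j in range(c))
    p_exp = sum(weights[i][j] * rows[i] * cols[j] for i in range(c) for j in range(c))
    return (p_obs - p_exp) / (1 - p_exp)


def quadratic_weights(c):
    """Quadratic agreement weights for c ordered categories."""
    return [[1 - (i - j) ** 2 / (c - 1) ** 2 for j in range(c)] for i in range(c)]


def identity_weights(c):
    """Weights that give the ordinary (unweighted) kappa."""
    return [[1.0 if i == j else 0.0 for j in range(c)] for i in range(c)]


table = [[9, 1], [2, 8]]  # hypothetical counts: rows = rater 1 (0/1), columns = rater 2 (0/1)
print(quadratic_weights(2))  # [[1.0, 0.0], [0.0, 1.0]]
print(quadratic_weights(3))  # [[1.0, 0.75, 0.0], [0.75, 1.0, 0.75], [0.0, 0.75, 1.0]]
print(f"{weighted_kappa(table, quadratic_weights(2)):.3f}")  # 0.700
print(f"{weighted_kappa(table, identity_weights(2)):.3f}")   # 0.700
```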

I have not looked into the homogeneity of agreement coefficients.
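For reference, one standard approach when the samples are independent (different patients and different raters in each hospital) is an inverse-variance chi-square test, the same form as Cochran's Q in meta-analysis: pool the hospital-specific coefficients with weights 1/SE^2, sum the weighted squared deviations from the pooled value, and compare the sum with a chi-square distribution with (number of hospitals - 1) degrees of freedom. The Python sketch below only illustrates that arithmetic; the coefficients and standard errors are placeholders. In practice they would come from per-hospital runs of kappaetc or another command that reports standard errors of the estimates; as far as I know, the standard error shown by Stata's official kap is computed under the null hypothesis of no agreement, which is not the one needed here. With about 15 patients per hospital, such a test will likely have little power.

```python
import math


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable X with df degrees of freedom."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Regularized upper incomplete gamma Q(df/2, y), built up with the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2 == 0:
        total, a = math.exp(-y), 1.0             # Q(1, y)
    else:
        total, a = math.erfc(math.sqrt(y)), 0.5  # Q(1/2, y)
    while a < df / 2.0:
        total += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return total


def kappa_homogeneity(kappas, ses):
    """Inverse-variance chi-square test that independent coefficients share one value.

    Returns (pooled estimate, chi-square statistic, degrees of freedom, p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * k for w, k in zip(weights, kappas)) / sum(weights)
    chi2 = sum(w * (k - pooled) ** 2 for w, k in zip(weights, kappas))
    df = len(kappas) - 1
    return pooled, chi2, df, chi2_sf(chi2, df)


# Placeholder inputs for three hospitals (NOT real results): coefficients and their SEs.
kappas = [0.70, 0.55, 0.62]
ses = [0.10, 0.12, 0.11]
pooled, chi2, df, p = kappa_homogeneity(kappas, ses)
print(f"pooled = {pooled:.3f}, chi2({df}) = {chi2:.3f}, p = {p:.3f}")
```

The pooled value here is an inverse-variance weighted average of the hospital coefficients, which is not the same thing as a single coefficient computed from all ratings at once.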


Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2458
#7

30 Jan 2023, 09:52

No, I don't think that approach will suffice for sample size. The design aspects that matter for sample size are the number of raters, the number of subjects, and the average number of ratings per subject.

In practice, it's not unusual to examine raters at different locations (schools, hospitals, whatever). Heterogeneity due to location is usually a secondary concern, if at all.
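To get a feel for how the number of subjects drives precision, a simulation can help. The Python sketch below is a hypothetical precision check, not a published sample-size formula. It assumes that in each hospital both raters share the same prevalence p and that their chance-corrected agreement equals kappa (the common-correlation model for two binary ratings), draws data repeatedly, computes the pooled Scott's pi (Fleiss' kappa with two ratings per subject), and uses 1.96 times the standard deviation of the estimates as an approximate 95% half-width. The prevalence, kappa, target half-width, and candidate sizes are placeholders; site-specific values can be passed to explore heterogeneity between hospitals.

```python
import random
import statistics


def draw_pair(p, kappa, rng):
    """One subject's two binary ratings; both raters have prevalence p and agreement kappa."""
    q = p * (1 - p)
    p11 = p * p + kappa * q       # both positive
    p_disagree = q * (1 - kappa)  # probability of (1, 0), and also of (0, 1)
    u = rng.random()
    if u < p11:
        return (1, 1)
    if u < p11 + p_disagree:
        return (1, 0)
    if u < p11 + 2 * p_disagree:
        return (0, 1)
    return (0, 0)


def scott_pi(pairs):
    """Scott's pi for 0/1 rating pairs; None if every rating falls in one category."""
    n = len(pairs)
    p_obs = sum(a == b for a, b in pairs) / n
    p_pos = sum(a + b for a, b in pairs) / (2 * n)
    p_exp = p_pos ** 2 + (1 - p_pos) ** 2
    if p_exp == 1.0:
        return None
    return (p_obs - p_exp) / (1 - p_exp)


def simulate(sites, reps=1000, seed=2023):
    """sites: list of (n_subjects, prevalence, kappa), one tuple per hospital.

    Returns the mean and standard deviation of the pooled estimate over replications.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        pairs = [draw_pair(p, k, rng) for n, p, k in sites for _ in range(n)]
        est = scott_pi(pairs)
        if est is not None:
            estimates.append(est)
    return statistics.mean(estimates), statistics.stdev(estimates)


def smallest_n_per_site(site_params, half_width, candidates, **kwargs):
    """First candidate n per hospital whose approximate 95% half-width (1.96 * SD) is small enough."""
    for n in candidates:
        _, sd = simulate([(n, p, k) for p, k in site_params], **kwargs)
        if 1.96 * sd <= half_width:
            return n
    return None


if __name__ == "__main__":
    # Placeholder assumptions: three hospitals, prevalence 0.3, kappa 0.6.
    params = [(0.3, 0.6)] * 3
    for n in (10, 15, 30):
        mean, sd = simulate([(n, p, k) for p, k in params])
        print(f"{n} per hospital: mean pi = {mean:.3f}, SD = {sd:.3f}")
    print("smallest n per hospital:",
          smallest_n_per_site(params, 0.15, range(10, 101, 5), reps=500))
```

In this simulation every subject contributes one pair of ratings, so splitting a fixed total across identical hospitals leaves the precision essentially unchanged; what drives it is the total number of subjects (and, in richer designs, the number of raters and ratings per subject).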