  • standard error for kappa of multiple observers

    I want to compare a kappa for two groups of observers (both groups have more than two observers). To perform a Z-test I need the standard errors, but it seems that Stata does not provide standard errors for a kappa based on more than two observers. Is there any way to obtain the standard error?

    Thank you!

  • #2
    See kappaetc (SSC) and the cited literature.

    Code:
    kappaetc rater1 rater2 rater3 ... , store(group1)
    kappaetc rater4 rater5 rater6 ... , store(group2)
    kappaetc group1 == group2 , ttest
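    If kappaetc is not yet installed, it can be obtained from SSC first (a minimal sketch; the rater variable names in the calls above are placeholders for your own variables):

    Code:
    * install the user-written command from SSC (once)
    ssc install kappaetc
    * the help file lists the options and the cited literature
    help kappaetc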
    Best
    Daniel



    • #3
      Thank you, it works like a charm. Which cited literature do you mean?

      Thanks,
      Hassanin



      • #4
        I was referring to the literature cited in the help file, in particular Gwet (2008, 2014, 2016). The standard errors are estimated in a way that differs from what is suggested in the earlier literature on inter-rater agreement coefficients, especially the kappa coefficient. I just wanted to make sure you are aware of this.

        Best
        Daniel


        Gwet, K. L. (2008). Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61, 29-48.

        Gwet, K. L. (2014). Handbook of Inter-Rater Reliability. Gaithersburg, MD: Advanced Analytics, LLC.

        Gwet, K. L. (2016). Testing the Difference of Correlated Agreement Coefficients for Statistical Significance. Educational and Psychological Measurement, 76, 609-637.



        • #5
          Thank you. Interesting reads.

          Btw, in one of my questions the observers can answer in more than two groups (nominal). The t-test is not suitable for nominal variables. Is there any way to compare the kappas using this tool if we have nominal variables?

          Thanks,
          Hassanin



          • #6
            Originally posted by Hassanin Alkaduhimi:
            Btw, in one of my questions the observers can answer in more than two groups (nominal). The t-test is not suitable for nominal variables. Is there any way to compare the kappas using this tool if we have nominal variables?
            I do not really understand the problem. If the rating categories are nominal (with no ordering), you would use the unweighted kappa coefficient. The subsequent t-test is based on the kappa coefficients, which are not on a nominal scale; the level of measurement of the underlying rating categories no longer matters. One could argue that the coefficients are restricted to the interval [-1; 1] and hence cannot follow a normal distribution, but this argument applies irrespective of the rating categories' level of measurement. Edit: Actually, the proposed t-test is based on the subject-level coefficients, which are not restricted to the interval [-1; 1].
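
            To illustrate (a minimal sketch; the rater variable names are hypothetical, and the unweighted coefficients are what kappaetc reports when no weighting option is specified):

            Code:
            * nominal rating categories: unweighted agreement coefficients (the default)
            kappaetc rater1 rater2 rater3
            * ordinal or interval-scaled categories could instead use one of the
            * weighting schemes described in the help file; for nominal ratings
            * no weights are applied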

            Perhaps you can explain what you mean by "question", "answer", "groups" and "variables"? Can you translate these terms into the standard terminology of the inter-rater agreement literature, such as subjects/objects/units of analysis, observers/raters/judges, rating categories, etc., or explain what you are trying to do in more detail, with example data?

            Best
            Daniel



            • #7
              Dear Daniel,

              I have 30 observers rating 29 subjects. The ratings are nominal; e.g., observers can assign a subject to one of categories 1-8. I wanted to check the differences in inter-rater agreement between groups of raters based on baseline characteristics (e.g. raters with more experience vs. less experience). I have tried to perform the t-test, but I get the error message "kappaetc results not based on the same number of rating categories cannot perform paired t test". This is because, for example, rater 1 did not rate any subject in category 3 while rater 3 did. How can I solve this?

              I have seen another thread recommending a manual Z-test, computed as difference / sqrt(SE1² + SE2²). I might use that as a substitute for the t-test.
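
              For instance (a minimal sketch of that manual Z-test; the coefficient and standard-error values below are hypothetical placeholders to be replaced by the numbers from the two kappaetc outputs):

              Code:
              * hypothetical values read off the two kappaetc outputs
              scalar k1  = 0.45   // kappa, experienced raters
              scalar se1 = 0.06   // its standard error
              scalar k2  = 0.38   // kappa, less experienced raters
              scalar se2 = 0.07   // its standard error
              scalar z = (k1 - k2) / sqrt(se1^2 + se2^2)
              display "z = " z "   two-sided p = " 2*normal(-abs(z))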

              Thanks,
              Hassanin



              • #8
                As long as all ratings are observed within the two groups of raters, there should not be a problem. The error message indicates that this is not the case. You can (and should) specify the categories() option to indicate the possible rating categories.*

                Depending on how your data are set up, you want something like

                Code:
                kappaetc varlist_or_qualifiers_indicating_experienced_raters , categories(1/8) store(experienced)
                kappaetc varlist_or_qualifiers_indicating_non_experienced_raters , categories(1/8) store(non_experienced)
                kappaetc experienced == non_experienced , ttest
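                For example, assuming (hypothetically) that the ratings are stored one variable per rater, that the rater variables appear in this order in the dataset, and that rater1-rater15 are the experienced raters:

                Code:
                kappaetc rater1-rater15  , categories(1/8) store(experienced)
                kappaetc rater16-rater30 , categories(1/8) store(non_experienced)
                kappaetc experienced == non_experienced , ttest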
                Concerning the proposed Z-test, Gwet (2016) seems to have valid arguments against this approach, but you need to decide what you want. However, as a general word of advice: do not switch to a different method/estimator/strategy just because you receive an error message for your original approach. In many cases the error messages are there for good reason and indicate conceptual problems. This might be more often the case for official Stata commands than for user-written routines. Either way, make sure you understand why the error occurs and whether it indicates a mere technical limitation (perhaps because someone has not yet gotten around to implementing something you want) or a conceptual problem that would be masked by a different approach.


                * For the (unweighted) kappa coefficient the number of possible rating categories does not really matter, since its calculation does not depend on it. However, some of the other coefficients that kappaetc estimates do depend on the number of possible rating categories; hence the error message.

                Hope this helps. Best
                Daniel

                Gwet, K. L. (2016). Testing the Difference of Correlated Agreement Coefficients for Statistical Significance. Educational and Psychological Measurement, 76, 609-637.



                • #9
                  The categories() option works perfectly.

                  Thank you!
                  Hassanin

