  • standard error for kappa of multiple observers

    I want to compare a kappa for two groups of observers (both groups have more than two observers). To perform a Z-test I need the standard errors, but it seems that Stata does not provide standard errors for a kappa based on more than two observers. Is there any way to obtain the standard error?

    Thank you!

  • #2
    See kappaetc (SSC) and the cited literature.

    Code:
    kappaetc rater1 rater2 rater3 ... , store(group1)
    kappaetc rater4 rater5 rater6 ... , store(group2)
    kappaetc group1 == group2 , ttest
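    If kappaetc is not yet installed, it can be obtained from SSC first (a minimal sketch; the rater variable names in the calls above are placeholders for your own variables):

    Code:
    * install the user-written command from SSC (once)
    ssc install kappaetc
    * the help file lists the options and the cited literature
    help kappaetc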
    Best
    Daniel



    • #3
      Thank you, it works like a charm. Which cited literature do you mean?

      Thanks,
      Hassanin



      • #4
        I was referring to the literature cited in the help file, in particular Gwet (2008, 2014, 2016). The standard errors are estimated in a way that differs from what is suggested in the earlier literature on inter-rater agreement coefficients, especially the kappa coefficient. I just wanted to make sure you are aware of this.

        Best
        Daniel


        Gwet, K. L. (2008). Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61, 29-48.

        Gwet, K. L. (2014). Handbook of Inter-Rater Reliability. Gaithersburg, MD: Advanced Analytics, LLC.

        Gwet, K. L. (2016). Testing the Difference of Correlated Agreement Coefficients for Statistical Significance. Educational and Psychological Measurement, 76, 609-637.



        • #5
          Thank you. Interesting reads.

          Btw, in one of my questions the observers can answer in more than two groups (nominal). The t-test is not suitable for nominal variables. Is there any way to compare the kappas using this tool if we have nominal variables?

          Thanks,
          Hassanin



          • #6
            Originally posted by Hassanin Alkaduhimi:
            Btw, in one of my questions the observers can answer in more than two groups (nominal). The t-test is not suitable for nominal variables. Is there any way to compare the kappas using this tool if we have nominal variables?
            I do not really understand the problem. If the rating categories are nominal (with no ordering), you would use the unweighted kappa coefficient. The subsequent t-test is based on the kappa coefficients, which are not on a nominal scale; the level of measurement of the underlying rating categories no longer matters. One could argue that the coefficients are restricted to the interval [-1; 1] and hence cannot follow a normal distribution, but this argument applies irrespective of the rating categories' level of measurement. Edit: Actually, the proposed t-test is based on the subject-level coefficients, which are not restricted to the interval [-1; 1].
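
            To illustrate (a minimal sketch; the rater variable names are hypothetical, and the unweighted coefficients are what kappaetc reports when no weighting option is specified):

            Code:
            * nominal rating categories: unweighted agreement coefficients (the default)
            kappaetc rater1 rater2 rater3
            * ordinal or interval-scaled categories could instead use one of the
            * weighting schemes described in the help file; for nominal ratings
            * no weights are applied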

            Perhaps you can explain what you mean by "question", "answer", "groups" and "variables"? Can you translate these terms into the standard terminology of the inter-rater agreement literature, such as subjects/objects/units of analysis, observers/raters/judges, rating categories, etc., or explain what you are trying to do in more detail, with example data?

            Best
            Daniel



            • #7
              Dear Daniel,

              I have 30 observers rating 29 subjects. The ratings are nominal; e.g., observers can assign a subject to one of categories 1-8. I wanted to check the differences in inter-rater agreement between groups of raters based on baseline characteristics (e.g. raters with more experience vs. less experience). I have tried to perform the t-test, but I get the error message "kappaetc results not based on the same number of rating categories cannot perform paired t test". This is because, for example, rater 1 did not rate any subject in category 3 while rater 3 did. How can I solve this?

              I have seen another thread recommending a manual Z-test, computed as difference / sqrt(SE1² + SE2²). I might use that as a substitute for the t-test.
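
              For instance (a minimal sketch of that manual Z-test; the coefficient and standard-error values below are hypothetical placeholders to be replaced by the numbers from the two kappaetc outputs):

              Code:
              * hypothetical values read off the two kappaetc outputs
              scalar k1  = 0.45   // kappa, experienced raters
              scalar se1 = 0.06   // its standard error
              scalar k2  = 0.38   // kappa, less experienced raters
              scalar se2 = 0.07   // its standard error
              scalar z = (k1 - k2) / sqrt(se1^2 + se2^2)
              display "z = " z "   two-sided p = " 2*normal(-abs(z))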

              Thanks,
              Hassanin



              • #8
                As long as all ratings are observed within the two groups of raters, there should not be a problem. The error message indicates that this is not the case. You can (and should) specify the categories() option to indicate the possible rating categories.*

                Depending on how your data are set up, you want something like

                Code:
                kappaetc varlist_or_qualifiers_indicating_experienced_raters , categories(1/8) store(experienced)
                kappaetc varlist_or_qualifiers_indicating_non_experienced_raters , categories(1/8) store(non_experienced)
                kappaetc experienced == non_experienced , ttest
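                For example, assuming (hypothetically) that the ratings are stored one variable per rater, that the rater variables appear in this order in the dataset, and that rater1-rater15 are the experienced raters:

                Code:
                kappaetc rater1-rater15  , categories(1/8) store(experienced)
                kappaetc rater16-rater30 , categories(1/8) store(non_experienced)
                kappaetc experienced == non_experienced , ttest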
                Concerning the proposed Z-test, Gwet (2016) seems to have valid arguments against this approach, but you need to decide what you want. However, as a general word of advice: do not switch to a different method/estimator/strategy just because you receive an error message for your original approach. In many cases the error messages are there for good reason and indicate conceptual problems. This might be more often the case for official Stata commands than for user-written routines. Either way, make sure you understand why the error occurs and whether it indicates a mere technical limitation (perhaps because someone has not yet gotten around to implementing something you want) or a conceptual problem that would be masked by a different approach.


                * For the (unweighted) kappa coefficient the number of possible rating categories does not really matter, since its calculation does not depend on it. However, some of the other coefficients that kappaetc estimates do depend on the number of possible rating categories; hence the error message.

                Hope this helps. Best
                Daniel

                Gwet, K. L. (2016). Testing the Difference of Correlated Agreement Coefficients for Statistical Significance. Educational and Psychological Measurement, 76, 609-637.



                • #9
                  The categories() option works perfectly.

                  Thank you!
                  Hassanin

