
  • inter-rater reliability and icc

    Hi all,

    I am trying to calculate inter-rater reliability with a complicated study and data structure. Below is a (fake) example that illustrates the structure for 2 targets:

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte target str1 type byte rater str3 ph byte outcome
    1 "A"  3 "MP1" 90
    1 "B"  3 "MP1" 78
    1 "A"  3 "MP2" 46
    1 "B"  3 "MP2" 30
    1 "A"  5 "MP1" 20
    1 "B"  5 "MP1" 45
    1 "A"  5 "MP2" 23
    1 "B"  5 "MP2" 12
    1 "A"  7 "MP1" 20
    1 "B"  7 "MP1" 45
    1 "A"  7 "MP2" 23
    1 "B"  7 "MP2" 12
    1 "A"  9 "MP1" 20
    1 "B"  9 "MP1" 45
    1 "A"  9 "MP2" 23
    1 "B"  9 "MP2" 12
    2 "A"  9 "MP1" 98
    2 "B"  9 "MP1" 99
    2 "A"  9 "MP2" 34
    2 "B"  9 "MP2" 23
    2 "A" 10 "MP1" 67
    2 "B" 10 "MP1" 79
    2 "A" 10 "MP2" 90
    2 "B" 10 "MP2" 45
    2 "A" 11 "MP1" 24
    2 "B" 11 "MP1" 34
    2 "A" 11 "MP2" 23
    2 "B" 11 "MP2" 34
    2 "A" 12 "MP1" 52
    2 "B" 12 "MP1" 14
    2 "A" 12 "MP2" 12
    2 "B" 12 "MP2" 12
    end

    Characteristics of study:
    • Every target is rated by 4 raters; note that the same set of raters does not rate every target.
    • Each rater rates ALL the data for 2 targets.
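
    A quick way to confirm this crossed structure in the example data is to cross-tabulate raters against targets; each non-empty cell marks a rater-target pairing:

    ```stata
    * cross-tabulation on the -dataex- example above:
    * which raters rated which targets?
    table rater target
    ```

    With the example data, raters 3, 5, and 7 appear only under target 1, rater 9 appears under both targets, and raters 10-12 appear only under target 2.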

    My analysis model is below. Specifically, there are 3 random intercepts.

    mixed outcome indeps || _all: R.rater || _all: R.target || _all: R.ph
    1. Looking at the manual for the command icc, the options include:
      1. one-way random-effects model: In the one-way random-effects model, each target is rated by a different set of k independent raters, who are randomly drawn from the population of raters. The target is the only random effect in this model; the effects due to raters and possibly due to rater-and-target interaction cannot be separated from random error.
      2. two-way random-effects model: In the two-way random-effects model, each target is rated by the same set of k independent raters, who are randomly drawn from the population of raters. The random effects in this model are target and rater and possibly their interaction, although in the absence of repeated measurements for each rater on each target, the effect of an interaction cannot be separated from random error.
    In the two-way random-effects model, each target is rated by the same set of raters, which does not hold in my case, so I can't use that. In the one-way random-effects model, each target is rated by a different set of raters; in my study, however, each rater rates ALL the data for 2 targets, so each rater is linked to 2 targets. So that doesn't seem quite true either. My question: Is it OK to use the one-way random-effects model here? Or does the design of this study make the calculation of inter-rater reliability impossible or inadvisable?

    2. When calculating inter-rater reliability for a study with multiple outcome variables, does one typically calculate an inter-rater reliability score for each outcome measure, or choose just one measure?

    Thank you!

  • #2
    I think this structure is not suitable for the -icc- command.

    Rather, I think after running your -mixed- model you can calculate an intraclass correlation directly as the variance component at the target level divided by the total of all variance components. This would not, strictly speaking, be an inter-rater reliability, because you also have another variance component at the level of variable ph. But it is properly in the spirit of reliability: it is the proportion of variance due to the target itself, and not to extraneous factors.
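
    If it helps, here is a minimal sketch of that calculation in Stata (assuming the three crossed random intercepts are rater, target, and ph; -mixed- stores each variance component as a log standard deviation, so we exponentiate to recover the variance):

    ```stata
    * hedged sketch: equation names lns1_1_1, lns2_1_1, lns3_1_1, and lnsig_e
    * follow -mixed-'s default naming; adjust to match your actual model
    mixed outcome indeps || _all: R.rater || _all: R.target || _all: R.ph
    scalar v_rater  = exp(2*[lns1_1_1]_b[_cons])   // rater variance component
    scalar v_target = exp(2*[lns2_1_1]_b[_cons])   // target variance component
    scalar v_ph     = exp(2*[lns3_1_1]_b[_cons])   // ph variance component
    scalar v_resid  = exp(2*[lnsig_e]_b[_cons])    // residual variance
    display "ICC(target) = " v_target/(v_rater + v_target + v_ph + v_resid)
    ```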

    If you have several measures, each would warrant its own intraclass correlation assessment.


    • #3
      Thank you! This is very useful. Using this formula: var(R.target) / [var(R.rater) + var(R.target) + var(R.ph) + var(Residual)] and the output below, I get a value of 0.09 for one measure. On another measure this value comes out to 0.13.

      My question: assuming I have used the right formula, does this indicate poor reliability since the values are so low?

        Random-effects Parameters    |   Estimate   Std. Err.     [95% Conf. Interval]
      -------------------------------+------------------------------------------------
      _all: Identity                 |
                        var(R.rater) |   79.88473   15.98827      53.96383    118.2564
      _all: Identity                 |
                       var(R.target) |   59.60948     17.466      33.56659    105.8579
      _all: Identity                 |
                           var(R.ph) |   42.60954   8.641743      28.63338    63.40757
                       var(Residual) |   505.0235   6.432765      492.5716    517.7902
      -------------------------------+------------------------------------------------
      LR test vs. linear model: chi2(3) = 3110.89               Prob > chi2 = 0.0000
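
      For reference, plugging the estimates above into that formula reproduces the 0.09 (assuming the second component, 59.60948, is the target-level variance):

      ```stata
      display 59.60948/(79.88473 + 59.60948 + 42.60954 + 505.0235)
      * roughly .087, i.e. 0.09 after rounding
      ```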


      • #4
        Is -iccvar- of any use? From its package description:

        'ICCVAR': module to calculate intraclass correlation (ICC) after xtmixed.
        iccvar is a post-estimation command for xtmixed. After fitting a 2, 3,
        or 4 level model with a random intercept (random slopes are not
        supported), iccvar will calculate the intraclass correlation (ICC).


        • #5
          Re #3: Yes, you calculated correctly, and, yes, it indicates poor inter-rater reliability. Most of the variation in rating is attributable to noise (residual variance), and a bit to the rater and a bit to ph. Very little is attributable to the target itself.

          Re #4: Hard to say; I'm not really familiar with this command. But reviewing the help file, I think it will not do the job here. The original question involves a model with crossed random effects. The help file for iccvar says nothing about the nesting/crossing/multiple-membership structure of the effects, and it has no special options to let the user specify this. So I'm guessing that it is designed only for use with nested models. (Although perhaps it is "smart" enough to read the syntax of the -mixed- command it follows and figure this out without user guidance; I don't know.)


          • #6
            Many thanks Clyde - this is very helpful!