
  • Calculating kappa coefficient

    I am trying to calculate a Kappa coefficient.

    Background:
    There are 438 patients.
    I am comparing ratio1 and ratio2.
    I am trying to determine whether ratio1 identifies tall patients as well as ratio2 does.

    I have separated the variables into ratio1_tall and ratio2_tall.

    I have then calculated a kappa coefficient. In raw counts, I find that ratio1 and ratio2 both identify the same 3 tall patients.

    For the short patients, ratio1 and ratio2 both identify the same 5 short patients.

    With these numbers in mind, the ratios agree on 1.84% of patients (8 out of 438 in total).

    However, when the kappa value is calculated using the following code:
    Code:
    clear
    input float(match match2 ratio1_tall ratio1_short ratio2_tall ratio2_short)
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 1
    0 0 0 0 0 1
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    0 0 0 0 0 0
    end
    
    tab ratio1_short ratio2_short
    kap ratio1_short ratio2_short
    
    tab ratio1_tall ratio2_tall
    kap ratio1_tall ratio2_tall
    I am surprised to find that the kappa coefficient for tall is 0.08.

    This kappa indicates poor agreement, yet the percentage agreement is between 70% and 80% for both tall and short. The results seem contradictory.
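
    As a reference point, here is a minimal sketch (assuming the full 438-patient dataset with ratio1_tall, ratio1_short, ratio2_tall and ratio2_short is in memory) for computing the raw percentage agreement of each pair directly:
    Code:
    count if ratio1_tall == ratio2_tall
    display "tall agreement (%)  = " 100*r(N)/_N
    count if ratio1_short == ratio2_short
    display "short agreement (%) = " 100*r(N)/_N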

    My questions:
    Can anyone explain why this is so? I would have expected a poor kappa to go together with poor percentage agreement.
    Should I be grouping tall and short for ratio1 into one variable (say, ratio1_tall_short), do the same for ratio2 (ratio2_tall_short), and then calculate a kappa between those two variables?

    [Attached screenshot: Screenshot 2023-11-04 at 22.35.16.png]
    Last edited by Martin Imelda Borg; 04 Nov 2023, 16:37.

  • #2
    The key in this situation is that the expected agreement is very high for both tall and short. That, in turn, is because the data are very lopsided: for each of these indicator variables, the large majority of observations fall in a single category. The expected agreement statistic is calculated by imagining two rating systems, each of which produces a completely random rating, but with the same probability of a positive rating as observed in the data.

    Imagine a rating system, rating1, that reported 1 for 1.2% (= 5/434, rounded to one decimal place) of observations and 0 otherwise--chosen strictly at random, having nothing to do with the thing being rated. And imagine another system, rating2, that reported 1 for 21.2% (= 92/434, rounded) of observations, again strictly at random. Then they would both report 1 with probability 0.012*0.212 = 0.0025, and they would both report 0 with probability (1-.012)*(1-.212) = 0.7785. So they would agree 0.7785 + 0.0025 = .781 = 78.1% of the time, even though they are not actually evaluating anything and are just producing random ratings. You will notice that for the ratio*_short kappa calculations, Stata has reported expected agreement = 78.14%. The only reason this differs from my calculation is that Stata didn't round along the way, whereas I did.

    Now, the real ratio1_short and ratio2_short actually agree 79.95% of the time. While 79.95% agreement sounds impressive, when you contrast it with the 78.1% agreement achievable by mere random number generators, you realize that 79.95% agreement, with this kind of distribution, is actually ridiculously puny. The kappa statistic is defined to be the difference between observed agreement and expected agreement, divided by 1 - expected agreement. If that sounds odd, think of it as the proportion of the maximum possible agreement beyond chance that is actually attained.
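
    To check the arithmetic without intermediate rounding, here is a minimal sketch using the figures quoted above for the *_short pair (the marginal frequencies 5/434 and 92/434 and the observed agreement of 79.95% are taken from this post, not re-derived from the data):
    Code:
    scalar p1 = 5/434                  // marginal frequency of 1 for ratio1_short
    scalar p2 = 92/434                 // marginal frequency of 1 for ratio2_short
    scalar pe = p1*p2 + (1-p1)*(1-p2)  // expected agreement under independent random ratings
    scalar po = 0.7995                 // observed agreement for the *_short pair
    display "expected agreement = " pe               // about 0.7814
    display "kappa = " (po - pe)/(1 - pe)            // about 0.083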



    • #3
      Thanks for this…

      Two questions:
      1. I find this logic difficult to understand:

      rating1, that reported 1 for 1.2% (= 5/434, rounded to one decimal place) of observations and 0 otherwise--chosen strictly at random, having nothing to do with the thing being rated.

      How can you say this was ‘at random’ when this is actually what happened?

      2. With regard to my last question in post #1: should I generate variables ratio1 and ratio2 as in the code below, coding in each which patients were found to be tall (1), short (2), or normal (0), and then calculate a kappa between ratio1 and ratio2?

      Code:
      * combine the tall/short indicators into one 3-category rating per ratio
      gen ratio1 = 0
      gen ratio2 = 0
      
      replace ratio1 = 1 if ratio1_tall == 1
      replace ratio1 = 2 if ratio1_short == 1
      label define r1 0 "normal" 1 "tall" 2 "short"
      label values ratio1 r1
      
      replace ratio2 = 1 if ratio2_tall == 1
      replace ratio2 = 2 if ratio2_short == 1
      label define r2 0 "normal" 1 "tall" 2 "short"
      label values ratio2 r2
      
      tab ratio1 ratio2
      kap ratio1 ratio2
      I get the following results (see screenshot). With regard to the interpretation, please correct me if I'm wrong:

      There is agreement in 68% of cases, with a kappa coefficient of 0.01, which indicates poor inter-rater reliability.
      P > 0.05, therefore there is a difference between both ratios.
      The two ratios would agree in 67% of cases at random, i.e. if the scientists randomly gave values to the ratios.

      [Attached screenshot: 2E032838-9717-43FF-B15C-2996E50A5CED.png]



      • #4
        How can you say this was ‘at random’ when this is actually what happened?

        I didn't say that your results were generated at random. I said to imagine a random device that produced the same overall frequency of ones and zeroes as was observed in your data, but chose them at random. For example, -gen rating1 = (runiform() <= 5/434)- would be such a device. Given such devices rating1 and rating2 (another random device whose frequency of 1's and 0's matched the overall frequency of 1's and 0's in your ratio2_short), the expected agreement between them would be 78.14%.
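
        To see this concretely, here is a minimal simulation sketch (the seed and the number of observations are illustrative, not from the thread): two purely random raters with these marginal frequencies agree close to 78% of the time, and -kap- reports a kappa near zero for them.
        Code:
        * illustrative simulation: random raters with the same marginal frequencies
        clear
        set obs 100000
        set seed 12345
        gen byte rating1 = (runiform() <= 5/434)    // reports 1 about 1.2% of the time
        gen byte rating2 = (runiform() <= 92/434)   // reports 1 about 21.2% of the time
        gen byte agree = (rating1 == rating2)
        summarize agree      // mean is close to 0.7814, the expected agreement
        kap rating1 rating2  // kappa is close to 0, as it should be for random ratings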

        There is agreement in 68% of cases, with a kappa coefficient of 0.01, which indicates poor inter-rater reliability.
        P > 0.05, therefore there is a difference between both ratios.
        The two ratios would agree in 67% of cases at random, i.e. if the scientists randomly gave values to the ratios.
        I would omit "there is a difference between both ratios." That could be misconstrued as saying that the two ratings have different overall rating probabilities--which is true in your instance, but has nothing to do with what kappa tests. What kappa tells you, and I would say it this way, is that the two ratings do not agree substantially more often than would happen simply by chance.

