kappaetc available from SSC

Christian dam

Join Date: Apr 2020

Posts: 2
#31

08 Apr 2020, 05:14

Dear Daniel,

Many thanks for the swift and helpful reply.

Sorry for the inconvenience.

Best regards,
/Christian
Jo Brigden

Join Date: Oct 2020

Posts: 4
#32

08 Oct 2020, 05:17

Hi Daniel,

I'm delighted to have found the kappaetc command and your accompanying literature, which have been really useful for a project I am working on.
Initially, I used the kap command for my data, and I am now comparing the results with the kappaetc output:

Code:

kap newph1_nausea_d newph1_nausea, wgt(w)
kappaetc newph1_nausea_d newph1_nausea, wgt(w)

I've found that the kappa statistic from kap differs very slightly from Cohen/Conger's kappa from kappaetc. I believe this is just because the latter includes missing values when calculating the marginal probabilities. Is this correct? Apologies if I have missed this in the literature.

Kind Regards,
Jo

Last edited by Jo Brigden; 08 Oct 2020, 05:20.
daniel klein

Join Date: Mar 2014

Posts: 3911
#33

08 Oct 2020, 10:46

Jo,

Given the lack of example data and/or output, it is hard to be sure. Missing values would cause the estimates to differ. You can check that by specifying kappaetc's listwise option, which causes observations with missing values to be excluded.
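A self-contained way to run that check is sketched below with the rate2 example data that appears later in this thread. The three missing ratings are introduced artificially for illustration; with real data, substitute newph1_nausea_d and newph1_nausea for rada and radb. Assuming missing values are the only source of the discrepancy, the listwise estimate of Cohen/Conger's kappa should then agree with the one from kap.

```stata
// Illustration only: rate2 example data with artificial missing values
webuse rate2, clear          // <- replaces the data in memory
replace rada = . in 1/3      // hypothetical: three units lack rater A's score

// How many units lack at least one rating?
count if missing(rada, radb)

// -kap- uses only observations with both ratings present
kap rada radb, wgt(w)

// -kappaetc- does not drop units with a missing rating by default
kappaetc rada radb, wgt(w)

// -listwise- excludes those units, i.e., the same sample -kap- uses
kappaetc rada radb, wgt(w) listwise
```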
Jo Brigden

Join Date: Oct 2020

Posts: 4
#34

09 Oct 2020, 07:06

Thank you
John Ressman

Join Date: Oct 2020

Posts: 5
#35

24 Oct 2020, 03:21

Hello,
I have conducted an intra- and interrater reliability study.
The outcome is fail/pass, coded 0/1. For interrater reliability, I have used the kappaetc command in Stata, which I appreciated a lot; a big thanks to Daniel Klein.

I have also used the same command for intrarater reliability, but now I am uncertain whether the command is appropriate for that purpose. I cannot find anything about this in the Stata help file.
Does anyone know?

John Ressman
daniel klein

Join Date: Mar 2014

Posts: 3911
#36

24 Oct 2020, 06:57

I suppose, technically, you could treat the repeated measurements as different raters; you could then estimate intra-rater reliability. Whether the theoretical assumptions that underlie the coefficients hold is another question. For example, Cohen's kappa assumes fixed raters and bases its estimate of chance agreement on the frequencies with which each rater uses the rating categories. With a 0/1 score, you might also want to look into intraclass correlations (ICC); kappaetc estimates those, too.
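A minimal sketch of that idea, using made-up toy data and hypothetical variable names score_t1 and score_t2 (one rater scoring the same subjects on two occasions). The two occasions are simply passed to kappaetc as if they were two raters; whether the coefficients' assumptions hold for repeated measurements remains the open question raised above. The ICC variants are not shown; see help kappaetc for their syntax.

```stata
// Made-up toy data: one rater scores 8 subjects as fail (0) or
// pass (1) on two occasions (hypothetical variables score_t1, score_t2)
clear                        // <- clears the data in memory
input score_t1 score_t2
1 1
1 1
0 0
1 0
0 0
1 1
0 1
1 1
end

// Cross-tabulate the two occasions
tabulate score_t1 score_t2

// Treat the occasions as if they were two raters
kappaetc score_t1 score_t2
```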
John Ressman

Join Date: Oct 2020

Posts: 5
#37

31 Oct 2020, 04:27

Thanks for your comment and your time,
John

daniel klein

Join Date: Mar 2014
Posts: 3911

#38

06 Dec 2020, 05:58

I sometimes get questions about kappaetc's probabilistic benchmarking method, both privately and here on Statalist (see this example). The questions concern counterintuitive (well, basically "wrong") cumulative interval-membership probabilities: despite high agreement coefficients and small standard errors, the probabilistic benchmarking results point towards the lowest category. Here is my explanation, copied from one of my email responses:

The problem that you report has [...] to do with the way we calculate the cumulative probabilities. Basically, what happens is this: the agreement coefficients are (mathematically) bounded between -1 and 1. Yet, we estimate the interval-membership probabilities (IMP) from a standard normal distribution [or t-distribution], which has no upper or lower bound. Hence, the cumulative probabilities over the interval [-1; 1] might not, in fact, add up to 1 (only the interval (-infinity; infinity) is guaranteed to add up to 1). The way that kappaetc implements the probabilistic benchmarking method forces the cumulative probabilities to sum to 1 at the lowest benchmark level.
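To put a number on the problem, here is a minimal Mata sketch with a made-up coefficient of 0.95 and a made-up standard error of 0.04 (not taken from any real output). It computes how much of the normal distribution lies above 1 and, consequently, how much lies inside [-1; 1].

```stata
mata:
// Made-up estimate and standard error of an agreement coefficient
b  = 0.95
se = 0.04

// Probability mass that the (unbounded) normal puts above 1
p_above = 1 - normal((1 - b)/se)

// Probability mass within the admissible range [-1; 1]
p_inside = normal((1 - b)/se) - normal((-1 - b)/se)

printf("P(coefficient > 1)        = %6.4f\n", p_above)
printf("P(-1 <= coefficient <= 1) = %6.4f\n", p_inside)
end
```

With these numbers, roughly 10.6 percent of the mass lies above 1, so the cumulative probability can never exceed about 89 percent before the lowest level, which is below the usual 95 percent threshold.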

Kilem Gwet is currently working on a revised version of his book (http://inter-rater-reliability.blogspot.com) [on which kappaetc is based] that includes what is, in my view, a better way to handle the problem. He proposes using a "truncated" normal distribution that rescales the probabilities so that each coefficient has its own upper bound.
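The truncation idea can be sketched in a few lines of Mata: compute the interval-membership probabilities from the normal, then divide them by the normal mass inside [-1; 1], so that the rescaled probabilities add up to 1 over the admissible range. This mirrors the rescaling that the workaround below applies to r(p_cum); it is not necessarily identical to Gwet's formulas or to kappaetc's code. The coefficient (0.95), standard error (0.04), and the Landis and Koch benchmark limits are assumptions for illustration.

```stata
mata:
// Made-up estimate and standard error
b  = 0.95
se = 0.04

// Landis-Koch benchmark limits (assumed); row 1 = highest level,
// row 6 = lowest level, like r1 ... r6 in kappaetc's output
hi = (1, 0.8, 0.6, 0.4, 0.2, 0)'
lo = (0.8, 0.6, 0.4, 0.2, 0, -1)'

// Untruncated interval-membership probabilities (IMP) from the normal
imp = normal((hi :- b) :/ se) - normal((lo :- b) :/ se)

// Normal mass within the admissible range [-1; 1]
mass = normal((1 - b)/se) - normal((-1 - b)/se)

// Truncated IMPs: rescaled so that they sum to 1 over [-1; 1]
imp_trunc = imp :/ mass

// Cumulative probabilities, summed from the highest level down
p_cum = runningsum(imp_trunc)

// Columns: upper limit, lower limit, IMP, truncated IMP, cumulative
(hi, lo, imp, imp_trunc, p_cum)
end
```

The last cumulative probability equals 1 by construction, and the highest interval now carries almost all of the mass (about 0.9999), which matches the intuition of a high coefficient with a small standard error.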

I have not yet found the time to implement this in kappaetc, and I do not expect to find it in the near future [but it is on my to-do list]. However, I have attached a workaround that rescales the cumulative probabilities from kappaetc's r() results. The do-file contains an example with comments.

Here is the code of the do-file that I am referring to:

Code:

// We need example data
*clear // <- uncommenting this line clears the data in memory
webuse rate2 // <- example data

// -- Run -kappaetc-
kappaetc rada radb , wgt(power 8)
    // Above, we use ridiculous weights to get high coefficients

// -- Now the benchmarks
kappaetc , benchmark largesample
/*
    Above, we specify -largesample- so -kappaetc- uses the standard normal.
    This is what K. Gwet suggests. I found it a bit inconsistent to
    use the t-distribution for confidence intervals (with default
    standard errors) but then switch to the standard normal for
    benchmarking. Therefore, -kappaetc- would normally use the
    t-distribution for benchmarking. Anyway, I have implemented the 
    workaround code below in terms of the standard normal, so we are 
    using it here.
*/ 
    
// -- Calculate rescaled benchmark intervals
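mata {
    // Coefficients and standard errors returned by -kappaetc-
    b     = st_matrix("r(b)")
    se    = st_matrix("r(se)")
    // Normal mass inside each coefficient's admissible range:
    // [0; 1] for percent agreement (column 1), [-1; 1] for the rest
    trunc = normal((b+(0, J(1, 5, 1))):/se)-normal((b-(1, J(1, 5, 1))):/se)
    st_matrix("trunc", trunc)
    st_matrixcolstripe("trunc", st_matrixcolstripe("r(b)"))
    // Rescale the cumulative IMPs by that mass
    st_matrix("p_cum_trunc", st_matrix("r(p_cum)"):/trunc)
    st_matrixcolstripe("p_cum_trunc", st_matrixcolstripe("r(b)"))
}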

    // Below are the original IMPs and cumulative IMPs
    // Note: r1 is the highest benchmark level, r6 the lowest
matlist r(imp)
matlist r(p_cum)
    /* 
        See where the problem comes from? All coefficients have a 90 percent 
        chance (or higher) to fall into the highest interval. The 
        probabilities of each falling below the highest interval are
        near 0. This is because there is (mathematically) a probability of 
        ~ 10 percent for the coefficients to exceed 1. 

        OK, let's fix this. 
        Below are the truncated cumulative probabilities for the interval 
        [-1; 1]. These probabilities represent the coefficient-specific 
        upper bounds. Because percent agreement cannot be below 0, we define 
        the respective interval as [0; 1]. To be honest, I do not know 
        whether benchmarking the percent agreement makes sense; I 
        doubt it but I include the interval for consistency.
    */
matlist trunc

    /* 
        Finally, we have the rescaled cumulative probabilities below. 
        Theoretically, these sum to 1 (and they do). The reason for the 
        probability associated with the lowest interval (r6) being off 
        is a technical flaw in the workaround: As you can see from the 
        original IMPs, -kappaetc- fixes the lower bounds at 1 before 
        summing from the top. Because the workaround does not recalculate 
        the sum, the lowest interval now has probability 1/tCMP, with 
        tCMP := truncated cumulative probability. Had we used the actual 
        sum, the lowest category would also be 1.
    */
matlist p_cum_trunc // <- rescaled cumulative probabilities
    /*
        You will have to select the interval yourself. Pick the first one, 
        starting from r1, that exceeds the threshold (usually 95%). 
        If you cannot remember them, you can see the (upper) interval 
        limits in r(benchmarks).
    */
matlist r(benchmarks)


daniel klein

Join Date: Mar 2014

Posts: 3911
#39

11 Aug 2022, 07:12

I have implemented in kappaetc the refined approach that bases probabilistic benchmarking on "truncated" distributions (see Gwet 2021, 234ff.).

The latest version of kappaetc is

Code:

. which kappaetc
c:\ado\plus\k\kappaetc.ado
*! version 2.1.0 11aug2022 daniel klein

and, thanks to Kit Baum, it is already available from the SSC.

With the latest version of kappaetc, the workaround in #38 is no longer necessary and will, in fact, produce wrong results. Do not use the code in #38 with the current kappaetc from SSC.
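For readers updating, a minimal sketch: reinstall kappaetc from SSC, confirm the version, and rerun the benchmark example from #38 without the workaround. That no further rescaling of r(p_cum) is needed follows from the announcement in this post; the resulting output is not reproduced here.

```stata
// Update kappaetc from SSC (requires an internet connection)
ssc install kappaetc, replace

// Confirm that version 2.1.0 (11aug2022) or later is installed
which kappaetc

// Benchmarking now uses truncated distributions directly;
// do not apply the #38 workaround to these results
webuse rate2, clear          // <- replaces the data in memory
kappaetc rada radb, wgt(power 8)
kappaetc, benchmark
```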

Gwet, K. L. 2021. Handbook of inter-rater reliability. The definitive guide to measuring the extent of agreement among raters -- Volume 1: Analysis of categorical ratings (5th ed.). AgreeStat Analytics.