  • #31
    Dear Daniel,

    Many thanks for the swift and helpful reply.

    Sorry for the inconvenience.

    Best regards,
    /Christian



    • #32
      Hi Daniel,

      I'm delighted to have found the kappaetc command and your accompanying literature, which has been really useful for a project I am working on.
      Initially I used the kap command for my data, and I am now comparing the results to the kappaetc output:

      Code:
      kap newph1_nausea_d newph1_nausea, wgt(w)
      kappaetc newph1_nausea_d newph1_nausea, wgt(w)
      I've found that the kappa statistic from kap differs very slightly from the Cohen/Conger's kappa reported by kappaetc. I believe this is just due to the inclusion of missing values in the latter when calculating the marginal probabilities; is this correct? Apologies if I have missed this in the literature.

      Kind Regards,
      Jo
      Last edited by Jo Brigden; 08 Oct 2020, 05:20.



      • #33
        Jo,

        Given the lack of example data and/or output, it is hard to be sure. Missing values would indeed cause the estimates to differ. You can check that by specifying kappaetc's listwise option, which causes observations with missing values to be excluded.

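        For example, a minimal sketch reusing the variable names from #32 (assuming the same weighting is wanted):

        Code:
        * exclude observations with a missing value in either rating variable
        kappaetc newph1_nausea_d newph1_nausea, wgt(w) listwise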


        • #34
          Thank you



          • #35
            Hello,
            I have conducted an intra- and inter-rater reliability study.
            The outcome is fail/pass, coded as 0/1. For inter-rater reliability I have used the command kappaetc in Stata, which I appreciated a lot; a big thanks to Daniel Klein.

            I have also used the same command for intra-rater reliability, but in the end I became uncertain whether I can use this command for intra-rater reliability. I don't find anything about this in Stata's help.
            Does anyone know?

            John Ressman



            • #36
              I suppose, technically, you could treat the repeated measurements as different raters; you could then estimate intra-rater reliability. Whether the theoretical assumptions that underlie the coefficients hold is another question. For example, Cohen's kappa assumes fixed raters and bases its estimate of chance agreement on the frequencies with which each rater uses the rating categories. With a 0/1 score, you might also want to look into intra-class correlations (ICC); kappaetc estimates those, too.
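
              A rough sketch of that idea, with hypothetical variable names (score_t1 and score_t2 are the same rater's scores from the two measurement occasions):

              Code:
              * treat the two measurement occasions of one rater as two "raters"
              kappaetc score_t1 score_t2
              * see -help kappaetc- for its options on intraclass correlations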



              • #37
                Thanks for your comment and time, John



                • #38
                  I sometimes get questions about kappaetc's probabilistic benchmarking method, both privately and here on Statalist (see this example). These questions concern counterintuitive (well, basically "wrong") cumulative interval-membership probabilities: despite high agreement coefficients and small standard errors, the probabilistic benchmarking results point towards the lowest category. Here is my explanation, copied from an email response of mine:


                  The problem that you report has [...] to do with the way we calculate the cumulative probabilities. Basically, what happens is this: the agreement coefficients are (mathematically) bounded between -1 and 1. Yet we estimate the interval-membership probabilities (IMP) from a standard normal distribution [or t-distribution], which has no upper or lower bound. Hence, the cumulative probabilities for the interval [-1; 1] might not, in fact, add up to 1 (it is the interval from -infinity to +infinity that adds up to 1). The way that kappaetc implements the probabilistic benchmarking method forces the cumulative probabilities to sum to 1 at the lowest benchmark level.
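
                  To see where the mass goes, here is an illustration with made-up numbers (not taken from any actual output): for a coefficient of 0.95 with a standard error of 0.05, a normal approximation places a noticeable share of the probability mass above the logical upper bound of 1.

                  Code:
                  // hypothetical values: coefficient 0.95, standard error 0.05
                  display 1 - normal((1 - 0.95)/0.05)   // mass above 1, roughly 0.16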

                  Kilem Gwet is currently working on a revised version of his book (http://inter-rater-reliability.blogspot.com) [on which kappaetc is based] that includes a (in my view) better way to handle the problem. He proposes using a "truncated" normal distribution that rescales the probabilities such that each coefficient has its own upper bound.

                  I have not yet found the time to implement that in kappaetc and I assume I will not find it in the near future [but it is on my to-do list]. However, I have attached a workaround that rescales the cumulative probabilities from the r() results of kappaetc. The do-file contains an example with comments.


                  Here is the code of the do-file that I am referring to:

                  Code:
                  // We need example data
                  *clear // <- uncommenting this line will clear the data in memory
                  webuse rate2 // <- example data
                  
                  // -- Run -kappaetc-
                  kappaetc rada radb , wgt(power 8)
                      // Above, we use ridiculous weights to get high coefficients
                  
                  // -- Now the benchmarks
                  kappaetc , benchmark largesample
                  /*
                      Above, we specify -largesample- so -kappaetc- uses the standard normal.
                      This is what K. Gwet suggests doing. I find it a bit inconsistent to
                      use the t-distribution for confidence intervals (with default
                      standard errors) but then switch to the standard normal for
                      benchmarking. Therefore, -kappaetc- would normally use the
                      t-distribution for benchmarking. Anyway, I have implemented the
                      workaround code below in terms of the standard normal, so we are
                      using it here.
                  */
                      
                  // -- Calculate rescaled benchmark intervals
                  mata {
                      b     = st_matrix("r(b)")
                      se    = st_matrix("r(se)")
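                      // probability that a normal with mean b and s.d. se falls inside the
                      // logical bounds: [0; 1] for percent agreement (first column) and
                      // [-1; 1] for the chance-corrected coefficients (remaining columns)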
                      trunc = normal((b+(0, J(1, 5, 1))):/se)-normal((b-(1, J(1, 5, 1))):/se)
                      st_matrix("trunc", trunc)
                      st_matrixcolstripe("trunc", st_matrixcolstripe("r(b)"))
                      st_matrix("p_cum_trunc", st_matrix("r(p_cum)"):/trunc)
                      st_matrixcolstripe("p_cum_trunc", st_matrixcolstripe("r(b)"))
                  }
                  
                      // Below are the original IMPs and cumulative IMPs
                      // Note: r1 is the highest benchmark level, r6 the lowest
                  matlist r(imp)
                  matlist r(p_cum)
                      /*
                          See where the problem comes from? All coefficients have a 90 percent
                          chance (or higher) of falling into the highest interval. The
                          probabilities of falling below the highest interval are near 0.
                          This is because there is (mathematically) a probability of
                          ~ 10 percent for the coefficients to exceed 1.

                          OK, let's fix this.
                          Below are the truncated cumulative probabilities for the interval
                          [-1; 1]. These probabilities represent the coefficient-specific
                          upper bounds. Because percent agreement cannot be below 0, we define
                          the respective interval as [0; 1]. To be honest, I do not know
                          whether benchmarking the percent agreement makes sense; I doubt it,
                          but I include the interval for consistency.
                      */
                  matlist trunc
                  
                      /* 
                          Finally, we have the rescaled cumulative probabilities below. 
                          Theoretically, these sum to 1 (and they do). The reason for the 
                          probability associated with the lowest interval (r6) being off 
                          is a technical flaw in the workaround: As you can see from the 
                          original IMPs, -kappaetc- fixes the lower bounds at 1 before 
                          summing from the top. Because the workaround does not recalculate 
                          the sum, the lowest interval now has probability 1/tCMP, with 
                          tCMP := truncated cumulative probability. Had we used the actual 
                          sum, the lowest category would also be 1.
                      */
                  matlist p_cum_trunc // <- rescaled cumulative probabilities
                      /*
                          You will have to select the interval yourself. Pick the first one, 
                          starting from r1, that exceeds the threshold (usually 95%). 
                          If you cannot remember them, you can see the (upper) interval 
                          limits in r(benchmarks).
                      */
                  matlist r(benchmarks)



                  • #39
                    I have implemented the refined approach that bases probabilistic benchmarking on "truncated" distributions (see Gwet 2021, 234ff.) in kappaetc.

                    The latest version of kappaetc is

                    Code:
                    . which kappaetc
                    c:\ado\plus\k\kappaetc.ado
                    *! version 2.1.0 11aug2022 daniel klein
                    and, thanks to Kit Baum, it is already available from the SSC.

                    With the latest version of kappaetc, the workaround in #38 is no longer necessary and will, in fact, produce wrong results. Do not use the code in #38 with kappaetc from the SSC.
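
                    If you have installed kappaetc from the SSC before, one way to get the current version is to reinstall it, for example:

                    Code:
                    . ssc install kappaetc, replace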


                    Gwet, K. L. 2021. Handbook of inter-rater reliability. The definitive guide to measuring the extent of agreement among raters -- Volume 1: Analysis of categorical ratings (5th ed.). AgreeStat Analytics.

