  • Percent Correct / Percent Agreement

    Hello,

    I have a data set from which I am calculating inter-rater reliability. I have run kappaetc and have what I need from that output. However, I would like to create two new variables called percentcorrect, which represents the percent of raters that scored each item correctly, and percentagree, which represents the percent agreement per item. The item scores are binary and the correct answers are binary. For example, if the item should have been rated yes, then the answer code is 1. No is 0. I have already generated the percentcorrect and percentagree variables which are currently equal to 0.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str7 item byte(raterw raterg raterd ratert raterbe raterbr raterr correctscore) float(percentcorrect percentagree)
    "R.8.01" 0 1 1 1 1 1 1 1 0 0
    "R.8.02" 0 1 1 0 0 0 1 0 0 0
    "R.8.03" 1 1 1 1 1 1 1 1 0 0
    "R.9.03" 1 1 1 1 1 1 1 1 0 0
    "R.2.01" 1 1 1 1 1 1 1 1 0 0
    end
    My question:

    Is there a way to calculate the percentcorrect and percentagree variables using my current data input? I feel there should be a simple way, and I am hoping to learn something new and possibly generalizable to future analyses.

    OR

    Am I right in my thinking that I should create new binary variables for each rater that indicate whether or not they scored the item accurately to use to create the percentcorrect variable? If so, do you have a suggestion for the percentagree variable at the item level?

    Thanks so much! Sometimes I look at something too long and make it FAR more complicated than it should be.

  • #2
    I'm not sure what you mean by percent agreement. With just two raters it clearly means the percent of items on which both raters gave the same response, but it is less clear what it means with a larger number of raters. In the code below, I presume that you mean: consider all possible (unordered, non-identical)* pairs of raters, and calculate the percent of those pairs in which the two paired raters gave the same response.

    Note that the calculations are best done with the data in long layout, not wide. In the end, I have restored the results to wide layout. But you should think ahead about what you will be doing next with this data. Most Stata data management and analysis commands work best, or only, with long data. So unless you know that you will be doing things that Stata does better with wide data, you would be best advised to omit that final -reshape wide- command and keep the data in long layout.

    Code:
    rename rater* response*
    reshape long response, i(item) j(rater) string
    
    preserve
    keep item rater response
    rename (rater response) =_U
    tempfile holding
    save `holding'
    restore
    
    by item (rater), sort: egen percent_correct = mean(response == correctscore)
    replace percent_correct = 100*percent_correct
    preserve
    
    joinby item using `holding'
    keep if rater > rater_U
    by item (rater rater_U), sort: egen percent_agreement = mean(response == response_U)
    replace percent_agreement = 100*percent_agreement
    by item: keep if _n == 1
    keep item percent_agreement
    save `holding', replace
    
    restore
    merge m:1 item using `holding', assert(match) nogenerate
    reshape wide
    rename response* rater*
    *By unordered, non-identical pairs, I mean that we do not count a "pair" where both members of the pair are the same rater, and we do not consider the pair X, Y to differ from the pair Y, X.
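    For readers who want a quick sanity check on this pairwise definition outside Stata, here is a minimal Python sketch (not part of the thread's Stata code) applied to the example data. With k of n binary ratings equal to 1, the number of agreeing pairs is C(k,2) + C(n-k,2) out of C(n,2) possible pairs.

    ```python
    from math import comb

    # Example data from the thread: 7 binary ratings per item plus the correct score.
    items = {
        "R.8.01": ([0, 1, 1, 1, 1, 1, 1], 1),
        "R.8.02": ([0, 1, 1, 0, 0, 0, 1], 0),
        "R.8.03": ([1, 1, 1, 1, 1, 1, 1], 1),
        "R.9.03": ([1, 1, 1, 1, 1, 1, 1], 1),
        "R.2.01": ([1, 1, 1, 1, 1, 1, 1], 1),
    }

    def percent_correct(ratings, correct):
        # Percent of raters whose rating matches the correct score.
        return 100 * sum(r == correct for r in ratings) / len(ratings)

    def percent_agreement(ratings):
        # Percent of unordered, non-identical rater pairs giving the same rating.
        n, k = len(ratings), sum(ratings)
        return 100 * (comb(k, 2) + comb(n - k, 2)) / comb(n, 2)

    for name, (ratings, correct) in items.items():
        print(name, round(percent_correct(ratings, correct), 2),
              round(percent_agreement(ratings), 2))
    ```

    Item R.8.01, for example, has 6 of 7 correct ratings (85.71 percent correct) and (C(6,2)+C(1,2))/C(7,2) = 15/21, roughly 71.43 percent agreement, which is the same figure the -joinby- approach produces.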

    • #3
      Originally posted by Allison Cusano:
      Is there a way to calculate the percentcorrect and percentagree variables using my current data input. I feel like there should be a simple way ...
      So, the percentage of correct ratings is indeed simple (assuming no missing ratings):
      Code:
      egen number_of_yes = rowtotal(raterw raterg raterd ratert raterbe raterbr raterr)
      * dividing a count out of 7 by .07 is the same as multiplying by 100/7
      generate percent_correct = cond(correctscore==1, number_of_yes, 7-number_of_yes) / .07
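      As a quick cross-check of the `/ .07` shorthand (a sketch in Python, not Stata): dividing a count out of 7 by .07 should give exactly the same result as the more transparent `100 * count / 7`.

      ```python
      # For each (number_of_yes, correctscore) case, compare the /.07 shorthand
      # with the explicit percent formula for 7 raters.
      for yes, correct in [(6, 1), (3, 0), (7, 1)]:
          n_correct = yes if correct == 1 else 7 - yes
          shorthand = n_correct / 0.07       # the trick used in the Stata line
          explicit = 100 * n_correct / 7     # the same quantity, written out
          assert abs(shorthand - explicit) < 1e-9
          print(round(shorthand, 2))
      ```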
      As Clyde Schechter points out, for more than two raters, there are different ways to define percent agreement. You mention kappaetc (preferably from SSC but fine from SJ). You can use the store() option to put the subject-level percent agreement into r(). You can then shift these results into a variable:
      Code:
      kappaetc raterw raterg raterd ratert raterbe raterbr raterr , store(my_r_results)
      matrix percent_agreement = r(b_istar)[1..5,1]
      svmat percent_agreement
      Because you know the "correct" rating category of the subjects, you might want what Gwet (2014, 324f.) calls "validity" coefficients rather than reliability coefficients. This is partly implemented, albeit not documented, in kappaetc via the acm() option. ACM is short for "absolute category membership" (Gwet 2014, 312). The basic idea is that agreement only counts as such if it is the correct absolute category membership. With your example data, I get
      Code:
      . kappaetc raterw raterg raterd ratert raterbe raterbr raterr , acm(correctscore)
      
      Interrater agreement                             Number of subjects =       5
      ( ACM analysis)                                 Ratings per subject =       7
                                              Number of rating categories =       2
      ------------------------------------------------------------------------------
                           |   Coef.  Std. Err.    t    P>|t|   [95% Conf. Interval]
      ---------------------+--------------------------------------------------------
         Percent Agreement |  0.8000    0.1400   5.72   0.005     0.4114     1.0000
      Brennan and Prediger |  0.7333    0.1866   3.93   0.017     0.2152     1.0000
                 Gwet's AC |  0.7721    0.1798   4.29   0.013     0.2728     1.0000
      ------------------------------------------------------------------------------
      Confidence intervals are clipped at the upper limit.
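      The 0.8000 in the first row can be reproduced by hand under one reading of Gwet's description (my interpretation, not documented kappaetc behavior): under ACM, a pair of raters counts as agreeing only when both chose the correct category. With c of 7 raters correct on an item, that is C(c,2) of C(7,2) pairs, averaged over items. A Python sketch on the example data:

      ```python
      from math import comb

      # (ratings, correct score) for the five example items
      items = [
          ([0, 1, 1, 1, 1, 1, 1], 1),
          ([0, 1, 1, 0, 0, 0, 1], 0),
          ([1, 1, 1, 1, 1, 1, 1], 1),
          ([1, 1, 1, 1, 1, 1, 1], 1),
          ([1, 1, 1, 1, 1, 1, 1], 1),
      ]

      def acm_agreement(ratings, correct):
          # A pair "agrees" only if both raters picked the correct category.
          c = sum(r == correct for r in ratings)
          return comb(c, 2) / comb(len(ratings), 2)

      overall = sum(acm_agreement(r, c) for r, c in items) / len(items)
      print(round(overall, 4))
      ```

      The per-item values are 15/21, 6/21, 1, 1, and 1, which average to 0.8, matching the reported Percent Agreement.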

      A final remark: with binary ratings and 7 raters, the minimum possible observed agreement (regardless of correct or incorrect category) is 4 out of 7 raters choosing the same category, which is 57%. Keep this in mind when interpreting the percent agreement.
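      That 4-out-of-7 floor is a pigeonhole fact: with two categories, the larger category always holds at least ceil(7/2) = 4 raters. A one-line Python enumeration confirms it:

      ```python
      # With 7 binary ratings and k "yes" votes, the modal category holds
      # max(k, 7 - k) raters; over all possible k this never drops below 4.
      floor = min(max(k, 7 - k) for k in range(8))
      print(floor)  # 4, i.e. about 57% of the 7 raters
      ```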



      Gwet, K. L. 2014. Handbook of Inter-Rater Reliability. Gaithersburg, MD: Advanced Analytics, LLC.
      Last edited by daniel klein; 24 Nov 2024, 12:02.

      • #4
        Thank you Clyde and Daniel. Clyde, this is all I need to do with this data set. I reshaped it prior to analysis to run kappa. The code you shared makes sense and will definitely be helpful as additional, more complex iterations of this work come in. Daniel, thank you for bringing up the acm option; I will apply it to the full data set. You bring up a good point about binary ratings with 7 raters. This has been a discussion, as there are 49 total items across these same 7 raters and the team I am working with wants to achieve, at minimum, 90% agreement. Technically doable? Yes. I have explained the challenge and still we persist!

        Thank you again for your help.

        Allison
