Thanks for providing the data example; this makes things easier. Here is a step-by-step suggestion.
Code:
// step 1: -encode- string variables to get numeric variables

/* we first define a value label with all rating categories;
   we do this to ensure identical coding of all variables */
label define depth 1 "Floating" 2 "Neither" 3 "Deep"

/* now we -encode- using our value label;
   we will replace the original variables */
forvalues i = 1/7 {
    encode depth`i' , generate(numeric_depth`i') label(depth)
    drop depth`i'
    rename numeric_depth`i' depth`i'
}

/* now -encode- the before vs. after indicator;
   the label ensures that "before" is coded as 1
   and "after" is coded as 2; this is more intuitive */
label define reading 1 "before" 2 "after"
encode reading , generate(numeric_reading) label(reading)
drop reading
rename numeric_reading reading

// step 2: reshape

/* we add an underscore to the variables;
   depth1_1 will be easier on the eye than depth11 */
rename depth* depth*_
reshape wide depth1_-depth7_ , i(subject) j(reading)

// step 3: implement the bootstrap program
capture program drop kappaetc_bs
program kappaetc_bs , rclass
    tempname before after diff
    kappaetc depth1_1-depth7_1
    matrix `before' = r(b)
    kappaetc depth1_2-depth7_2
    matrix `after' = r(b)
    matrix `diff' = `before' - `after'
    return matrix diff = `diff'
end

// step 4: get results
set seed 42 // <- to replicate results
bootstrap                           ///
    delta_pa = el(r(diff), 1, 1)    ///
    delta_bp = el(r(diff), 1, 2)    ///
    delta_ck = el(r(diff), 1, 3)    ///
    delta_fk = el(r(diff), 1, 4)    ///
    delta_ac = el(r(diff), 1, 5)    ///
    delta_ka = el(r(diff), 1, 6)    ///
    , reps(500) : kappaetc_bs
estat bootstrap , all
The code yields the following (I am using Stata 12.1 in this example):
Code:
(output omitted)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    delta_pa |  -.2122449   .0347841    -6.10   0.000    -.2804204   -.1440694
    delta_bp |  -.3183673   .0520363    -6.12   0.000    -.4203566   -.2163781
    delta_ck |   .0518425   .0641114     0.81   0.419    -.0738136    .1774985
    delta_fk |   .0476113   .0608716     0.78   0.434    -.0716949    .1669176
    delta_ac |  -.4180127   .0543014    -7.70   0.000    -.5244414    -.311584
    delta_ka |   .0474747   .0582826     0.81   0.415    -.0667571    .1617066
------------------------------------------------------------------------------
(output omitted)
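Note that the delta_* expressions assume kappaetc stores its six coefficients in r(b) (and therefore in r(diff)) in the order percent agreement, Brennan and Prediger, Cohen/Conger's kappa, Scott/Fleiss' kappa, Gwet's AC, Krippendorff's alpha; that is the order the names delta_pa through delta_ka are meant to reflect. If in doubt, you can check the mapping on your own data by listing the returned matrix after a single kappaetc call, for example:

Code:
// sketch: verify which column of r(b) (and hence r(diff)) holds which coefficient
kappaetc depth1_1-depth7_1
matrix list r(b)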
Compare these results with kappaetc's t-test implementation:

Code:
Paired t tests of agreement coefficients         Number of subjects =       7

Differences (before)-(after)
------------------------------------------------------------------------------
                     |    Diff.   Std. Err.      t    P>|t|   [95% Conf. Interval]
---------------------+--------------------------------------------------------
   Percent Agreement |  -0.2122     0.0398    -5.33   0.002    -0.3096   -0.1149
Brennan and Prediger |  -0.3184     0.0597    -5.33   0.002    -0.4644   -0.1723
Cohen/Conger's Kappa |   0.0518     0.0691     0.75   0.482    -0.1173    0.2210
 Scott/Fleiss' Kappa |   0.0476     0.0664     0.72   0.500    -0.1148    0.2100
           Gwet's AC |  -0.4180     0.0627    -6.67   0.001    -0.5715   -0.2646
Krippendorff's Alpha |   0.0475     0.0631     0.75   0.480    -0.1069    0.2018
------------------------------------------------------------------------------
By the way, with 7 raters and 3 rating categories, the lower limit for observed agreement is 0.24 (code).
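For what it is worth, that bound can be reproduced by hand: observed agreement is the average pairwise agreement, and with 7 raters it is smallest when, on every subject, the raters split as evenly as possible over the 3 categories (3/2/2), giving (comb(3,2) + comb(2,2) + comb(2,2)) / comb(7,2) = 5/21 ≈ 0.238. A one-line check (my reconstruction of what the linked code presumably computes):

Code:
// lower limit of observed agreement with 7 raters and 3 rating categories;
// the worst case is a 3/2/2 split of the raters on every subject
display (comb(3,2) + comb(2,2) + comb(2,2)) / comb(7,2)   // .23809524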