Thanks for providing the data example; this makes things easier. Here is a step-by-step suggestion:
Code:
// step 1: -encode- string variables to get numeric variables
/*
we first define a value label with all rating categories
we do this to ensure identical coding of all variables
*/
label define depth 1 "Floating" 2 "Neither" 3 "Deep"
/*
now we -encode- using our value label
we will replace the original variables
*/
forvalues i = 1/7 {
    encode depth`i' , generate(numeric_depth`i') label(depth)
    drop depth`i'
    rename numeric_depth`i' depth`i'
}
/*
now -encode- the before vs. after indicator
the label ensures that "before" is coded as 1
and "after" is coded as 2; this is more intuitive
*/
label define reading 1 "before" 2 "after"
encode reading , generate(numeric_reading) label(reading)
drop reading
rename numeric_reading reading
// step 2: reshape
/*
we add an underscore to the variables
depth1_1 will be easier on the eye than depth11
*/
rename depth* depth*_
reshape wide depth1_-depth7_ , i(subject) j(reading)
// step 3: implement the bootstrap program
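capture program drop kappaetc_bs
program kappaetc_bs , rclass
    tempname before after diff
    // agreement coefficients for the "before" ratings
    kappaetc depth1_1-depth7_1
    matrix `before' = r(b)
    // agreement coefficients for the "after" ratings
    kappaetc depth1_2-depth7_2
    matrix `after' = r(b)
    // return the differences (before - after) as a row vector
    matrix `diff' = `before' - `after'
    return matrix diff = `diff'
end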
// step 4: get results
set seed 42 // <- to replicate results
bootstrap                           ///
    delta_pa = el(r(diff), 1, 1)    ///
    delta_bp = el(r(diff), 1, 2)    ///
    delta_ck = el(r(diff), 1, 3)    ///
    delta_fk = el(r(diff), 1, 4)    ///
    delta_ac = el(r(diff), 1, 5)    ///
    delta_ka = el(r(diff), 1, 6)    ///
    , reps(500) : kappaetc_bs
estat bootstrap , all
The code yields (I am using Stata 12.1 in this example):
Code:
(output omitted)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    delta_pa |  -.2122449   .0347841    -6.10   0.000    -.2804204   -.1440694
    delta_bp |  -.3183673   .0520363    -6.12   0.000    -.4203566   -.2163781
    delta_ck |   .0518425   .0641114     0.81   0.419    -.0738136    .1774985
    delta_fk |   .0476113   .0608716     0.78   0.434    -.0716949    .1669176
    delta_ac |  -.4180127   .0543014    -7.70   0.000    -.5244414    -.311584
    delta_ka |   .0474747   .0582826     0.81   0.415    -.0667571    .1617066
------------------------------------------------------------------------------
(output omitted)
Compare these results with kappaetc's t-test implementation:
Code:
Paired t tests of agreement coefficients          Number of subjects =       7
Differences (before)-(after)
------------------------------------------------------------------------------
                     |     Diff. Std. Err.      t   P>|t| [95% Conf. Interval]
---------------------+--------------------------------------------------------
   Percent Agreement |   -0.2122    0.0398   -5.33   0.002   -0.3096   -0.1149
Brennan and Prediger |   -0.3184    0.0597   -5.33   0.002   -0.4644   -0.1723
Cohen/Conger's Kappa |    0.0518    0.0691    0.75   0.482   -0.1173    0.2210
 Scott/Fleiss' Kappa |    0.0476    0.0664    0.72   0.500   -0.1148    0.2100
           Gwet's AC |   -0.4180    0.0627   -6.67   0.001   -0.5715   -0.2646
Krippendorff's Alpha |    0.0475    0.0631    0.75   0.480   -0.1069    0.2018
------------------------------------------------------------------------------
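For reference, a table like the one above can probably be obtained along the following lines. This is only a sketch: the store() option and the "name1 == name2" syntax are written from my reading of kappaetc's help file, so please treat them as assumptions and check -help kappaetc- for your installed version. The names before and after are simply chosen to match the header of the output.

```stata
// sketch only; store() and the "==" syntax are assumptions to verify
// against -help kappaetc- before relying on them

// store the agreement coefficients for the "before" ratings
kappaetc depth1_1-depth7_1 , store(before)

// store the agreement coefficients for the "after" ratings
kappaetc depth1_2-depth7_2 , store(after)

// paired t tests of the differences (before)-(after)
kappaetc before == after
```

The variable names are those created by the reshape step, so this assumes the data are already in wide form.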
By the way, with 7 raters and 3 rating categories, the lower limit for observed agreement is 0.24 (code).
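A quick sketch of where the 0.24 comes from. This is my own illustration, assuming that observed agreement for a subject is the share of rater pairs who chose the same category, and that this share is smallest when the raters spread as evenly as possible over the categories. The local macro names are made up for the example.

```stata
// lower limit of observed (pairwise) agreement for a single subject
local r = 7                        // number of raters
local q = 3                        // number of rating categories

// most even split: `extra' categories get base+1 raters, the rest get base
local base  = floor(`r'/`q')
local extra = mod(`r', `q')

// number of agreeing rater pairs under that split
local agree = `extra'*comb(`base' + 1, 2) + (`q' - `extra')*comb(`base', 2)

// share of agreeing pairs among all possible rater pairs
display %5.3f `agree'/comb(`r', 2)
```

With 7 raters and 3 categories the split is 3/2/2, which gives (3 + 1 + 1)/21 = 5/21, or about 0.238.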
