kappaetc: Testing for significance of a difference in before / after intervention

daniel klein

Join Date: Mar 2014
Posts: 3860

#16

15 Jun 2020, 00:07

Thanks for providing the data example; this makes things easier. Here is a step-by-step suggestion:

Code:

// step 1: -encode- string variables to get numeric variables

    /*
        we first define a value label with all rating categories
        we do this to ensure identical coding of all variables
    */
label define depth 1 "Floating" 2 "Neither" 3 "Deep"

    /*
        now we -encode- using our value label
        we will replace the original variables
    */
forvalues i = 1/7 {
    encode depth`i' , generate(numeric_depth`i') label(depth)
    drop depth`i'
    rename numeric_depth`i' depth`i'
}

    /*
        now -encode- the before vs. after indicator
        
        the label ensures that "before" is coded as 1
        and "after" is coded as 2; this is more intuitive
    */
label define reading 1 "before" 2 "after"
encode reading , generate(numeric_reading) label(reading)
drop reading
rename numeric_reading reading


// step 2: reshape
    
    /*
        we add an underscore to the variables
        depth1_1 will be easier on the eye than depth11
    */
rename depth* depth*_
reshape wide depth1_-depth7_ , i(subject) j(reading)

// step 3: implement the bootstrap program
capture program drop kappaetc_bs
program kappaetc_bs , rclass
    tempname before after diff
    kappaetc depth1_1-depth7_1
    matrix `before' = r(b)
    kappaetc depth1_2-depth7_2
    matrix `after' = r(b)
    matrix `diff' = `before' - `after'
    return matrix diff = `diff'
end
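
    /*
        optional check, not part of the original suggestion:
        run the program once on the full sample and list the
        returned matrix; the -bootstrap- call in step 4 assumes
        a 1 x 6 row vector of differences (before minus after)
    */
quietly kappaetc_bs
matrix list r(diff)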

// step 4: get results
set seed 42 // <- to replicate results
bootstrap                        ///
    delta_pa = el(r(diff), 1, 1) ///
    delta_bp = el(r(diff), 1, 2) ///
    delta_ck = el(r(diff), 1, 3) ///
    delta_fk = el(r(diff), 1, 4) ///
    delta_ac = el(r(diff), 1, 5) ///
    delta_ka = el(r(diff), 1, 6) ///
    , reps(500) : kappaetc_bs

estat bootstrap , all

The code yields (I am using Stata 12.1 in this example):

Code:

(output omitted)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    delta_pa |  -.2122449   .0347841    -6.10   0.000    -.2804204   -.1440694
    delta_bp |  -.3183673   .0520363    -6.12   0.000    -.4203566   -.2163781
    delta_ck |   .0518425   .0641114     0.81   0.419    -.0738136    .1774985
    delta_fk |   .0476113   .0608716     0.78   0.434    -.0716949    .1669176
    delta_ac |  -.4180127   .0543014    -7.70   0.000    -.5244414    -.311584
    delta_ka |   .0474747   .0582826     0.81   0.415    -.0667571    .1617066
------------------------------------------------------------------------------

(output omitted)
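
The "Normal-based" intervals in this table are the observed difference plus or minus a standard normal critical value times the bootstrap standard error. A minimal sketch for delta_pa, using the values printed above (the remaining rows work the same way):

```stata
// 95% normal-based interval: coef -/+ z * bootstrap SE
display -.2122449 - invnormal(0.975) * .0347841
display -.2122449 + invnormal(0.975) * .0347841
```

This should reproduce the reported limits up to rounding. -estat bootstrap , all- additionally reports percentile and bias-corrected intervals, which do not rely on the normal approximation; presumably these fall in the omitted part of the output.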

Compare these results with kappaetc's t-test implementation:

Code:

Paired t tests of agreement coefficients         Number of subjects =       7
                                                  Differences (before)-(after)
------------------------------------------------------------------------------
                     |   Diff.  Std. Err.    t    P>|t|   [95% Conf. Interval]
---------------------+--------------------------------------------------------
   Percent Agreement | -0.2122    0.0398  -5.33   0.002    -0.3096    -0.1149
Brennan and Prediger | -0.3184    0.0597  -5.33   0.002    -0.4644    -0.1723
Cohen/Conger's Kappa |  0.0518    0.0691   0.75   0.482    -0.1173     0.2210
 Scott/Fleiss' Kappa |  0.0476    0.0664   0.72   0.500    -0.1148     0.2100
           Gwet's AC | -0.4180    0.0627  -6.67   0.001    -0.5715    -0.2646
Krippendorff's Alpha |  0.0475    0.0631   0.75   0.480    -0.1069     0.2018
------------------------------------------------------------------------------
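
The commands behind this table are not shown in the post. A minimal sketch, assuming kappaetc's store() option and its name1 == name2 test syntax as described in the package's help file, and the wide-format variables created in step 2; please check -help kappaetc- for the exact syntax in your installed version:

```stata
// store the agreement coefficients for each reading under a name
// (store() and the == syntax are assumed from the kappaetc help file)
kappaetc depth1_1-depth7_1 , store(before)
kappaetc depth1_2-depth7_2 , store(after)

// paired t tests of the differences (before)-(after)
kappaetc before == after
```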

By the way, with 7 raters and 3 rating categories, the lower limit for observed agreement is 0.24 (code).
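
The 0.24 can be checked by hand. This is a sketch under the assumption that observed agreement for a subject is the proportion of rater pairs that agree; the post does not spell this out. Agreement is then lowest when the 7 ratings spread as evenly as possible over the 3 categories, that is, 3, 2, and 2 ratings per category:

```stata
// agreeing pairs within each category, divided by all pairs of 7 raters
display %5.3f (comb(3,2) + comb(2,2) + comb(2,2)) / comb(7,2)
```

This displays 0.238, i.e., 5 agreeing pairs out of 21, which rounds to 0.24.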

Last edited by daniel klein; 15 Jun 2020, 00:09.


Michael McCulloch

Join Date: Jul 2025

Posts: 24
#17

15 Jun 2020, 08:21

Thank you, Daniel. Comparing the standard errors from -bootstrap- and -kappaetc ttest-, I now see your earlier point about -kappaetc ttest- being based on a large-sample approximation. There were other assessments in this study, and I will be able to use your helpful code for those as well. Best wishes, Michael