  • test-retest reliability - what effects are fixed?

    Hi there,
    I am checking that I have correctly specified the variables for test-retest reliability.

    I have participants (ID) who took a questionnaire with three survey instruments on two separate occasions. I'm looking to compare score agreement across time between the three instruments.


    Here's my data in long format:

    Code:
    input int ID byte(time can) float(pwi who)
    211 1 7 45 12
    211 2 8 48 9
    212 1 7 52 18
    212 2 8 56 20
    213 1 7 52 17
    213 2 8 57 16
    214 1 7 49 8
    214 2 7 52 5
    215 1 7 56 19
    215 2 . . .
    216 1 9 59 22
    216 2 8 62 20
    217 1 6 46 16
    217 2 7 48 13
    218 1 . . .
    218 2 9 67 21
    219 1 7 50 10
    219 2 6 57 10
    220 1 7 56 20
    220 2 7 56 17
    222 1 7 59 8
    222 2 8 58 12
    223 1 . . .
    223 2 9 61 24
    224 1 7 45 11
    224 2 . . .
    225 1 7 58 16
    225 2 7 55 15
    226 1 8 55 16
    226 2 8 58 12
    227 1 6 54 20
    227 2 7 58 22
    228 1 5 44 13
    228 2 8 47 15
    229 1 7 49 13
    229 2 7 51 15
    230 1 9 60 18
    230 2 9 61 20
    231 1 8 65 20
    231 2 9 58 .
    232 1 8 55 17
    232 2 7 60 19
    233 1 . . .
    233 2 7 52 15
    234 1 6 41 14
    234 2 5 44 13
    235 1 5 46 13
    235 2 6 49 13
    236 1 6 50 11
    236 2 7 48 12
    237 1 7 55 19
    237 2 7 58 18
    238 1 8 57 19
    238 2 9 60 18
    239 1 7 55 17
    239 2 6 53 16
    240 1 9 67 21
    240 2 10 58 25
    241 1 7 48 16
    241 2 8 51 17
    242 1 8 66 23
    242 2 9 64 22
    243 1 . . .
    243 2 . 57 17
    244 1 6 44 14
    244 2 7 52 17
    245 1 7 53 13
    245 2 6 47 14
    end

    To test the reliability of one survey instrument, can, I ran:
    Code:
    icc can ID time, mixed absolute
    (5 targets omitted from computation because not rated by all raters)


    But this model treats timepoint as the fixed effect, rather than reflecting that the same participants are retaking the same survey.


    output:

    Code:
    Intraclass correlations
    Two-way mixed-effects model
    Absolute agreement
    
    Random effects: ID               Number of targets =        28
     Fixed effects: time             Number of raters  =         2
    
    --------------------------------------------------------------
                       can |        ICC       [95% conf. interval]
    -----------------------+--------------------------------------
                Individual |   .5894074       .2834621     .786455
                   Average |   .7416694       .4417148    .8804644
    --------------------------------------------------------------
    F test that
      ICC=0.00: F(27.0, 27.0) = 4.25              Prob > F = 0.000
    Q1: In the output, do Individual and Average refer to the correlation between timepoints within an individual and between individuals within timepoints, respectively?
    Q2: is there a way to specify the fixed effects for ID using icc?
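
    One relationship worth noting: for k raters (here, k = 2 timepoints), the Average ICC reported by -icc- is the Spearman-Brown step-up of the Individual ICC, i.e. the reliability of the mean of k ratings. A minimal Python sketch, using the coefficients from the output above, confirms this:

    ```python
    # Spearman-Brown prophecy formula: step a single-rating ICC up to the
    # reliability of the mean of k ratings.
    def average_icc(individual_icc: float, k: int) -> float:
        """Reliability of the mean of k ratings, given the single-rating ICC."""
        return k * individual_icc / (1 + (k - 1) * individual_icc)

    icc_individual = 0.5894074  # "Individual" ICC from the -icc- output above
    icc_average = average_icc(icc_individual, k=2)
    print(round(icc_average, 4))  # 0.7417, matching the reported "Average" ICC
    ```

    The same formula can be applied to any single-rating ICC to report an average-rating coefficient.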

    Because I've seen it recommended, I've also tried -kappaetc- with its icc option. Here is the same data reshaped to wide format:

    Code:
    input int ID byte(can1 can2) float(pwi1 pwi2 who1 who2)
    237 7 7 55 58 19 18
    230 9 9 60 61 18 20
    224 7 . 45 . 11 .
    216 9 8 59 62 22 20
    245 7 6 53 47 13 14
    211 7 8 45 48 12 9
    239 7 6 55 53 17 16
    234 6 5 41 44 14 13
    238 8 9 57 60 19 18
    241 7 8 48 51 16 17
    236 6 7 50 48 11 12
    232 8 7 55 60 17 19
    242 8 9 66 64 23 22
    219 7 6 50 57 10 10
    240 9 10 67 58 21 25
    212 7 8 52 56 18 20
    215 7 . 56 . 19 .
    231 8 9 65 58 20 .
    227 6 7 54 58 20 22
    213 7 8 52 57 17 16
    225 7 7 58 55 16 15
    233 . 7 . 52 . 15
    244 6 7 44 52 14 17
    214 7 7 49 52 8 5
    218 . 9 . 67 . 21
    223 . 9 . 61 . 24
    222 7 8 59 58 8 12
    229 7 7 49 51 13 15
    243 . . . 57 . 17
    228 5 8 44 47 13 15
    217 6 7 46 48 16 13
    235 5 6 46 49 13 13
    226 8 8 55 58 16 12
    220 7 7 56 56 20 17
    end


    Here I can see that the interrater reliability, ICC(3,1), is very similar to what the icc command found above.


    Code:
    kappaetc can1 can2 , icc(mixed) listwise
    
    Interrater reliability                          Number of subjects  =      28
    Two-way mixed-effects model                     Ratings per subject =       2
    ------------------------------------------------------------------------------
                   |   Coef.     F     df1     df2      P>F   [95% Conf. Interval]
    ---------------+--------------------------------------------------------------
          ICC(3,1) |  0.6193   4.25    27.00   27.00   0.000    0.3262     0.8037
    ---------------+--------------------------------------------------------------
           sigma_s |  0.8622
           sigma_e |  0.6760
    ------------------------------------------------------------------------------


    Q3: Is ID the fixed effect by default in the kappaetc, icc(mixed) estimate here?

    Q4: I would like to report both the individual and average ICC coefficients (assuming I've interpreted these correctly above). Is it possible to obtain the group average ICC using kappaetc (similar to the icc command output)?

  • #2
    Hi Lana,

    I think it might be easier to get what you want by using the xtreg and mixed commands. You have a situation where individuals are rating themselves through their responses to a set of items; we don't see those items, but your example data provide the sum scores. In a case like this, the ID variable captures both the target and the rater, so the best model for test-retest reliability here is the one-way random-effects model with a random intercept for ID. There are no "fixed effects" in this model. From the random-effect variances, we can calculate the ICC(1,1), which is a measure of test-retest reliability.
    Code:
    mixed can || ID: , reml    // use REML estimation because of small sample size
    estat icc    // ICC(1,1)
    
    Intraclass correlation
    
    ------------------------------------------------------------------------------
                           Level |        ICC   Std. err.     [95% conf. interval]
    -----------------------------+------------------------------------------------
                              ID |   .5786739   .1248276      .3348918    .7893159
    ------------------------------------------------------------------------------
    You asked about treating ID as a fixed effect instead. You can do that by using xtreg and specifying a fixed effects model. The rho parameter in the output, below, is the analogue to ICC(1,1) in the fixed effect model:
    Code:
    xtreg can, fe i(ID)
    
    Fixed-effects (within) regression               Number of obs     =         61
    Group variable: ID                              Number of groups  =         33
    
    R-squared:                                      Obs per group:
         Within  =      .                                         min =          1
                                                                  avg =        1.8
                                                                  max =          2
    
                                                    F(0, 28)          =       0.00
    corr(u_i, Xb) =      .                          Prob > F          =          .
    
    ------------------------------------------------------------------------------
             can | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
           _cons |   7.311475   .0921383    79.35   0.000     7.122739    7.500212
    -------------+----------------------------------------------------------------
         sigma_u |  1.0037807
         sigma_e |  .71962292
             rho |  .66051791   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0: F(32, 28) = 3.54                      Prob > F = 0.0005
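    As a sanity check, rho in the xtreg output is simply the share of total variance due to the ID-level effect, sigma_u^2 / (sigma_u^2 + sigma_e^2). A quick Python sketch using the values printed above:

    ```python
    # rho (fraction of variance due to u_i) from the xtreg output is the
    # between-ID variance share: rho = sigma_u^2 / (sigma_u^2 + sigma_e^2)
    sigma_u = 1.0037807   # sd of the ID-level effect, from the xtreg output
    sigma_e = 0.71962292  # residual sd, from the xtreg output

    rho = sigma_u**2 / (sigma_u**2 + sigma_e**2)
    print(round(rho, 4))  # 0.6605, matching the reported rho
    ```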
    Note that you do not get a confidence interval for rho whereas you do with estat icc after mixed. In terms of the individual and average ICC coefficients, I don't see the need for anything but the individual ICC(1,1) in your case. The code for obtaining the ICC(1,1) in kappaetc is the following:
    Code:
    * First reshape the data to wide
    reshape wide can pwi who, i(ID) j(time)
    
    kappaetc can1 can2 , icc(oneway)
    
    Interrater reliability                           Number of subjects =      33
    One-way random-effects model               Ratings per subject: min =       1
                                                                    avg =  1.8485
                                                                    max =       2
    ------------------------------------------------------------------------------
                   |   Coef.     F     df1     df2      P>F   [95% Conf. Interval]
    ---------------+--------------------------------------------------------------
          ICC(1,1) |  0.5789   3.54    32.00   28.00   0.001    0.2708     0.7726
    ---------------+--------------------------------------------------------------
           sigma_s |  0.8438
           sigma_e |  0.7196
    ------------------------------------------------------------------------------
    Note: F test and confidence intervals are based on methods for complete data.
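    The ICC(1,1) here is the same kind of variance ratio, now built from the sigma_s and sigma_e reported by kappaetc. Checking it numerically with the printed values (Python sketch):

    ```python
    # ICC(1,1) under the one-way random-effects model is the share of total
    # variance attributable to subjects (IDs):
    #   ICC(1,1) = sigma_s^2 / (sigma_s^2 + sigma_e^2)
    sigma_s = 0.8438  # subject (ID) sd, from the kappaetc output
    sigma_e = 0.7196  # residual sd, from the kappaetc output

    icc_1_1 = sigma_s**2 / (sigma_s**2 + sigma_e**2)
    print(round(icc_1_1, 4))  # 0.5789, matching the reported ICC(1,1)
    ```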
    Last edited by Erik Ruzek; 13 May 2025, 11:30. Reason: Clarified fixed vs random effect.



    • #3
      Thanks, Erik, for your thorough reply! I was reading more about the two-way and one-way models. Koo et al. (2016) has been recommended; they say it's important to use two-way models for test-retest reliability, and fixed effects when a sample was not randomly selected. That said, the mixed- and fixed-effects model equations they present are identical. Really appreciate the clarification and output interpretation you provided!

