  • Differences between "icc" and "estat icc" estimates?

    Dear Stata users,

    I am trying to estimate the intra-class correlation of measures performed by several judges. I assume that the judges are a random effect. My measures are recorded in the "rating" variable, and the subject being rated is recorded in the "target" variable.
    My first idea was to use the "icc" command. My second was to fit a mixed model and use the "estat icc" post-estimation command. These two approaches give quite different results, and I am not sure I understand why.

    Code:
    webuse judges
    icc rating target judge
    
    <results>
    --------------------------------------------------------------
                    rating |        ICC       [95% Conf. Interval]
    -----------------------+--------------------------------------
                Individual |   .2897638       .0187865    .7610844
                   Average |   .6200505       .0711368     .927232
    --------------------------------------------------------------
    
    
    
    
    mixed rating  i.target  || judge:
    estat icc
    
    <results>
    ------------------------------------------------------------------------------
                           Level |        ICC   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
                           judge |   .8372506   .1089676      .5176289    .9610324
    ------------------------------------------------------------------------------
    Using the -icc- command, I would conclude that agreement between judges is poor (ICC = 0.29), while using the post-estimation command -estat icc-, I would conclude that agreement is strong?

    I may be making a big mistake somewhere, but I am not sure where... Thanks so much for your help!

    Salome

  • #2
    Well, probably a mistake on my part... I forgot how an ICC is estimated: through an ANOVA... and in that ANOVA, the target is not a fixed effect but a random effect, obviously, so that the variance associated with the target can be estimated...

    So,
    Code:
    mixed rating  || target:  || judge: , reml
    estat icc
    gives the following results:
    Code:
                           Level |        ICC   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
                          target |   .1657267   .2232479      .0083184    .8246961
                    judge|target |   .8988753   37.99775             .           1
    The question is still the same: I still do not understand why -mixed- and -icc- give different estimates of the ICC (0.16 is definitely not equal to 0.29, although both correspond to low agreement).

    • #3
      It is possible that your -mixed- model in #2 is mis-specified for your design. You do not say whether the same subjects were rated by the same judges or, as this model implies, whether subjects were rated by different judges.

      If your design has the judges all rating (at least some of) the same subjects, then the model needs to account for a crossed effect of subject and judge, rather than a hierarchical design. A crossed-effects model includes random effects for subject, judge, and the subject-by-judge interaction. You can code this in several equivalent ways, but assuming there are more subjects than judges, you could write it as:


      Code:
      mixed rating || _all : R.judge || target : || judge : , reml
      However, you then need to compute the ICC manually from the variance components (which isn't so difficult, but less convenient than -estat icc-).
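
      To spell out the ratio that I compute by hand below: in this parameterization, the individual ICC is the subject (target) variance over the total variance, \(\sigma^2_{target} / (\sigma^2_{target} + \sigma^2_{judge} + \sigma^2_{target \times judge} + \sigma^2_{\epsilon})\).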

      A dataset this small will not lead to good (or perhaps even stable) estimates of the variance components, but I can show the idea:

      Code:
      webuse judges
      icc rating target judge
      mixed rating || _all : R.judge || target : || judge :, var reml
      di 2.556 / (2.556 + 5.244 + 0.739 + .280)
      Result:

      Code:
      . icc rating target judge
      
      Intraclass correlations
      Two-way random-effects model
      Absolute agreement
      
      Random effects: target           Number of targets =         6
      Random effects: judge            Number of raters  =         4
      
      --------------------------------------------------------------
                      rating |        ICC       [95% Conf. Interval]
      -----------------------+--------------------------------------
                  Individual |   .2897638       .0187865    .7610844
                     Average |   .6200505       .0711368     .927232
      --------------------------------------------------------------
      F test that
        ICC=0.00: F(5.0, 15.0) = 11.03              Prob > F = 0.000
      
      
      . mixed rating || _all : R.judge || target : || judge :, var reml
      [... output omitted ...]
      ------------------------------------------------------------------------------
        Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
      -----------------------------+------------------------------------------------
      _all: Identity               |
                      var(R.judge) |   5.244418   4.421203      1.004867     27.3707
      -----------------------------+------------------------------------------------
      target: Identity             |
                        var(_cons) |   2.555539   1.779883      .6525849    10.00755
      -----------------------------+------------------------------------------------
      judge: Identity              |
                        var(_cons) |   .7392813   95.42312      1.0e-110    5.5e+109
      -----------------------------+------------------------------------------------
                     var(Residual) |   .2801661   95.42278      3.4e-291    2.3e+289
      ------------------------------------------------------------------------------
      LR test vs. linear model: chi2(3) = 23.04                 Prob > chi2 = 0.0000
      
      . di 2.555539 / (2.555539 + 5.244418 + 0.7392813 + 0.2801661)
      .28976322
      These results agree to within 5 decimal places, which is good enough. In this specification, the variance components for subject, judge, subject-by-judge interaction, and error correspond to -var(target)-, -var(R.judge)-, -var(judge)-, and -var(Residual)-, respectively.
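
      If you would rather not retype the estimates, a rough sketch of the same calculation pulls the variance components from -mixed-'s stored coefficients. The parameter names below (lns1_1_1, lns2_1_1, lns3_1_1, lnsig_e) are -mixed-'s log-standard-deviation parameters as I understand them; confirm them with -mixed, coeflegend- before relying on this, and note that this reproduces only the point estimate, not -estat icc-'s confidence interval.

      Code:
      * sketch only -- confirm the lns* parameter names with -mixed, coeflegend-
      webuse judges, clear
      quietly mixed rating || _all : R.judge || target : || judge : , var reml
      scalar v_judge  = exp(2*_b[lns1_1_1:_cons])   // var(R.judge)
      scalar v_target = exp(2*_b[lns2_1_1:_cons])   // var(target: _cons)
      scalar v_inter  = exp(2*_b[lns3_1_1:_cons])   // var(judge: _cons)
      scalar v_e      = exp(2*_b[lnsig_e:_cons])    // var(Residual)
      display "individual ICC = " v_target / (v_target + v_judge + v_inter + v_e)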

      • #4
        Code:
        mixed rating || _all : R.judge || target : || judge :, var reml noconst
        will give you the same result up to 10 digits. I believe that estimating the subject-by-judge interaction requires that judges rate subjects repeatedly.

        • #5
          daniel klein, the mixed model in #3 is based on the two-way random-effects ANOVA, which specifies the overall mean, so I don't think it's quite the same if the constant is dropped. Here I use the subject-by-judge interaction, assuming that all subjects are rated by all judges (a fully factorial design). Repeated measures by the same judge on the same subject could be added as another level of the hierarchy for replication.

          • #6
            The Methods and formulas section of -icc- states that

            With one observation per target and rater, \(\sigma^2_{rc}\) and \(\sigma^2_{\epsilon}\) cannot be estimated separately.
            The noconstant option in #4 is equivalent to

            Code:
            mixed rating || _all : R.judge || target : , var reml
            which, as I understand it (I could be wrong), suppresses the subject-by-rater interaction (i.e., blends the subject-by-judge interaction into the error variance). Apparently, that affects the REML estimates of all variance components. As reported, the results match -icc- up to 10 digits.
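
            As a rough check of that equivalence (assuming both commands run as shown in #4), you can fit the two specifications and compare the stored log restricted-likelihoods:

            Code:
            * rough check: the two fits should coincide
            webuse judges, clear
            quietly mixed rating || _all : R.judge || target : || judge : , var reml noconstant
            display %20.10f e(ll)
            quietly mixed rating || _all : R.judge || target : , var reml
            display %20.10f e(ll)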

            • #7
              Thanks for clarifying that for me, daniel. You are right about the reduction to the blended error terms. I had also misread at which level the noconstant option was applied.

              • #8
                Leonardo, you basically clarified the issue yourself in #5. In a fully factorial design with only one observation per judge-target cell, there is no separate error variance because the model fits the data perfectly (R-squared will be 1).

                • #9
                  Thank you very much for your time and your replies! (And forgive my probably poor English; I try my best to be understood...)

                  To respond to Leonardo (#3): you are right, in this example I considered that several (different) judges rated the same subjects, and indeed you are (still) right that I was not convinced by the hierarchical structure of my model... (how would I say it: that subjects are nested in judges, or the opposite?) I didn't know that I could use a crossed effect, as you both proposed, so thanks for that!

                  May I ask for more details about the way to code that? I am not sure I fully understand the " || _all : R.judge " part of the code. Is it a way of saying that the judge effect is not nested in the target effect? (But, if that is correct, should I also specify that the target effect is no longer nested in the judge effect, by writing something like
                  Code:
                  mixed rating  || _all : R.judge || _all : R.target  , var reml
                  and in this case, writing the "judge" effect before or after the "target" effect should obviously give identical results, which it does.)

                  And, more generally, if I study data repeated over time (multiple measures for each individual), should I specify that these data are not nested? (In other words, in the next example, is the first model better than the second? Not from a results point of view - they are the same - but from a more "pedagogical" point of view, to make very clear that no effect is considered to be nested in another one?)
                  Code:
                  use http://www.stata-press.com/data/r13/pig
                  
                  <model 1>
                  mixed weight week || _all: R.id
                  
                  <model 2>
                  mixed weight week || id:
                  Thanks again for your help, truly!

                  Salome
