Two way anova with repeated measures versus 2x2 mixed anova

Jay Gold

Join Date: Jul 2023

Posts: 72
#1

28 Aug 2024, 15:28

Hi,

I have a study in which participants either have a disease or not (variable Study_Set) and underwent an intervention. Blood pressure (variable BP) was measured before and after the intervention, as recorded in the variable time. Participant IDs are coded in the variable ID.

I have tried a few different things and am not sure which is right:

Code:

anova BP time ID Study_Set, repeated(ID Study_Set)

Code:

anova BP time ID Study_Set time#Study_Set, repeated(ID Study_Set)

Code:

mixed BP time Study_Set time#Study_Set

Appreciative of any insights.
Joseph Coveney

Join Date: Apr 2014

Posts: 4399
#2

28 Aug 2024, 17:18

Code:

anova BP Study_Set / ID time Study_Set#time

Study_Set is a patient characteristic (within-subjects factor).

Also, with two levels of the repeated-measures factor, you don't need the repeated() option.

Also consider

Code:

mixed BP i.Study_Set##i.time || ID: , reml dfmethod(kroger)

or some variation on that.
Joseph Coveney

Join Date: Apr 2014

Posts: 4399
#3

28 Aug 2024, 18:44

Originally posted by Joseph Coveney

Study_Set is a patient characteristic (within-subjects factor).

Ignore that precaffeinated first attempt and make it "between-subjects factor".
Jay Gold

Join Date: Jul 2023

Posts: 72
#4

28 Aug 2024, 19:36

Hi Joseph,

Thanks for responding so quickly!

Not sure I fully understand the code you suggested for the -anova-. Why exclude time from the model and only include it in the error?

In the -mixed- code, I am surprised that adding the "i." to the categorical terms makes a difference. They have only two levels. Not sure I understand why telling Stata to treat them as categorical terms makes a difference. But I tested out both ways and it does.

What does analyzing the degrees of freedom at the end do for us?

Thanks very much!
Joseph Coveney

Join Date: Apr 2014

Posts: 4399
#5

28 Aug 2024, 20:45

Originally posted by Jay Gold

Why exclude time from the model and only include it in the error?

It doesn't exclude time from the model nor does it include it in the error. The two error terms are the residual and the random effect of participant, not time. Actually, for the latter, the subject × group interaction term is typically specified, that is,

Code:

anova BP Study_Set / ID|Study_Set time Study_Set#time

or, equivalently,

Code:

anova BP Study_Set / ID#Study_Set time Study_Set#time

In the -mixed- code, I am surprised that adding the "i." to the categorical terms makes a difference. They have only two levels. Not sure I understand why telling Stata to treat them as categorical terms makes a difference. But I tested out both ways and it does.

For interaction terms, Stata defaults to categorical (i.) interpretation, and so it really shouldn't have made any difference. (I included the factor variable notation in my illustration only for explicitness.)

You have something that's not right going on with your (unseen) dataset. I recommend that you look into that.

What does analyzing the degrees of freedom at the end do for us?

It doesn't. That option is to allow for small-sample adjustment of the test statistics and their degrees of freedom.
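To see what the option changes, the same model can be fit with and without it. This is only a sketch, assuming the variable names from #1 and the model from #2; no output is shown because it depends on the data.

```stata
* Default inference: large-sample z tests for the fixed effects
mixed BP i.Study_Set##i.time || ID: , reml

* Kenward-Roger: t and F tests with adjusted denominator degrees of freedom
mixed BP i.Study_Set##i.time || ID: , reml dfmethod(kroger)

* Display the degrees of freedom used for each fixed-effect coefficient
estat df
```

The point estimates should be the same in both fits; only the reference distributions for the tests differ.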

Jay Gold

Join Date: Jul 2023
Posts: 72
#6

29 Aug 2024, 09:12

I double checked my data. I do not see any issues regarding categorization, but clearly I am missing something. I think the reason why I was getting different results when I told Stata to use i. before the categorical variables is because I was including the categorical variable in the model and not just the interaction term.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float Study_Set byte time double BP int ID
0 1  1.61805733 115
0 2 1.467541895 115
0 1 1.516576806 141
0 2 1.126462383 141
0 1 1.952253752 146
0 2 1.870135503 146
1 1 1.205615904 101
1 2 1.067952954 101
0 1 2.079262965 102
0 2 1.746883206 102
0 1 1.447766111 103
0 2 1.373130808 103
1 1 1.737879439 105
1 2 1.822874183 105
0 1 1.667118379 106
0 2 1.647797812 106
1 1  1.55078952 107
1 2 1.362589464 107
0 1 1.070905035 108
0 2 1.456091623 108
0 1 1.332632484 109
0 2 1.058665857 109
0 1 1.079205046 114
0 2  1.23712726 114
1 1 1.573984351 115
1 2 1.497110278 115
1 1 1.941952163 116
1 2 1.836467432 116
1 1  1.90517393 118
1 2 2.377863238 118
1 1 2.549291332 120
1 2 2.599906847 120
0 1 1.737123976 124
0 2 1.381154024 124
1 1 1.737041992 128
1 2 1.787149271 128
0 1 1.006887782 129
0 2 1.048089542 129
0 1 1.097199223 135
0 2 1.677033107 135
end

Apologies, but I am even more confused by your response regarding the -anova- code in #5. Should not time be included in the model beyond just an error term? I.e. time is held "constant" while examining the effect of Study_Set on BP? Also should not Study_Set#time be included in the model and not just an error term?

Overall, not sure which is more accurate here: to use -mixed- or -anova-.

Thanks very much, Joseph.


Joseph Coveney

Join Date: Apr 2014

Posts: 4399
#7

30 Aug 2024, 00:08

Originally posted by Jay Gold

I double checked my data. I do not see any issues regarding categorization, but clearly I am missing something.

Just a couple of anomalies pop out in a quick scan: (i) the study participant who's numbered 115 is shown as both having the disease and not having the disease, and (ii) the scale of measurement for blood pressure is strange.

I think the reason why I was getting different results when I told Stata to use i. before the categorical variables is because I was including the categorical variable in the model and not just the interaction term.

That could account for it.
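A minimal sketch of the difference, assuming the variable names from #1 and a random intercept for ID (the `|| ID:` part was not in the -mixed- command in #1): without a prefix, a main-effect term is treated as continuous, whereas a variable inside an interaction defaults to categorical.

```stata
* Bare main effects are continuous; the # term defaults to i.time#i.Study_Set
mixed BP time Study_Set time#Study_Set || ID: , reml

* Explicit factor-variable notation makes every term categorical
mixed BP i.time i.Study_Set i.time#i.Study_Set || ID: , reml

* Same model as the second command: ## expands to main effects plus interaction
mixed BP i.Study_Set##i.time || ID: , reml
```

Comparing the coefficient tables from the first two commands should show how the parameterization changes once the main effects are declared with i.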

Should not time be included in the model beyond just an error term? I.e. time is held "constant" while examining the effect of Study_Set on BP? Also should not Study_Set#time be included in the model and not just an error term?

Again, they aren't: neither time nor the disease status × time interaction is an error term in the model.

If you look at the ANOVA table, you see that both of the terms have test statistics. The two error terms, subject-within-disease group and residual are the only terms in the ANOVA table for which test statistics (and associated p-values) are not reported.

I'm not sure why you think that the ANOVA model treats time and disease status × time interaction as error terms.

Overall, not sure which is more accurate here: to use -mixed- or -anova-.

Both give identical results here.

Jay Gold

Join Date: Jul 2023
Posts: 72
#8

30 Aug 2024, 09:45

Where do you see that participant ID 115 is categorized as having the disease and not? The variable is Study_Set and the participant with ID 115 has a value of "0" when time is "1" or "2" (the first two observations).

According to the help file for -anova- the terms following the "/" are the error terms (see attached screenshot). So here, there are 3 error terms:
1. ID#Study_Set
2. time
3. Study_Set#time

Again, they aren't: neither time nor the disease status × time interaction is an error term in the model.

It looks like they are, if they are after the "/". But my experience is limited and clearly I am misunderstanding something fundamental.

Both give identical results here.

That does not seem to be true unfortunately. See below. Very different results.

Code:

anova BP Study_Set / ID#Study_Set time Study_Set#time

                         Number of obs =         40    R-squared     =  0.8960
                         Root MSE      =    .189533    Adj R-squared =  0.7747

                  Source | Partial SS         df         MS        F    Prob>F
          ---------------+----------------------------------------------------
                   Model |  5.5711845         21    .2652945      7.39  0.0000
                         |
               Study_Set |  1.1030685          1   1.1030685      4.46  0.0490
            ID#Study_Set |  4.4556606         18    .2475367  
          ---------------+----------------------------------------------------
                    time |  .00139779          1   .00139779      0.04  0.8458
          Study_Set#time |  .00913045          1   .00913045      0.25  0.6203
                         |
                Residual |  .64661052         18   .03592281  
          ---------------+----------------------------------------------------
                   Total |   6.217795         39   .15943064  

. mixed BP i.Study_Set##i.time || ID: , reml dfmethod(kroger)

Performing EM optimization ...

Performing gradient-based optimization:
Iteration 0:  Log restricted-likelihood =  -13.54979  
Iteration 1:  Log restricted-likelihood =  -13.54979  

Computing standard errors ...

Computing degrees of freedom ...

Mixed-effects REML regression                        Number of obs    =     40
Group variable: ID                                   Number of groups =     19
                                                     Obs per group:
                                                                  min =      2
                                                                  avg =    2.1
                                                                  max =      4
DF method: Kenward–Roger                             DF:          min =  18.31
                                                                  avg =  25.07
                                                                  max =  35.78
                                                     F(3, 23.92)      =   0.87
Log restricted-likelihood =  -13.54979               Prob > F         = 0.4706

--------------------------------------------------------------------------------
            BP | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
     Study_Set |
           HV  |   .1762969   .1502266     1.17   0.248    -.1284413     .481035
               |
          time |
     Handgrip  |  -.0429063   .0790283    -0.54   0.594    -.2087384    .1229257
               |
Study_Set#time |
  HV#Handgrip  |   .0616795   .1249547     0.49   0.627    -.2005241     .323883
               |
         _cons |   1.522852   .1063539    14.32   0.000     1.304953    1.740752
--------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
ID: Identity                 |
                  var(_cons) |   .1116705   .0461102      .0497126    .2508479
-----------------------------+------------------------------------------------
               var(Residual) |   .0374728   .0126659      .0193201    .0726816
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 13.85         Prob >= chibar2 = 0.0001

.
end of do-file

Last edited by Jay Gold; 30 Aug 2024, 09:49. Reason: Edited for formatting.


Erik Ruzek

Join Date: Oct 2017

Posts: 423
#9

30 Aug 2024, 10:20

Code:

list if ID==115

gives me the following:

Code:

     +-----------------------------------+
     | Study_~t   time          BP    ID |
     |-----------------------------------|
  1. |        0      1   1.6180573   115 |
  2. |        0      2   1.4675419   115 |
 25. |        1      1   1.5739844   115 |
 26. |        1      2   1.4971103   115 |
     +-----------------------------------+

The anova help file you pasted says that the term following the / is the error term. In the code, the term following / is ID:

Code:

anova BP Study_Set / ID time Study_Set#time

And as Joseph said, time and the Study_Set#time get not just mean squares estimates, but also F-test statistics and associated p-values. The true error terms, ID and Residual, have no F-test statistic or p-value.
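One way to make the role of each error term explicit is to fit the model without "/" and then request the test against a named error term afterwards. This is only a sketch, assuming the variable names used in the thread and that the duplicated ID 115 has been corrected; it relies on the `term / term` form that -test- accepts after -anova-.

```stata
* Fit all terms; without "/" every F test uses the residual as denominator
anova BP Study_Set ID#Study_Set time Study_Set#time

* Re-test the between-subjects factor against subjects-within-group
test Study_Set / ID#Study_Set
```

The second command should reproduce the Study_Set test that the "/" syntax reports, while time and Study_Set#time keep the residual as their error term.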

Regarding the mixed vs. anova results: they are very similar. I'm assuming the one difference you are talking about is the p-value on Study_Set. Honestly, I personally trust the mixed results more because of the small-sample correction employed. You can explore all the results further using the following post-estimation tools:

Code:

contrast Study_Set##time, small
margins Study_Set
margins time
margins Study_Set#time
marginsplot, xdimension(time)
Jay Gold

Join Date: Jul 2023

Posts: 72
#10

30 Aug 2024, 11:49

Thanks very much Erik!!

I see the issue with ID 115. I was shortening a string ID to a number-only ID by truncating the string and was left with two subjects both coded as "115". Easily remedied. I suppose using -duplicates- would have helped find these.
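A few checks along these lines would likely have flagged the problem. This is a sketch, assuming the variable names from the thread; `id_string` is a hypothetical name for the original string identifier, which is not shown in the posts.

```stata
* Count and list rows that share the same ID and time point
duplicates report ID time
duplicates list ID time

* Build a numeric ID from the full string instead of truncating it
* (id_string is a hypothetical variable name)
egen long pid = group(id_string)

* Stop with an error unless pid and time jointly identify each row
isid pid time

* Stop with an error if disease status varies within a participant
bysort pid (Study_Set): assert Study_Set[1] == Study_Set[_N]
```

The last two commands halt a do-file on failure, which is the intended behavior for a data check.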

I was taking the -anova- help file too literally and interpreting it as saying that all terms following the "/" were error terms, not just the term immediately following the "/".

Thanks to you again as well Joseph for your patience!
Jay Gold

Join Date: Jul 2023

Posts: 72
#11

30 Aug 2024, 13:11

Sorry! One final question.

I notice the model gives different results when coded as you suggested:

Code:

anova BP Study_Set / ID#Study_Set time Study_Set#time

versus when the other terms are placed before the "/" and the error term comes last:

Code:

anova BP time Study_Set#time Study_Set / ID#Study_Set

Is that because with the second option the random error from different participants (the ID variable) is being accounted for in the two groups before running the rest of the model?
Joseph Coveney

Join Date: Apr 2014

Posts: 4399
#12

30 Aug 2024, 20:21

Originally posted by Jay Gold

That does not seem to be true unfortunately. See below. Very different results.

No, they're identical. You need to follow up -mixed- with -contrast- to get the same contrasts that ANOVA uses. See below. (Complete do-file and log file attached for your convenience.)

.
. version 18.0

.
. clear *

.
. quietly input float Study_Set byte time double BP int ID

.
. // First, let's correct ID 115
. replace ID = ID + 1000 * Study_Set if ID == 115
(2 real changes made)

.
. // and make the variable names of uniform length and case
. rename (Study_Set time BP ID) (grp tim out pid)

.
. // Now, compare results of -anova- and -mixed-
. anova out grp / pid|grp tim grp#tim

                         Number of obs =         40    R-squared     =  0.8960
                         Root MSE      =    .189533    Adj R-squared =  0.7747

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |  5.5711845         21    .2652945      7.39  0.0000
                         |
                     grp |  1.1030685          1   1.1030685      4.46  0.0490
                 pid|grp |  4.4556606         18    .2475367  
              -----------+----------------------------------------------------
                     tim |  .00139779          1   .00139779      0.04  0.8458
                 grp#tim |  .00913045          1   .00913045      0.25  0.6203
                         |
                Residual |  .64661052         18   .03592281  
              -----------+----------------------------------------------------
                   Total |   6.217795         39   .15943064  

.
. quietly mixed out i.grp##i.tim || pid: , reml dfmethod(kroger)

. contrast grp tim grp#tim, small // <= identical to -anova-

Contrasts of marginal linear predictions

Margins: asbalanced

-----------------------------------------------------------
             |         df        ddf           F        P>F
-------------+---------------------------------------------
out          |
         grp |          1      18.00        4.46     0.0490
             |
         tim |          1      18.00        0.04     0.8458
             |
     grp#tim |          1      18.00        0.25     0.6203
-----------------------------------------------------------

.
. exit

end of do-file

.

Identical degrees of freedom, test statistics and p-values for all three terms.

(Not shown above for brevity, but the residual variance is identical, too. And if you want to compute the random effect variance from the mean squares, that, too, will match.)
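As a sketch of that calculation: in a balanced design with two observations per participant, the expected mean square for subjects within groups is the residual variance plus two times the random-intercept variance, so the latter can be recovered from the two error mean squares in the -anova- table above. The scalar names are only illustrative.

```stata
* Mean squares copied from the -anova- table above (corrected ID 115)
scalar ms_subj  = .2475367     // pid|grp
scalar ms_resid = .03592281    // Residual

* Method-of-moments estimate: (MS_subjects - MS_residual) / 2 time points
display (ms_subj - ms_resid) / 2
```

This should display roughly 0.106, which is the value to compare with var(_cons) from -mixed- on the corrected data.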

Originally posted by Jay Gold

Sorry! One final question. . . .Is that because with the second option the random error from different participants (the ID variable) is being accounted for in the two groups before running the rest of the model?

Something like that, but more directly, the subjects-within-groups error term is the incorrect error term for tests of the repeated measure, time, and interactions involving it. The residual error term is the correct error term here.
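The arithmetic behind this can be checked from the mean squares in the -anova- table above. A sketch, using only numbers copied from that table: each F statistic is the term's mean square divided by the mean square of its error term.

```stata
* grp (Study_Set) is tested against subjects-within-groups (pid|grp)
display 1.1030685 / .2475367                 // about 4.46
display Ftail(1, 18, 1.1030685 / .2475367)   // p-value for that F on (1, 18) df

* tim and grp#tim are tested against the residual
display .00139779 / .03592281                // about 0.04
display .00913045 / .03592281                // about 0.25

* Testing tim against pid|grp instead, as the ordering in #11 does,
* divides by the wrong mean square and gives a different ratio
display .00139779 / .2475367
```

With every term placed before the "/", all three tests share the subjects-within-groups denominator, which is why the second command in #11 gives different results for time and the interaction.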
Attached Files

ANOVA vs mixed.do (1.4 KB, 1 view)

ANOVA vs mixed.smcl (2.7 KB, 1 view)