  • Two way anova with repeated measures versus 2x2 mixed anova

    Hi,

    I have a study where participants either have a disease or not (variable Study_Set) and underwent an intervention in which blood pressure (variable BP) was measured before and after the intervention, as recorded in the variable time. The participant IDs are coded in the variable ID.

    I have tried a few different things and am not sure which is right:

    Code:
    anova BP time ID Study_Set, repeated(ID Study_Set)
    Code:
    anova BP time ID Study_Set time#Study_Set, repeated(ID Study_Set)
    Code:
    mixed BP time Study_Set time#Study_Set


    Appreciative of any insights.

  • #2
    Code:
    anova BP Study_Set / ID time Study_Set#time
    Study_Set is a patient characteristic (within-subjects factor).

    Also, with two levels of the repeated-measures factor, you don't need the repeated() option.

    Also consider
    Code:
    mixed BP i.Study_Set##i.time || ID: , reml dfmethod(kroger)
    or some variation on that.


    • #3
      Originally posted by Joseph Coveney
      Study_Set is a patient characteristic (within-subjects factor).
      Ignore that precaffeinated first attempt and make it "between-subjects factor".


      • #4
        Hi Joseph,

        Thanks for responding so quickly!

        Not sure I fully understand the code you suggested for the -anova-. Why exclude time from the model and only include it in the error?

        In the -mixed- code, I am surprised that adding the "i." to the categorical terms makes a difference. They have only two levels. Not sure I understand why telling Stata to treat them as categorical terms makes a difference. But I tested out both ways and it does.

        What does analyzing the degrees of freedom at the end do for us?

        Thanks very much!


        • #5
          Originally posted by Jay Gold
          Why exclude time from the model and only include it in the error?
          It doesn't exclude time from the model nor does it include it in the error. The two error terms are the residual and the random effect of participant, not time. Actually, for the latter, the subject × group interaction term is typically specified, that is,
          Code:
          anova BP Study_Set / ID|Study_Set time Study_Set#time
          or, equivalently,
          Code:
          anova BP Study_Set / ID#Study_Set time Study_Set#time
          In the -mixed- code, I am surprised that adding the "i." to the categorical terms makes a difference. They have only two levels. Not sure I understand why telling Stata to treat them as categorical terms makes a difference. But I tested out both ways and it does.
          For interaction terms, Stata defaults to categorical (i.) interpretation, and so it really shouldn't have made any difference. (I included the factor variable notation in my illustration only for explicitness.)

          You have something that's not right going on with your (unseen) dataset. I recommend that you look into that.

          What does analyzing the degrees of freedom at the end do for us?
          It doesn't. That option is to allow for small-sample adjustment of the test statistics and their degrees of freedom.


          • #6
            I double checked my data. I do not see any issues regarding categorization, but clearly I am missing something. I think the reason why I was getting different results when I told Stata to use i. before the categorical variables is because I was including the categorical variable in the model and not just the interaction term.


            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input float Study_Set byte time double BP int ID
            0 1  1.61805733 115
            0 2 1.467541895 115
            0 1 1.516576806 141
            0 2 1.126462383 141
            0 1 1.952253752 146
            0 2 1.870135503 146
            1 1 1.205615904 101
            1 2 1.067952954 101
            0 1 2.079262965 102
            0 2 1.746883206 102
            0 1 1.447766111 103
            0 2 1.373130808 103
            1 1 1.737879439 105
            1 2 1.822874183 105
            0 1 1.667118379 106
            0 2 1.647797812 106
            1 1  1.55078952 107
            1 2 1.362589464 107
            0 1 1.070905035 108
            0 2 1.456091623 108
            0 1 1.332632484 109
            0 2 1.058665857 109
            0 1 1.079205046 114
            0 2  1.23712726 114
            1 1 1.573984351 115
            1 2 1.497110278 115
            1 1 1.941952163 116
            1 2 1.836467432 116
            1 1  1.90517393 118
            1 2 2.377863238 118
            1 1 2.549291332 120
            1 2 2.599906847 120
            0 1 1.737123976 124
            0 2 1.381154024 124
            1 1 1.737041992 128
            1 2 1.787149271 128
            0 1 1.006887782 129
            0 2 1.048089542 129
            0 1 1.097199223 135
            0 2 1.677033107 135
            end

            Apologies, but I am even more confused by your response regarding the -anova- code in #5. Should not time be included in the model beyond just an error term? I.e. time is held "constant" while examining the effect of Study_Set on BP? Also should not Study_Set#time be included in the model and not just an error term?

            Overall, not sure which is more accurate here: to use -mixed- or -anova-.

            Thanks very much, Joseph.



            • #7
              Originally posted by Jay Gold
              I double checked my data. I do not see any issues regarding categorization, but clearly I am missing something.
              Just a couple of anomalies pop out in a quick scan: (i) the study participant who's numbered 115 is shown as both having the disease and not having the disease, and (ii) the scale of measurement for blood pressure is strange.

              I think the reason why I was getting different results when I told Stata to use i. before the categorical variables is because I was including the categorical variable in the model and not just the interaction term.
              That could account for it.
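              To make that concrete, here is a minimal sketch of the specifications being compared, assuming the variable names from the -dataex- listing and that ID uniquely identifies each participant. A bare variable outside an interaction is treated as continuous, whereas variables joined by # or ## are treated as categorical unless prefixed with c., so the first two commands should fit the same model while the third is parameterized differently and can produce a different-looking coefficient table.

              ```stata
              * Main effects and interaction all categorical (explicit i. prefixes)
              mixed BP i.Study_Set##i.time || ID: , reml dfmethod(kroger)

              * Same model: ## expands to main effects plus interaction, all categorical
              mixed BP Study_Set##time || ID: , reml dfmethod(kroger)

              * Different parameterization: time and Study_Set enter as continuous
              * covariates, while time#Study_Set is still expanded as categorical
              mixed BP time Study_Set time#Study_Set || ID: , reml dfmethod(kroger)
              ```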

              Should not time be included in the model beyond just an error term? I.e. time is held "constant" while examining the effect of Study_Set on BP? Also should not Study_Set#time be included in the model and not just an error term?
              Again, they aren't: neither time nor the disease status × time interaction is an error term in the model.

              If you look at the ANOVA table, you see that both of the terms have test statistics. The two error terms, subject-within-disease group and residual, are the only terms in the ANOVA table for which test statistics (and associated p-values) are not reported.

              I'm not sure why you think that the ANOVA model treats time and disease status × time interaction as error terms.

              Overall, not sure which is more accurate here: to use -mixed- or -anova-.
              Both give identical results here.


              • #8
                Where do you see that participant ID 115 is categorized as having the disease and not? The variable is Study_Set and the participant with ID 115 has a value of "0" when time is "1" or "2" (the first two observations).

                According to the help file for -anova-, the terms following the "/" are the error terms (see attached screenshot). So here, there are 3 error terms:
                1. ID#Study_Set
                2. time
                3. Study_Set#time
                Again, they aren't: neither time nor the disease status × time interaction is an error term in the model.
                It looks like they are, if they are after the "/". But my experience is limited and clearly I am misunderstanding something fundamental.
                Both give identical results here.
                That does not seem to be true unfortunately. See below. Very different results.

                Code:
                anova BP Study_Set / ID#Study_Set time Study_Set#time
                
                                         Number of obs =         40    R-squared     =  0.8960
                                         Root MSE      =    .189533    Adj R-squared =  0.7747
                
                                  Source | Partial SS         df         MS        F    Prob>F
                          ---------------+----------------------------------------------------
                                   Model |  5.5711845         21    .2652945      7.39  0.0000
                                         |
                               Study_Set |  1.1030685          1   1.1030685      4.46  0.0490
                            ID#Study_Set |  4.4556606         18    .2475367  
                          ---------------+----------------------------------------------------
                                    time |  .00139779          1   .00139779      0.04  0.8458
                          Study_Set#time |  .00913045          1   .00913045      0.25  0.6203
                                         |
                                Residual |  .64661052         18   .03592281  
                          ---------------+----------------------------------------------------
                                   Total |   6.217795         39   .15943064  
                
                . mixed BP i.Study_Set##i.time || ID: , reml dfmethod(kroger)
                
                Performing EM optimization ...
                
                Performing gradient-based optimization:
                Iteration 0:  Log restricted-likelihood =  -13.54979  
                Iteration 1:  Log restricted-likelihood =  -13.54979  
                
                Computing standard errors ...
                
                Computing degrees of freedom ...
                
                Mixed-effects REML regression                        Number of obs    =     40
                Group variable: ID                                   Number of groups =     19
                                                                     Obs per group:
                                                                                  min =      2
                                                                                  avg =    2.1
                                                                                  max =      4
                DF method: Kenward–Roger                             DF:          min =  18.31
                                                                                  avg =  25.07
                                                                                  max =  35.78
                                                                     F(3, 23.92)      =   0.87
                Log restricted-likelihood =  -13.54979               Prob > F         = 0.4706
                
                --------------------------------------------------------------------------------
                            BP | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                ---------------+----------------------------------------------------------------
                     Study_Set |
                           HV  |   .1762969   .1502266     1.17   0.248    -.1284413     .481035
                               |
                          time |
                     Handgrip  |  -.0429063   .0790283    -0.54   0.594    -.2087384    .1229257
                               |
                Study_Set#time |
                  HV#Handgrip  |   .0616795   .1249547     0.49   0.627    -.2005241     .323883
                               |
                         _cons |   1.522852   .1063539    14.32   0.000     1.304953    1.740752
                --------------------------------------------------------------------------------
                
                ------------------------------------------------------------------------------
                  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
                -----------------------------+------------------------------------------------
                ID: Identity                 |
                                  var(_cons) |   .1116705   .0461102      .0497126    .2508479
                -----------------------------+------------------------------------------------
                               var(Residual) |   .0374728   .0126659      .0193201    .0726816
                ------------------------------------------------------------------------------
                LR test vs. linear model: chibar2(01) = 13.85         Prob >= chibar2 = 0.0001
                
                .
                end of do-file
                Last edited by Jay Gold; 30 Aug 2024, 09:49. Reason: Edited for formatting.


                • #9
                  Code:
                  list if ID==115
                  gives me the following:
                  Code:
                       +-----------------------------------+
                       | Study_~t   time          BP    ID |
                       |-----------------------------------|
                    1. |        0      1   1.6180573   115 |
                    2. |        0      2   1.4675419   115 |
                   25. |        1      1   1.5739844   115 |
                   26. |        1      2   1.4971103   115 |
                       +-----------------------------------+
                  The anova help file you pasted says that the term following the / is the error term. In the code, the term following / is ID:
                  Code:
                  anova BP Study_Set / ID time Study_Set#time
                  And as Joseph said, time and Study_Set#time get not just mean-square estimates, but also F-test statistics and associated p-values. The true error terms, ID and Residual, have no F-test statistic or p-value.

                  Regarding mixed vs. anova results: they are very similar. I'm assuming the one difference you are talking about is the p-value on Study_Set. Honestly, I personally trust the mixed results more because of the small-sample correction employed. You can explore all the results further using the following postestimation tools:
                  Code:
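                  * ANOVA-style tests of main effects and interaction after -mixed-
                  contrast Study_Set##time, small

                  * Marginal means by group and by time point
                  margins Study_Set
                  margins time

                  * Cell means, plotted with time on the x-axis
                  margins Study_Set#time
                  marginsplot, xdimension(time)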


                  • #10
                    Thanks very much Erik!!

                    I see the issue with ID 115. I was shortening a string ID to a number-only ID by truncating the string and was left with two subjects both as "115". Easily remedied. I suppose using -duplicates- would have helped find these.
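                    For instance, a minimal sketch of such a check using Stata's -duplicates- and -isid- commands, assuming the variable names from the -dataex- listing (grp_conflict is a hypothetical helper variable introduced only for illustration):

                    ```stata
                    * Flag any ID whose Study_Set value is not constant across its rows
                    bysort ID (Study_Set): generate byte grp_conflict = Study_Set[1] != Study_Set[_N]
                    list ID Study_Set time BP if grp_conflict, sepby(ID)

                    * Each participant should have exactly one row per time point
                    duplicates report ID time

                    * Stops with an error if ID and time do not uniquely identify the rows
                    isid ID time
                    ```

                    With the data as posted, both checks should point at ID 115, which appears under both values of Study_Set and twice at each time point.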

                    I was taking the -anova- help file too literally and interpreting it to mean that all terms following the "/" were error terms, not just the term immediately following the "/".

                    Thanks to you again as well Joseph for your patience!


                    • #11
                      Sorry! One final question.

                      I notice the model gives different results when coded as you suggested:
                      Code:
                      anova BP Study_Set / ID#Study_Set time Study_Set#time
                      versus when the other terms are listed first and the error term comes last:

                      Code:
                      anova BP time Study_Set#time Study_Set / ID#Study_Set
                      Is that because with the second option the random error from different participants (the ID variable) is being accounted for in the two groups before running the rest of the model?


                      • #12
                        Originally posted by Jay Gold
                        That does not seem to be true unfortunately. See below. Very different results.
                        No, they're identical. You need to follow-up mixed with contrast to get the same contrasts that ANOVA uses. See below. (Complete do-file and log file attached for your convenience.)

                        .
                        . version 18.0

                        .
                        . clear *

                        .
                        . quietly input float Study_Set byte time double BP int ID

                        .
                        . // First, let's correct ID 115
                        . replace ID = ID + 1000 * Study_Set if ID == 115
                        (2 real changes made)

                        .
                        . // and make the variable names of uniform length and case
                        . rename (Study_Set time BP ID) (grp tim out pid)

                        .
                        . // Now, compare results of -anova- and -mixed-
                        . anova out grp / pid|grp tim grp#tim

                                                 Number of obs =         40    R-squared     =  0.8960
                                                 Root MSE      =    .189533    Adj R-squared =  0.7747

                                          Source | Partial SS         df         MS        F    Prob>F
                                      -----------+----------------------------------------------------
                                           Model |  5.5711845         21    .2652945      7.39  0.0000
                                                 |
                                             grp |  1.1030685          1   1.1030685      4.46  0.0490
                                         pid|grp |  4.4556606         18    .2475367
                                      -----------+----------------------------------------------------
                                             tim |  .00139779          1   .00139779      0.04  0.8458
                                         grp#tim |  .00913045          1   .00913045      0.25  0.6203
                                                 |
                                        Residual |  .64661052         18   .03592281
                                      -----------+----------------------------------------------------
                                           Total |   6.217795         39   .15943064

                        .
                        . quietly mixed out i.grp##i.tim || pid: , reml dfmethod(kroger)

                        . contrast grp tim grp#tim, small // <= identical to -anova-

                        Contrasts of marginal linear predictions

                        Margins: asbalanced

                        -----------------------------------------------------------
                                     |         df        ddf           F        P>F
                        -------------+---------------------------------------------
                        out          |
                                 grp |          1      18.00        4.46     0.0490
                                     |
                                 tim |          1      18.00        0.04     0.8458
                                     |
                             grp#tim |          1      18.00        0.25     0.6203
                        -----------------------------------------------------------

                        .
                        . exit

                        end of do-file


                        .


                        Identical degrees of freedom, test statistics and p-values for all three terms.

                        (Not shown above for brevity, but the residual variance is identical, too. And if you want to compute the random effect variance from the mean squares, that, too, will match.)
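                        As a rough sketch of that mean-squares calculation: in a balanced design with two observations per participant, the expected mean square for subjects within group is the residual variance plus twice the random-effect variance, so the latter can be backed out by subtraction. The scalar names below are hypothetical; the two values are copied from the -anova- table above.

                        ```stata
                        * Mean squares copied from the -anova- table (corrected data)
                        * ms_subj: pid|grp (subjects within group); ms_res: Residual
                        scalar ms_subj = .2475367
                        scalar ms_res  = .03592281

                        * E[MS_subj] = sigma2_e + 2*sigma2_u with 2 time points per subject,
                        * so sigma2_u = (MS_subj - MS_res)/2
                        display "residual variance      = " ms_res
                        display "random-effect variance = " (ms_subj - ms_res)/2
                        ```

                        The second value works out to roughly 0.106, which is the quantity to compare against var(_cons) from -mixed- fitted to the corrected data.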


                        Originally posted by Jay Gold
                        Sorry! One final question. . . .Is that because with the second option the random error from different participants (the ID variable) is being accounted for in the two groups before running the rest of the model?
                        Something like that, but more directly, the subjects-within-groups error term is the incorrect error term for tests of the repeated measure, time, and interactions involving it. The residual error term is the correct error term here.
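                        Side by side, a sketch of the two specifications using the original variable names (this assumes ID has already been corrected so that it uniquely identifies participants). The comments restate the point above about which error term each test uses.

                        ```stata
                        * Study_Set is tested against subjects within group (the term after "/");
                        * time and Study_Set#time follow that error term and are tested
                        * against the residual
                        anova BP Study_Set / ID|Study_Set time Study_Set#time

                        * Here time and Study_Set#time also sit before the "/", so they are
                        * tested against ID#Study_Set, which is not the appropriate error
                        * term for the repeated measure or its interaction
                        anova BP time Study_Set#time Study_Set / ID#Study_Set
                        ```

                        The sums of squares should be the same in both tables; what changes is the denominator of the F statistics for time and the interaction.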