  • Analysis of repeated measures with STATA

    Dear All,

    I would like to ask two questions about the analysis of repeated measures in Stata 14.0.

    1# Is it possible to use linear regression instead of repeated-measures ANOVA, i.e. "regress calories phase" instead of "anova calories phase subject, repeated(phase)"?
    If not, why?

    I have tested both methods and obtained completely different results:

    Result from regression


    ------------------------------------------------------------------------------
        calories |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           phase |   5.550875   47.43901     0.12   0.907    -87.91544    99.01719
           _cons |   1944.094     102.48    18.97   0.000     1742.183    2146.004
    ------------------------------------------------------------------------------

    AND result from anova


    Number of obs =     234     R-squared     = 0.9876
    Root MSE      = .044943     Adj R-squared = 0.9812

         Source | Partial SS    df         MS         F   Prob>F
     -----------+----------------------------------------------------
          Model |  24.775995    79  .31362019    155.27   0.0000
                |
          phase |  .53173289     2  .26586644    131.63   0.0000
        subject |  24.244262    77  .31486054    155.88   0.0000
                |
       Residual |  .31105595   154  .00201984
     -----------+----------------------------------------------------
          Total |  25.087051   233  .10766975


    Between-subjects error term: subject
    Levels: 78 (77 df)
    Lowest b.s.e. variable: subject

    Repeated variable: phase
    Huynh-Feldt epsilon = 0.9420
    Greenhouse-Geisser epsilon = 0.9205
    Box's conservative epsilon = 0.5000

                           ------------ Prob > F ------------
         Source |     df        F   Regular      H-F      G-G      Box
     -----------+----------------------------------------------------
          phase |      2   131.63    0.0000   0.0000   0.0000   0.0000
       Residual |    154
     ----------------------------------------------------------------

    2# If the data are not normally distributed, is it possible to log-transform them and then use the ANOVA test?


    Thank you for your help.

  • #2
    Radhouene:
    if your dependent variable is continuous and, as it seems, you're dealing with a longitudinal study, you may want to consider -xtreg-.
    Normal distribution is required for residuals, not for regressand or predictors.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Thank you, Carlo, for your answer. However, I would like to know whether it is possible to use repeated-measures ANOVA in this case.
      I measure "calories" intake in the same persons at 3 checkpoints of the intervention (variable: phase).

      Another question: is it possible to use simple linear regression as I wrote previously (command: regress calories phase)?


      • #4
        Another question: is it possible to use simple linear regression as I wrote previously (command: regress calories phase)?
        No. -regress- requires that the observations be independently sampled, which is definitely not the case with repeated measures on the same people. You must use -xtreg- or -mixed- to analyze this kind of data. (Or repeated measures ANOVA.)
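
        A minimal sketch of the two alternatives named above, assuming the variables are called calories, phase and subject as in the original post. The exact specification (random intercept only, phase as a categorical predictor) is an assumption for illustration, not something stated in the thread:
        Code:
        * declare subject as the panel identifier (variable name taken from #1)
        xtset subject
        * random-effects panel regression, phase entered as a factor variable
        xtreg calories i.phase, re
        * linear mixed model with a random intercept for each subject
        mixed calories i.phase || subject:
        Both commands account for the repeated observations on each subject, which plain -regress- with default standard errors does not.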


        • #5
          Radhouene:
          I'm not an expert on repeated-measures experiments with ANOVA; hence, I would refer you to Example #15, -anova- entry, Stata .pdf manual.
          As far as OLS is concerned, I would try something along the following lines:
          Code:
          regress calories i.phase, vce(cluster personid)
          If you use default standard errors, Stata treats your observations as being independent, whereas you actually have a panel data structure (the same person is measured repeatedly on calories intake, if I got your experimental design right): hence, you need clustered standard errors to perform a pooled OLS.
          However, if you want to go OLS, with panel data it is rare (although possible) that pooled OLS outperforms -xtreg-.

          PS: crossed in the cyberspace with Clyde's helpful reply, that wisely includes the -mixed- option.
          Last edited by Carlo Lazzaro; 03 Feb 2019, 10:06.
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Originally posted by Radhouene DOGGUI
            1# Is it possible to use linear regression instead of repeated-measures ANOVA, i.e. "regress calories phase" instead of "anova calories phase subject, repeated(phase)"?
            You can use regress instead of anova to fit the model:
            Code:
            anova calories phase subject
            becomes
            Code:
            regress calories i.(phase subject)
            testparm i.phase
            testparm i.subject
            and you will get the same results.

            But you'll need to compute the Greenhouse-Geisser and Huynh-Feldt epsilons by yourself; there's no repeated() option for regress to do that part.

            2# If the data are not normally distributed, is it possible to log-transform them and then use the ANOVA test?
            Yes, but look into
            Code:
            meglm calories i.phase || subject: , family(gaussian) link(log)
            as an alternative. It has advantages in interpretability over transformation and a linear model. You've got nearly 80 participants, which might come near enough to asymptotic for iterative maximum likelihood methods.
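
            For comparison, the transformation route asked about in question 2 might look like the sketch below. The name ln_calories is made up for illustration, and the sketch assumes calories is strictly positive so that the logarithm is defined:
            Code:
            * natural log of the response (hypothetical variable name)
            generate ln_calories = ln(calories)
            * same repeated-measures ANOVA as in #1, now on the log scale
            anova ln_calories phase subject, repeated(phase)
            With this route the phase effects are differences in mean log calories, whereas the log-link model keeps the response on its original scale.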
            Last edited by Joseph Coveney; 03 Feb 2019, 16:20. Reason: -regress- needed response variable and I needed coffee


            • #7
              Thank you very much for all.

              Please I have another question.

              I coded the calories variable using a cut-off value (calories_c2 was coded 0, 1).

              Is it possible to use this command to assess the association between caloric intake and the different phases? "logistic calories_c2 i.phase" or "logistic calories_c2 i.(phase subject)"

              Best regards.


              • #8
                I coded the calories variable using a cut-off value (calories_c2 was coded 0, 1). Is it possible to use this command to assess the association between caloric intake and the different phases?
                Google Harrell dichotomization for a general answer.

                "logistic calories_c2 i.(phase subject)"
                T = 3; Google incidental parameters for the answer to that specifically.


                • #9
                  Radhouene:
                  as an aside to Joseph's helpful reply, see, in addition to Frank Harrell's note,
                  http://citeseerx.ist.psu.edu/viewdoc...=rep1&type=pdf
                  Kind regards,
                  Carlo
                  (Stata 19.0)


                  • #10
                    Thank you for all.


                    • #11
                      Originally posted by Clyde Schechter
                      No. -regress- requires that the observations be independently sampled, which is definitely not the case with repeated measures on the same people. You must use -xtreg- or -mixed- to analyze this kind of data. (Or repeated measures ANOVA.)
                      1) Could you please help me write the correct command using xtreg? I have "calories" (a continuous variable) as the dependent variable. I would like to build a regression model to evaluate how calorie intake fluctuates over the different phases (n=3), taking the first one as reference (ib1.phase), and then determine the same thing after adjusting for the subject's body mass index (bmi) and level of physical activity (PA, categorical: 1, 2 and 3).

                      2) My second question: is it possible to categorize "calories" into a binary variable and use the xtgee command? Like this:
                      xtgee calories ib1.phase calories bmi ib3.PA, family(binomial) link(logit)

                      Thank you so much.

                      Regards,
                      Last edited by Radhouene DOGGUI; 06 Feb 2019, 11:28.


                      • #12
                        Radhouene:
                        1) you may want to try something along the following lines:
                        Code:
                        xtset subject phase
                        xtreg calories ib1.phase i.PA bmi
                        -xtreg- requires choosing between -fe- and (the default) -re- specification via -hausman- test.
                        2) as per # 8 and 9, I would not sponsor an approach aimed at categorizing a continuous dependent variable.
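
                        The -fe- versus -re- choice mentioned in point 1 is commonly checked along the lines of the sketch below. The stored-estimate names fe and re are arbitrary labels, and the variable names are assumed to match those used in #11:
                        Code:
                        xtset subject phase
                        * fixed effects: regressors that never vary within a subject are dropped
                        xtreg calories ib1.phase i.PA bmi, fe
                        estimates store fe
                        * random effects (the -xtreg- default)
                        xtreg calories ib1.phase i.PA bmi, re
                        estimates store re
                        * Hausman test: the consistent estimator (fe) is listed first
                        hausman fe re
                        A small p-value from -hausman- is usually read as evidence against the -re- specification.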
                        Kind regards,
                        Carlo
                        (Stata 19.0)


                        • #13
                          I completely agree with Carlo's response.


                          • #14
                            Originally posted by Carlo Lazzaro
                            Radhouene:
                            1) you may want to try something along the following lines:
                            Code:
                            xtset subject phase
                            xtreg calories ib1.phase i.PA bmi
                            -xtreg- requires choosing between -fe- and (the default) -re- specification via -hausman- test.
                            2) as per # 8 and 9, I would not sponsor an approach aimed at categorizing a continuous dependent variable.
                            1) Thank you very much. I am not sure which is best, fe or re: normally calorie intake will decrease only during the 2nd phase, but I am not sure that the effect will be similar for each subject. So I think random effects is the better choice.

                            2) I tend to categorize because I have several other biological markers with censored observations (below the detection limit). The frequency of censored observations ranges between 20% and 80% from one variable to another. So, in order to homogenize the analysis of all variables, I tend to categorize, using the detection limit as the cut-off value for biological markers and the median for calories. What do you think?

                            Regards,


                            • #15
                              I tend to categorize because I have several other biological markers with censored observations (below the detection limit). The frequency of censored observations ranges between 20% and 80% from one variable to another. So, in order to homogenize the analysis of all variables, I tend to categorize, using the detection limit as the cut-off value for biological markers and the median for calories. What do you think?
                              My take on this is that it is bad enough that you have these censored observations to start with. It may be that categorizing them using the detection limit as cutoff is the best you can make of a bad situation. But nothing requires you to do this for the calories variable, and doing so just takes a bad situation and makes it even worse. There is no value in "homogenizing" the variables in this way. It gains you nothing and it throws away useful information.
