Analysis of repeated measures with STATA

Radhouene DOGGUI

Join Date: Jun 2018

Posts: 72
#1

Analysis of repeated measures with STATA

03 Feb 2019, 06:05

Dear All,

I would like to ask two questions about the analysis of repeated measures in Stata 14.0.

1# Is it possible to use linear regression instead of repeated-measures ANOVA? That is, regress calories phase instead of anova calories phase subject, repeated(phase)
If not, why?

I have tested both methods and obtained completely different results:

Result from regression

------------------------------------------------------------------------------
    calories |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       phase |   5.550875   47.43901     0.12   0.907    -87.91544    99.01719
       _cons |   1944.094     102.48    18.97   0.000     1742.183    2146.004
------------------------------------------------------------------------------

AND result from anova

Number of obs =        234    R-squared     =  0.9876
Root MSE      =    .044943    Adj R-squared =  0.9812

     Source | Partial SS    df          MS        F    Prob>F
 -----------+----------------------------------------------------
      Model |  24.775995    79   .31362019   155.27    0.0000
            |
      phase |  .53173289     2   .26586644   131.63    0.0000
    subject |  24.244262    77   .31486054   155.88    0.0000
            |
   Residual |  .31105595   154   .00201984
 -----------+----------------------------------------------------
      Total |  25.087051   233   .10766975

Between-subjects error term:  subject
                     Levels:  78 (77 df)
     Lowest b.s.e. variable:  subject

Repeated variable: phase
                     Huynh-Feldt epsilon        =  0.9420
                     Greenhouse-Geisser epsilon =  0.9205
                     Box's conservative epsilon =  0.5000

                           ------------ Prob > F ------------
     Source |    df       F    Regular      H-F      G-G      Box
 -----------+----------------------------------------------------
      phase |     2  131.63     0.0000   0.0000   0.0000   0.0000
   Residual |   154
 ----------------------------------------------------------------

2# If the data are not normally distributed, is it possible to log-transform them and then use ANOVA?

Thank you for your help.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#2

03 Feb 2019, 08:10

Radhouene:
if your dependent variable is continuous and, as it seems, you're dealing with a longitudinal study, you may want to consider -xtreg-.
Normality is required of the residuals, not of the regressand or the predictors.

Kind regards,
Carlo
(Stata 19.0)
Radhouene DOGGUI

Join Date: Jun 2018

Posts: 72
#3

03 Feb 2019, 09:52

Thank you, Carlo, for your answer. However, I would like to know whether it is possible to use repeated-measures ANOVA in this case.
I measure the "calories" intake of the same person at 3 checkpoints of the intervention (variable: phase).

Another question: is it possible to use simple linear regression as I wrote previously (command: regress calories phase)?
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#4

03 Feb 2019, 10:04

Another question: is it possible to use simple linear regression as I wrote previously (command: regress calories phase)?

No. -regress- requires that the observations be independently sampled, which is definitely not the case with repeated measures on the same people. You must use -xtreg- or -mixed- to analyze this kind of data. (Or repeated measures ANOVA.)
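
A minimal sketch of the two model-based alternatives named above, assuming the variable names from #1 (calories, phase, and subject as the person identifier):

Code:

* random-intercept linear mixed model: one intercept per subject
mixed calories i.phase || subject:
* joint test of the phase effect
testparm i.phase

* a comparable random-intercept model through -xtreg-, fitted by maximum likelihood
xtset subject
xtreg calories i.phase, mle

Both models allow observations from the same subject to be correlated, which is what plain -regress- with default standard errors does not do.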
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#5

03 Feb 2019, 10:04

Radhouene:
I'm not an expert in repeated-measures experiments with ANOVA; hence, I would refer you to Example #15 of the -anova- entry in the Stata .pdf manual.
As far as OLS is concerned, I would try something along the following lines:

Code:

regress calories i.phase, vce(cluster subject)

If you use default standard errors, Stata treats your observations as being independent, whereas you actually have a panel data structure (the same person is measured repeatedly on calorie intake, if I got your experimental design right): hence, you need clustered standard errors to perform a pooled OLS.
However, if you want to go OLS, with panel data it is rare (although possible) that pooled OLS outperforms -xtreg-.

PS: crossed in cyberspace with Clyde's helpful reply, which wisely includes the -mixed- option.

Last edited by Carlo Lazzaro; 03 Feb 2019, 10:06.

Kind regards,
Carlo
(Stata 19.0)
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#6

03 Feb 2019, 16:09

Originally posted by Radhouene DOGGUI

1# Is it possible to use linear regression instead of repeated-measures ANOVA? That is, regress calories phase instead of anova calories phase subject, repeated(phase)

You can use regress instead of anova to fit the model:

Code:

anova calories phase subject

becomes

Code:

regress calories i.(phase subject)
testparm i.phase
testparm i.subject

and you will get the same results.

But you'll need to compute the Greenhouse-Geisser and Huynh-Feldt epsilons by yourself; there's no repeated() option for regress to do that part.

2# If the data are not normally distributed, is it possible to log-transform them and then use ANOVA?

Yes, but look into

Code:

meglm calories i.phase || subject: , family(gaussian) link(log)

as an alternative. It has advantages in interpretability over transformation and a linear model. You've got nearly 80 participants, which might come near enough to asymptotic for iterative maximum likelihood methods.
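
For comparison, the transformation route from question 2 might look like the sketch below. The names ln_calories and res are hypothetical, and the sketch assumes every calories value is strictly positive.

Code:

* log-transform the outcome (ln() returns missing for values <= 0)
generate double ln_calories = ln(calories)
anova ln_calories phase subject, repeated(phase)

* inspect the residuals, not the raw outcome, for normality
predict double res, residuals
qnorm res

In this version the phase effects are differences in mean log calories, whereas with the log link they are differences in the log of mean calories, which is where the difference in interpretability comes from.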

Last edited by Joseph Coveney; 03 Feb 2019, 16:20. Reason: -regress- needed response variable and I needed coffee
Radhouene DOGGUI

Join Date: Jun 2018

Posts: 72
#7

04 Feb 2019, 03:47

Thank you all very much.

I have another question.

I recoded the calories variable using a cut-off value (calories_c2 is coded 0, 1).

Is it possible to use one of these commands to assess the association between caloric intake and the different phases? "logistic calories_c2 i.phase" or "logistic calories_c2 i.(phase subject)"

Best regards.
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#8

04 Feb 2019, 03:59

I recoded the calories variable using a cut-off value (calories_c2 is coded 0, 1). Is it possible to use one of these commands to assess the association between caloric intake and the different phases?

Google "Harrell dichotomization" for a general answer.

"logistic calories_c2 i.(phase subject)"

T = 3; Google "incidental parameters problem" for the answer to that specifically.
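
To spell out the second point: with only three observations per subject, a logistic model with one indicator per subject estimates about as many nuisance parameters as there are subjects, and the phase coefficients are then not estimated consistently. If a binary outcome could not be avoided, a conditional fixed-effects logit is one standard way around this. A minimal sketch, assuming the variable names used earlier in the thread:

Code:

* conditional (fixed-effects) logit: subject effects are conditioned out, not estimated
xtset subject
xtlogit calories_c2 i.phase, fe

* equivalent formulation with -clogit-
clogit calories_c2 i.phase, group(subject)

Subjects whose binary outcome never changes across the three phases contribute nothing to this estimator and are dropped, which is a further cost of dichotomizing.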
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#9

04 Feb 2019, 04:29

Radhouene:
as an aside to Joseph's helpful reply, see, in addition to Frank Harrell's note,
http://citeseerx.ist.psu.edu/viewdoc...=rep1&type=pdf

Kind regards,
Carlo
(Stata 19.0)
Radhouene DOGGUI

Join Date: Jun 2018

Posts: 72
#10

04 Feb 2019, 09:18

Thank you all.
Radhouene DOGGUI

Join Date: Jun 2018

Posts: 72
#11

06 Feb 2019, 11:25

Originally posted by Clyde Schechter

No. -regress- requires that the observations be independently sampled, which is definitely not the case with repeated measures on the same people. You must use -xtreg- or -mixed- to analyze this kind of data. (Or repeated measures ANOVA.)

1) Could you please help me write the correct command using xtreg? I have "calories" (continuous) as the dependent variable, and I would like to build a regression model to evaluate how calorie intake changes over the three phases, taking the first one as reference (ib1.phase), and then estimate the same thing after adjusting for the subject's body mass index (bmi) and level of physical activity (PA, categorical: 1, 2 and 3).

2) My second question: is it possible to categorize "calories" into a binary variable and use the xtgee command? Like this:
xtgee calories ib1.phase calories bmi ib3.PA, family(binomial) link(logit)

Thank you so much.

Regards,

Last edited by Radhouene DOGGUI; 06 Feb 2019, 11:28.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#12

06 Feb 2019, 11:43

Radhouene:
1) you may want to try something along the following lines:

Code:

xtset subject phase
xtreg calories ib1.phase i.PA bmi

-xtreg- requires choosing between -fe- and (the default) -re- specification via -hausman- test.
2) as per # 8 and 9, I would not sponsor an approach aimed at categorizing a continuous dependent variable.
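
A minimal sketch of the -fe- versus -re- comparison mentioned in 1), assuming the same variable names; fe and re are just labels for the stored estimates:

Code:

xtset subject phase

* fixed-effects fit, stored for the comparison
xtreg calories ib1.phase i.PA bmi, fe
estimates store fe

* random-effects fit (the default), stored likewise
xtreg calories ib1.phase i.PA bmi, re
estimates store re

* Hausman test: consistent estimator first, efficient estimator second
hausman fe re

A small p-value from -hausman- points towards -fe-; otherwise -re- is usually retained. The test as sketched relies on default (non-robust) standard errors in both fits.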

Kind regards,
Carlo
(Stata 19.0)
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#13

06 Feb 2019, 11:49

I completely agree with Carlo's response.
Radhouene DOGGUI

Join Date: Jun 2018

Posts: 72
#14

06 Feb 2019, 11:58

Originally posted by Carlo Lazzaro

Radhouene:
1) you may want to try something along the following lines:

Code:

xtset subject phase
xtreg calories ib1.phase i.PA bmi

-xtreg- requires choosing between -fe- and (the default) -re- specification via -hausman- test.
2) as per # 8 and 9, I would not sponsor an approach aimed at categorizing a continuous dependent variable.

1) Thank you very much. I am not sure which is better, fe or re, because calorie intake would normally decrease only during the 2nd phase, but I am not sure that the effect will be similar for each subject. So I think random effects is the better choice.

2) I tend to categorize because I have several other biological markers with censored observations (below the detection limit). The frequency of censored observations ranges between 20% and 80% from one variable to another. So, in order to homogenize the analysis of all variables, I tend to categorize using the detection limit as the cut-off value for the biological markers and the median for calories. What do you think?

Regards,
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#15

06 Feb 2019, 12:12

I tend to categorize because I have several other biological markers with censored observations (below the detection limit). The frequency of censored observations ranges between 20% and 80% from one variable to another. So, in order to homogenize the analysis of all variables, I tend to categorize using the detection limit as the cut-off value for the biological markers and the median for calories. What do you think?

My take on this is that it is bad enough that you have these censored observations to start with. It may be that categorizing them using the detection limit as cutoff is the best you can make of a bad situation. But nothing requires you to do this for the calories variable, and doing so just takes a bad situation and makes it even worse. There is no value in "homogenizing" the variables in this way. It gains you nothing and it throws away useful information.