
  • Predictors in longitudinal mixed models - two ways to specify model

    Hi all,

    I am building a mixed model (longitudinal, data in long format, clustered by ID with a random slope for time) and am testing time-invariant predictors of change in the outcome variable over time as predictor × time interactions. In this model, the significance of the interaction indicates whether the predictor significantly predicts change in the outcome over time, and the significance of the time parameter indicates whether the outcome changed over time.

    In Stata, something like this (MODEL 1):
    Code:
    mixed outcomevar time predictor predictor#time || id: time
    However, I have seen other models testing the same thing in a different way - by dropping the first timepoint from the dataset, and including the initial value of the outcome variable as a covariate, like this (MODEL 2):
    Code:
    drop if time==0
    mixed outcomevar time initialvalueoutcome predictor || id: time
    In this model, the significance of the predictor parameter indicates whether the predictor significantly predicts change in outcome over time.

    I am wondering if someone could explain the different implications of these two ways of modeling predictors? Why would one be used over the other? Can the second model be used to test the simple effects of time, or would the time parameter in this model refer to something different to what I described in the first model?

    Thanks in advance for your time.

  • #2
    I don't quite follow you when you say that in the first model "the interaction indicates whether the predictor significantly predicts change in outcome over time".

    Anyway, I don't think that the two models are equivalent. The two models have different likelihood values, and they give rise to different predictions. See below. For example, the second model doesn't model a predictor × time interaction, and such a phenomenon isn't evident in its predictions (red lines) as it is in the first model's predictions (blue lines).

    . version 14.2

    .
    . clear *

    . set more off

    . set seed 156034

    .
    . quietly drawnorm icp slp, double corr(1 -0.5 \ -0.5 1) n(200)

    . generate int pid = _n

    . generate double grp = mod(_n, 2)

    . forvalues time = 0/2 {
      2.         local tim = `time' - 1
      3.         generate double out`time' = ///
    >                  -10 + icp + ///
    >                          0.1 * grp + ///
    >                                  -0.1 * `tim' + slp + ///
    >                                          0.5 * grp * `tim' + ///
    >                                                  rnormal()
      4. }

    . quietly reshape long out, i(pid) j(tim)

    .
    . mixed out i.grp##c.tim || pid: tim, covariance(unstructured) ml nolrtest nostderr nolog

    Mixed-effects ML regression                     Number of obs     =        600
    Group variable: pid                             Number of groups  =        200

                                                    Obs per group:
                                                                  min =          3
                                                                  avg =        3.0
                                                                  max =          3

                                                    Wald chi2(3)      =      47.65
    Log likelihood = -973.36835                     Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------
             out |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           1.grp |  -.3752969   .1895449    -1.98   0.048    -.7467981   -.0037956
             tim |  -.1039489   .0701269    -1.48   0.138    -.2413951    .0334974
                 |
       grp#c.tim |
              1  |   .5621158   .0991745     5.67   0.000     .3677375    .7564942
                 |
           _cons |  -9.968013   .1340285   -74.37   0.000     -10.2307   -9.705322
    ------------------------------------------------------------------------------

    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    pid: Unstructured            |
                        var(tim) |   .0041912          .             .           .
                      var(_cons) |   .9837186          .             .           .
                  cov(tim,_cons) |  -.0642103          .             .           .
    -----------------------------+------------------------------------------------
                   var(Residual) |   .9751749          .             .           .
    ------------------------------------------------------------------------------

    . predict double iat, xb

    .
    . bysort pid (tim): generate double out0 = out[1]

    .
    . mixed out i.grp c.tim c.out0 if tim != 0 || pid: tim, covariance(unstructured) ml nolrtest nostderr nolog

    Mixed-effects ML regression                     Number of obs     =        400
    Group variable: pid                             Number of groups  =        200

                                                    Obs per group:
                                                                  min =          2
                                                                  avg =        2.0
                                                                  max =          2

                                                    Wald chi2(3)      =      91.66
    Log likelihood = -619.95697                     Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------
             out |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           1.grp |   .6539924   .1383117     4.73   0.000     .3829066    .9250783
             tim |    .159846   .0947097     1.69   0.091    -.0257815    .3454735
            out0 |   .4152359    .047812     8.68   0.000      .321526    .5089458
           _cons |  -6.234501   .5089243   -12.25   0.000    -7.231974   -5.237028
    ------------------------------------------------------------------------------

    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    pid: Unstructured            |
                        var(tim) |   .6497454          .             .           .
                      var(_cons) |   2.477243          .             .           .
                  cov(tim,_cons) |  -1.092119          .             .           .
    -----------------------------+------------------------------------------------
                   var(Residual) |   .5721191          .             .           .
    ------------------------------------------------------------------------------

    . predict double bas, xb

    .
    . assert grp if pid == 1

    . assert !grp if pid == 2

    .
    . graph twoway ///
    >         line iat tim if pid == 1, lcolor(blue) lpattern(solid) || ///
    >         line iat tim if pid == 2, lcolor(blue) lpattern(dash) || ///
    >         line bas tim if pid == 1, lcolor(red) lpattern(solid) || ///
    >         line bas tim if pid == 2, lcolor(red) lpattern(dash) ///
    >                 xtitle(Time) xlabel(0(1)2) ytitle(Outcome) ///
    >                 ylabel( , angle(horizontal) nogrid) legend(off)

    . quietly graph export Graph.png, replace

    .
    . exit

    end of do-file


    [Graph.png — predicted outcomes over time: blue lines = Model 1 (interaction), red lines = Model 2 (baseline-adjusted)]



    • #3
      I'm going to somewhat disagree with Joseph Coveney here.

      First, Karen is describing the models incorrectly. Let's simplify somewhat by eliminating the random slopes, and I'm also going to assume here that predictor is a dichotomous variable and that time is as well (say, for example, that predictor indicates membership in one arm of a randomized trial and time indicates post-intervention vs. baseline).
      Code:
      mixed outcome i.predictor##i.time || id: // MODEL 1
      The other model is:
      Code:
      by id (time), sort: gen base_outcome = outcome[1]
      regress outcome i.predictor c.base_outcome if time != 0 // MODEL 2
      If we translate MODEL 1 into statistics and take the normality assumptions about random effects seriously, it looks like this:

      Code:
      outcome_ij = constant + b*predictor_i + u_i + e_ij, where i = 1,..., N  indexes units of analysis and j = 0, 1 indexes time.
      u_i ~ N(0, V_u), e_ij ~ N(0, V_e), where N denotes a normal distribution and  V_u and V_e are the variances.
      Moreover the u_i and e_ij are all independent.
      With this statistical model, if you plug in the formulas for a normal distribution and do some algebra on it, you can calculate that:

      Code:
      outcome_i1 | outcome_i0 ~ N(r*outcome_i0 + (1-r)*(constant + b*predictor_i), (1+r)*V_e), where r = V_u/(V_u + V_e) is the intraclass correlation.
      (The algebra: outcome_i0 and outcome_i1 are bivariate normal with common mean constant + b*predictor_i, common variance V_u + V_e, and covariance V_u, so the standard conditional-normal formulas give the mean above and conditional variance (V_u + V_e)*(1 - r^2) = (1+r)*V_e. This is a calculation that everyone who uses these models should do once, and only once, in a lifetime.) Putting that as a regression equation it translates to:

      Code:
       outcome_i1 = (1-r)*constant + (1-r)*b*predictor_i + r*outcome_i0 + error
      which can be simplified to:
      Code:
       outcome_i1 = different_constant + different_b*predictor_i + r*outcome_i0 + error
      which would be implemented in Stata as
      Code:
      regress outcome_i1 predictor outcome_i0 // i.e. MODEL 2!
      These models are therefore algebraically equivalent. And in fact, the coefficient of predictor you get from Model 2 is the coefficient from Model 1 attenuated by one minus the intraclass correlation. The incorporation of random slopes, or the presence of multiple levels of either predictor or time, will complicate the calculations considerably, but doesn't change the fact that both models are ways of representing the same multivariate normal distribution in different parameterizations. One is the multivariate normal distribution itself, and the other expresses it as a distribution conditional on the first observation.
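      That equivalence is easy to check numerically. Here is a minimal sketch (variable names and parameter values are invented for illustration): with V_u = 4 and V_e = 1, the intraclass correlation is r = 0.8, so regressing the follow-up on predictor and the baseline should give a predictor coefficient near (1-r)*b = 0.2*1.5 = 0.3 and a baseline coefficient near r = 0.8.
      Code:
      * illustrative simulation -- names and values are not from the original posts
      clear
      set seed 2718
      set obs 2000
      generate byte predictor = mod(_n, 2)
      generate double u = rnormal(0, 2)                          // V_u = 4
      generate double out0 = 10 + 1.5*predictor + u + rnormal()  // b = 1.5, V_e = 1
      generate double out1 = 10 + 1.5*predictor + u + rnormal()
      regress out1 i.predictor c.out0   // "Model 2": attenuated predictor coefficient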

      Although the models are algebraically equivalent, they are not statistically equivalent when implemented as regression commands. In Model 1, the time = 0 outcome is treated as a random variable, whereas in Model 2, it is treated as an observed constant. This has implications for things like degrees of freedom used in tests. In addition, in some contexts, the pre-intervention value of the outcome is obtained from historical records, whereas the post-intervention outcome is measured according to some protocol in the course of the study. In that situation, the variance of the error term for time=0 outcomes may be different from the error variance for the subsequent measurement(s). Consequently the assumption that there is a common variance V_e will not be correct. So in these situations, there is a good case for not using Model 1.

      However, when the assumption of a homogeneous variance V_e, and the other assumptions of the model, are reasonable, Model 1 has the advantage that the effect of the predictor is estimated in pure form, and is not "contaminated" by regression to the mean. To the extent that the regression to the mean is of interest itself, it is separately estimated in Model 1 as the ICC.

      All of that said, it is important to correctly describe what the various terms in these models mean. In Model 1, the coefficient of predictor is the expected mean difference of the outcome in the two arms at time zero. The coefficient of predictor#time is the difference-in-differences estimator of the effect of being in the study arm indexed by predictor = 1. In Model 2, the coefficient of predictor is the effect of being in the study arm indexed by predictor = 1, attenuated by 1 minus the intraclass correlation coefficient.

      One more comment: if there are only two time periods, there is no point in using a random slope on time. The predictor#time part of the model will pin down the two time points, so that the slope of the line between them is not variable. The random slope variance will be zero. You may come out with some random slope variance like 10^-15 or the like, or you may find that the model simply won't converge with a variance component that close to zero.
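      To see that in practice, here is a hypothetical two-timepoint simulation (all names invented) in which the data contain no slope variability; the estimated var(time) should sit at or indistinguishably near zero, and with some seeds the optimizer may struggle to converge at that boundary.
      Code:
      * illustrative simulation -- random slope with only two timepoints
      clear
      set seed 12345
      set obs 200
      generate int id = _n
      generate double u = rnormal()
      expand 2
      bysort id: generate byte time = _n - 1
      generate double y = 1 + 0.5*time + u + rnormal()
      mixed y i.time || id: time   // expect var(time) at or near zero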









      • #4
        Thanks to you both. These explanations are very helpful and I'm slowly working through them (obviously, I don't have a mathematics background so it takes me a while). I think in my case, my preference is to use Model 1. Firstly, I find it simpler to interpret the predictor parameters. Secondly, if I use Model 1, I'm able to plot the estimated marginal means for my categorical predictor variable over time, visualize the curvilinear trajectory of change and do pairwise comparisons between different levels of my categorical predictor and different timepoints (as described in my previous post). I am not sure how/if this could be done with Model 2 (I have several predictors in my model so I imagine the calculations to convert values from Model 2 to equivalents in Model 1 would be quite complex).
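        For reference, the plotting workflow I have in mind is something like this (sketched with placeholder names, and a quadratic time term for the curvilinear trajectory):
        Code:
        * sketch with placeholder names
        mixed outcomevar i.predictor##c.time##c.time || id: time
        margins predictor, at(time = (0 1 2 3))            // estimated marginal means by group and timepoint
        marginsplot, xtitle(Time) ytitle(Outcome)
        margins predictor, at(time = (0 1 2 3)) pwcompare  // pairwise comparisons between predictor levels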

        On the other hand, I am just beginning to work through testing my model assumptions which will need to be met for me to use Model 1. A couple of questions:
        1/ How would I determine whether the assumption of homogeneous variance V_e is correct?
        2/ Are you aware of any guides for testing assumptions of mixed models using Stata? At present I'm referring to Lesa Hoffman's text Longitudinal Analysis, which provides very helpful Stata syntax examples for most sections of the book, but unfortunately not for the appendix on assumption testing.

        P.S. I do have 4 time periods, so the random slope does appear to be worth including.



        • #5
          The area of testing the assumptions of these models is not, as far as I know, very well developed. On the specific issue of homogeneity of residual variance, one approach is, after estimating the model parameters, to use -predict- to get the residuals and then scatter plot them against the fitted values, or against individual predictors. It's not really different from what one might do with a flat regression. I think that, as with single-level regression models, heteroscedasticity of residuals affects inference (i.e., affects whether the standard errors are right) but does not bias estimation of the coefficients. Also, if there is heteroscedasticity of residuals associated with some variable in your data, you can deal with that with the -by()- suboption of the -residuals()- option, to explicitly model the residual variance separately by levels of that variable.
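          Concretely, that check might look like the following (variable names are placeholders):
          Code:
          * sketch: residuals-vs-fitted check after -mixed-
          mixed outcomevar i.predictor##c.time || id: time
          predict double fit, fitted        // linear prediction including the random effects
          predict double res, residuals     // observation-level residuals
          scatter res fit, yline(0)         // a fan shape suggests heteroscedastic residuals
          * if the residual variance differs by some variable, model it explicitly, e.g.:
          * mixed outcomevar i.predictor##c.time || id: time, residuals(independent, by(somevar))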

          That said, none of this distinguishes Model 1 from Model 2. That is, if your residuals are heteroscedastic in Model 1, they will also be heteroscedastic in Model 2, unless the sole heterogeneity involved is time 0 vs. later measures. But, again, in Model 1 you can overcome that with -residuals( , by(time0))-, where time0 is an indicator variable for baseline observations.
