  • Linear mixed effect model to predict mean change of disease severity score per year using xtmixed command

    Dear Friends,

    I am new to this forum. I am very happy to see so many people helping each other.

    I am currently learning how to do linear mixed effects modelling using the xtmixed command in Stata. I have my data in long format, as shown below.

    [Screenshot of the data in long format: data.jpg]

    I want to estimate the overall mean change in score per unit of time (the mean slope), treating time point (follow-up) as continuous, with a random intercept and a random slope.

    My command is currently:

    Code:
    xtmixed change_score c.tp || ID: tp, covariance(unstructured) mle

    Please correct my command if it is incorrect. Many thanks for your help!

    Best Regards,
    Masnoon.
    Last edited by Masnoon Saiyed; 18 Jul 2018, 21:56.

  • #2
    The code is one correct way of doing this. It is not necessary to specify the -mle- option: in current Stata, that is the default. Also, the current name of the command is -mixed-. The older name -xtmixed- still works, but in the future it might not, so it is better to get into the habit of using the current nomenclature. (Actually, seeing you do these two things makes me wonder if you are perhaps using an older version of Stata. If you are, you should say so in your post so that you don't get advice that is inapplicable to your situation. Or perhaps you are learning Stata from outdated instructional materials?)
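
    For example, a minimal sketch of the same model in current syntax (reusing the variable names from your command):

    Code:
    * same model with the current command name; -mle- is the default and can be omitted
    mixed change_score c.tp || ID: tp, covariance(unstructured)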

    Also, with only a single random intercept and slope, there is no need to specify the -covariance(unstructured)- option, or even specify a -covariance()- option of any kind: there is only one covariance parameter being estimated, that between the intercept and the slope, so all of the different possible covariance options are equivalent. This option is only needed when there are more than two random effects in the model.

    In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
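
    For example, a minimal sketch (the variable names are the ones visible in your command, and the observation range is arbitrary):

    Code:
    * install -dataex- if it is not already part of your Stata installation
    ssc install dataex

    * post an excerpt of the variables relevant to the question
    dataex ID tp change_score in 1/20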

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    • #3
      Thank you so much for your answer. I will follow your advice when posting in the future.

      Best Regards,
      Masnoon Saiyed.

      • #4
        Originally posted by Clyde Schechter:
        Also, with only a single random intercept and slope, there is no need to specify the -covariance(unstructured)- option, or even specify a -covariance()- option of any kind: there is only one covariance parameter being estimated, that between the intercept and the slope, so all of the different possible covariance options are equivalent. This option is only needed when there are more than two random effects in the model.
        Clyde, could you elaborate on that? I don't quite follow what you're trying to say. The common regression coefficients and variance estimates might turn out to be similar or even nearly identical, but to me the models are different, with different likelihoods: one fits a covariance between the random slope and intercept and the other model doesn't.
        Code:
        version 15.1

        clear *

        set seed `=strreverse("1454205")'

        * simulate 250 subjects with correlated random intercepts and slopes
        quietly drawnorm intercept slope, double corr(1 -0.25 \ -0.25 1) n(250)
        generate int pid = _n
        quietly expand 4
        bysort pid: generate double tim = _n - 2.5
        generate double out = -1 + tim / 100 + intercept + slope * tim + rnormal()

        * model that estimates the intercept-slope covariance . . .
        mixed out c.tim || pid: tim , covariance(unstructured) stddeviations nolrtest nolog
        estimates store Full

        * . . . versus the default independent structure, which constrains it to zero
        mixed out c.tim || pid: tim, stddeviations nolrtest nolog
        lrtest Full

        exit

        • #5
          Thank you, Joseph, for pointing out my error.

          What I should have said is that because there is only one covariance being estimated, it makes no difference whether one specifies -covariance()- as unstructured or exchangeable: they are the same in this situation. However, they do differ from independent (the default) or identity.

          So what I said is wrong as applied to Masnoon's situation, because if he followed my advice, he would get no estimate at all of the covariance: the default independent structure of course implies covariance = 0, which does not need to be estimated, and which is a different model. It won't affect any of the coefficient estimates or the variance component estimates themselves, but it certainly does affect the covariance estimate.

          Sorry for the error and any problems caused by it.

          Added: What provoked my erroneous comment is that I often see people using -cov(uns)- on a routine basis. In one sense, this is quite justifiable because it does not rest on any particular assumptions about the covariance structure. But in situations where there are more than a handful of random effects, the number of parameters that must be estimated becomes large very rapidly. Sometimes it makes the estimation fail to converge altogether when there is not enough data to identify that many parameters. So I'm in the habit of encouraging people to avoid -cov(uns)- provided exchangeable (or in those models that support it, autoregressive) is at least somewhat plausible. (Exchangeable is often suitable in my line of work, where many of these studies involve repeated measures). Anyway, that's what I was thinking about. Unfortunately, it wasn't applicable to this situation, and on top of that, my actual statement went way beyond that and was palpably false. Egg on my face!
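
          To put a number on how fast that growth is, a small back-of-the-envelope sketch (my own illustration, not from the posts above): with q random effects, -cov(unstructured)- estimates q(q+1)/2 variance-covariance parameters, whereas exchangeable always estimates just 2 (one common variance and one common covariance).

          Code:
          * parameter count for the random-effects covariance matrix under -cov(unstructured)-:
          * q variances plus q*(q-1)/2 covariances, i.e. q*(q+1)/2 in total
          forvalues q = 2/8 {
              display "q = `q':  unstructured = " `q'*(`q'+1)/2 "   exchangeable = 2"
          }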
          Last edited by Clyde Schechter; 20 Jul 2018, 10:01.

          • #6
            Clyde, OK, makes sense now, thanks! I agree that there's often an ill-considered enthusiasm to overdo the residual structure. There's no egg on your face at all.

            • #7
              Hi everyone! I am not sure whether I should start a new thread, but my issue also concerns choosing the covariance structure for a mixed model, so I thought I would add it here. I am driving myself crazy trying to select the right covariance structure for a simple mixed model.

              I am only estimating one random effect (Subject) and the rest of the model is fixed effects. My current code is as follows:

              xtmixed dv i.group time covariate1 covariate2 covariate3 covariate4 covariate5 covariate6 || Subject:, var reml

              The default is the independent covariance structure. Is this the correct one to select? I was considering unstructured or identity instead. Any advice would be appreciated.

              Thanks!

              • #8
                The covariance structure determines what constraints (if any) are imposed on the covariance matrix of the random effects. When you have only a single random intercept, as here, the covariance matrix is just a 1x1 matrix containing the variance of the subject-level random effects. So constraints about the off-diagonal terms are irrelevant (there aren't any), and it is also irrelevant whether the on-diagonal terms are constrained to be equal (identity) or not (independent) because there is only one on-diagonal term. In short, in this situation, you will get the same results regardless of which covariance structure you choose.

                The covariance structure matters when you have random effects at more than one level, or both random intercepts and random slopes. But for a single random intercept, it makes no difference.
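
                A quick way to check this in your own data, sketched with the placeholder names from #7 (keeping -reml- in both runs so that only the covariance option differs):

                Code:
                * with a single random intercept, these two should give identical results
                mixed dv i.group time covariate1 covariate2 covariate3 covariate4 covariate5 covariate6 ///
                    || Subject:, reml
                estimates store indep

                mixed dv i.group time covariate1 covariate2 covariate3 covariate4 covariate5 covariate6 ///
                    || Subject:, covariance(identity) reml
                estimates store ident

                * coefficients and variance components should match
                estimates table indep ident, se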

                • #9
                  Thanks for your response, Clyde. As a note, the estimates do change slightly depending on the covariance structure I select, although the pattern remains the same. Any thoughts about this? Thanks again for your expertise.

                  • #10
                    Can you show an example of this? That shouldn't be happening.

                    • #11
                      See below for independent, unstructured, and identity, respectively.

                      Thank you!

                      .


                      . do "C:\Users\DANA\AppData\Local\Temp\STDf7c_000000.tm p"

                      .
                      . xtmixed mCD4 i.group##i.time c.D2_a_0 c.QOL c.AM7__0 c.Tot_MET c.ISscale c.FItotal_0 i.educ4 || Subj
                      > ect:, var reml

                      Performing EM optimization:

                      Performing gradient-based optimization:

                      Iteration 0: log restricted-likelihood = -14404.237
                      Iteration 1: log restricted-likelihood = -14404.237

                      Computing standard errors:

                      Mixed-effects REML regression Number of obs = 2,400
                      Group variable: Subject Number of groups = 600

                      Obs per group:
                      min = 4
                      avg = 4.0
                      max = 4

                      Wald chi2(24) = 38216.35
                      Log restricted-likelihood = -14404.237 Prob > chi2 = 0.0000

                      ------------------------------------------------------------------------------
                      mCD4 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                      group |
                      G2 | -42.40045 26.27671 -1.61 0.107 -93.90186 9.100961
                      G3 | -13.87916 26.06027 -0.53 0.594 -64.95636 37.19804
                      G4 | -28.69841 27.1849 -1.06 0.291 -81.97983 24.58301
                      |
                      time |
                      2 | 343.9733 7.222571 47.62 0.000 329.8174 358.1293
                      3 | 441.1933 7.222571 61.09 0.000 427.0374 455.3493
                      4 | 529.62 7.222571 73.33 0.000 515.464 543.776
                      |
                      group#time |
                      G2#2 | 12.17333 10.21426 1.19 0.233 -7.846245 32.19291
                      G2#3 | 40.68 10.21426 3.98 0.000 20.66042 60.69958
                      G2#4 | 54.94 10.21426 5.38 0.000 34.92042 74.95958
                      G3#2 | 125.6867 10.21426 12.31 0.000 105.6671 145.7062
                      G3#3 | 137.7067 10.21426 13.48 0.000 117.6871 157.7262
                      G3#4 | 143.04 10.21426 14.00 0.000 123.0204 163.0596
                      G4#2 | 186.8467 10.21426 18.29 0.000 166.8271 206.8662
                      G4#3 | 238.4267 10.21426 23.34 0.000 218.4071 258.4462
                      G4#4 | 250.4867 10.21426 24.52 0.000 230.4671 270.5062
                      |
                      D2_a_0 | -2.674945 1.317103 -2.03 0.042 -5.256419 -.0934703
                      QOL | 178.5831 32.09347 5.56 0.000 115.6811 241.4852
                      AM7__0 | 12.16232 .6820185 17.83 0.000 10.82559 13.49905
                      Tot_MET | .0007499 .0058682 0.13 0.898 -.0107515 .0122514
                      ISscale | 82.68932 39.92619 2.07 0.038 4.435429 160.9432
                      FItotal_0 | 130.3591 26.19802 4.98 0.000 79.01192 181.7062
                      |
                      educ4 |
                      2 | -6.63243 25.14702 -0.26 0.792 -55.91968 42.65482
                      3 | -19.2449 23.74673 -0.81 0.418 -65.78764 27.29784
                      4 | -41.50954 26.96124 -1.54 0.124 -94.3526 11.33352
                      |
                      _cons | -351.3133 119.5254 -2.94 0.003 -585.5788 -117.0477
                      ------------------------------------------------------------------------------

                      ------------------------------------------------------------------------------
                      Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
                      -----------------------------+------------------------------------------------
                      Subject: Identity |
                      var(_cons) | 45188.25 2694.968 40203.23 50791.39
                      -----------------------------+------------------------------------------------
                      var(Residual) | 3912.415 130.8507 3664.178 4177.47
                      ------------------------------------------------------------------------------
                      LR test vs. linear model: chibar2(01) = 3720.57 Prob >= chibar2 = 0.0000

                      . estat ic

                      Akaike's information criterion and Bayesian information criterion

                      -----------------------------------------------------------------------------
                      Model | Obs ll(null) ll(model) df AIC BIC
                      -------------+---------------------------------------------------------------
                      . | 2,400 . -14404.24 27 28862.47 29018.62
                      -----------------------------------------------------------------------------
                      Note: N=Obs used in calculating BIC; see [R] BIC note.

                      . xtmixed mCD4 i.group##i.time c.D2_a_0 c.QOL c.AM7__0 c.Tot_MET c.ISscale c.FItotal_0 i.educ4 || Subj
                      > ect:, var noconst residuals(unstr, t(time)) reml

                      Obtaining starting values by EM:

                      Performing gradient-based optimization:

                      Iteration 0: log restricted-likelihood = -16264.522 (not concave)
                      Iteration 1: log restricted-likelihood = -14660.797 (not concave)
                      Iteration 2: log restricted-likelihood = -14173.505 (not concave)
                      Iteration 3: log restricted-likelihood = -13973.144
                      Iteration 4: log restricted-likelihood = -13847.854
                      Iteration 5: log restricted-likelihood = -13833.126
                      Iteration 6: log restricted-likelihood = -13831.963
                      Iteration 7: log restricted-likelihood = -13831.96
                      Iteration 8: log restricted-likelihood = -13831.96

                      Computing standard errors:

                      Mixed-effects REML regression Number of obs = 2,400
                      Group variable: Subject Number of groups = 600

                      Obs per group:
                      min = 4
                      avg = 4.0
                      max = 4

                      Wald chi2(24) = 21483.56
                      Log restricted-likelihood = -13831.96 Prob > chi2 = 0.0000

                      ------------------------------------------------------------------------------
                      mCD4 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                      group |
                      G2 | -41.9664 24.9817 -1.68 0.093 -90.92962 6.996828
                      G3 | -15.69655 24.77742 -0.63 0.526 -64.25939 32.86629
                      G4 | -28.69724 25.83899 -1.11 0.267 -79.34073 21.94625
                      |
                      time |
                      2 | 343.9733 8.937842 38.49 0.000 326.4555 361.4912
                      3 | 441.1933 9.810549 44.97 0.000 421.965 460.4217
                      4 | 529.62 9.528838 55.58 0.000 510.9438 548.2962
                      |
                      group#time |
                      G2#2 | 12.17333 12.64002 0.96 0.336 -12.60064 36.94731
                      G2#3 | 40.68 13.87421 2.93 0.003 13.48704 67.87296
                      G2#4 | 54.94 13.47581 4.08 0.000 28.52789 81.35211
                      G3#2 | 125.6867 12.64002 9.94 0.000 100.9127 150.4606
                      G3#3 | 137.7067 13.87421 9.93 0.000 110.5137 164.8996
                      G3#4 | 143.04 13.47581 10.61 0.000 116.6279 169.4521
                      G4#2 | 186.8467 12.64002 14.78 0.000 162.0727 211.6206
                      G4#3 | 238.4267 13.87421 17.18 0.000 211.2337 265.6196
                      G4#4 | 250.4867 13.47581 18.59 0.000 224.0746 276.8988
                      |
                      D2_a_0 | -2.111174 1.247662 -1.69 0.091 -4.556547 .3341985
                      QOL | 179.0694 30.40142 5.89 0.000 119.4837 238.6551
                      AM7__0 | 12.11864 .6460608 18.76 0.000 10.85238 13.3849
                      Tot_MET | .0031159 .0055588 0.56 0.575 -.0077792 .0140109
                      ISscale | 130.399 37.82118 3.45 0.001 56.27089 204.5272
                      FItotal_0 | 123.0381 24.81679 4.96 0.000 74.39807 171.6781
                      |
                      educ4 |
                      2 | -4.337897 23.8212 -0.18 0.856 -51.0266 42.3508
                      3 | -22.08464 22.49474 -0.98 0.326 -66.17352 22.00425
                      4 | -30.36296 25.53978 -1.19 0.234 -80.42 19.69409
                      |
                      _cons | -473.86 113.2337 -4.18 0.000 -695.794 -251.926
                      ------------------------------------------------------------------------------

                      ------------------------------------------------------------------------------
                      Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
                      -----------------------------+------------------------------------------------
                      Subject: (empty) |
                      -----------------------------+------------------------------------------------
                      Residual: Unstructured |
                      var(e1) | 44397.71 2604.854 39574.89 49808.25
                      var(e2) | 52324.55 3064.257 46650.58 58688.63
                      var(e3) | 50719.78 2969.408 45221.35 56886.77
                      var(e4) | 49312.99 2881.987 43975.9 55297.8
                      cov(e1,e2) | 42369.75 2658.601 37158.99 47580.51
                      cov(e1,e3) | 40340.23 2575.038 35293.25 45387.21
                      cov(e1,e4) | 40045.44 2545.357 35056.63 45034.25
                      cov(e2,e3) | 50665.83 2990.002 44805.54 56526.13
                      cov(e2,e4) | 49255.74 2925.168 43522.52 54988.97
                      cov(e3,e4) | 48981.03 2893.986 43308.92 54653.14
                      ------------------------------------------------------------------------------
                      LR test vs. linear model: chi2(9) = 4865.12 Prob > chi2 = 0.0000

                      Note: The reported degrees of freedom assumes the null hypothesis is not on the boundary of the
                      parameter space. If this is not true, then the reported test is conservative.

                      . estat ic

                      Akaike's information criterion and Bayesian information criterion

                      -----------------------------------------------------------------------------
                      Model | Obs ll(null) ll(model) df AIC BIC
                      -------------+---------------------------------------------------------------
                      . | 2,400 . -13831.96 35 27733.92 27936.33
                      -----------------------------------------------------------------------------
                      Note: N=Obs used in calculating BIC; see [R] BIC note.

                      . xtmixed mCD4 i.group##i.time c.D2_a_0 c.QOL c.AM7__0 c.Tot_MET c.ISscale c.FItotal_0 i.educ4 || Subj
                      > ect:, covariance (identity) var

                      Performing EM optimization:

                      Performing gradient-based optimization:

                      Iteration 0: log likelihood = -14475.47
                      Iteration 1: log likelihood = -14475.47

                      Computing standard errors:

                      Mixed-effects ML regression Number of obs = 2,400
                      Group variable: Subject Number of groups = 600

                      Obs per group:
                      min = 4
                      avg = 4.0
                      max = 4

                      Wald chi2(24) = 38479.64
                      Log likelihood = -14475.47 Prob > chi2 = 0.0000

                      ------------------------------------------------------------------------------
                      mCD4 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                      group |
                      G2 | -42.40045 26.00178 -1.63 0.103 -93.363 8.562097
                      G3 | -13.87916 25.78779 -0.54 0.590 -64.4223 36.66398
                      G4 | -28.69841 26.89969 -1.07 0.286 -81.42084 24.02402
                      |
                      time |
                      2 | 343.9733 7.198456 47.78 0.000 329.8646 358.082
                      3 | 441.1933 7.198456 61.29 0.000 427.0846 455.302
                      4 | 529.62 7.198456 73.57 0.000 515.5113 543.7287
                      |
                      group#time |
                      G2#2 | 12.17333 10.18015 1.20 0.232 -7.779402 32.12607
                      G2#3 | 40.68 10.18015 4.00 0.000 20.72726 60.63274
                      G2#4 | 54.94 10.18015 5.40 0.000 34.98726 74.89274
                      G3#2 | 125.6867 10.18015 12.35 0.000 105.7339 145.6394
                      G3#3 | 137.7067 10.18015 13.53 0.000 117.7539 157.6594
                      G3#4 | 143.04 10.18015 14.05 0.000 123.0873 162.9927
                      G4#2 | 186.8467 10.18015 18.35 0.000 166.8939 206.7994
                      G4#3 | 238.4267 10.18015 23.42 0.000 218.4739 258.3794
                      G4#4 | 250.4867 10.18015 24.61 0.000 230.5339 270.4394
                      |
                      D2_a_0 | -2.674945 1.302756 -2.05 0.040 -5.2283 -.1215894
                      QOL | 178.5831 31.74388 5.63 0.000 116.3663 240.8
                      AM7__0 | 12.16232 .6745895 18.03 0.000 10.84015 13.48449
                      Tot_MET | .0007499 .0058043 0.13 0.897 -.0106262 .0121261
                      ISscale | 82.68932 39.49128 2.09 0.036 5.287823 160.0908
                      FItotal_0 | 130.3591 25.91265 5.03 0.000 79.57122 181.1469
                      |
                      educ4 |
                      2 | -6.63243 24.8731 -0.27 0.790 -55.38281 42.11795
                      3 | -19.2449 23.48807 -0.82 0.413 -65.28067 26.79086
                      4 | -41.50954 26.66756 -1.56 0.120 -93.777 10.75792
                      |
                      _cons | -351.3133 118.2247 -2.97 0.003 -583.0295 -119.597
                      ------------------------------------------------------------------------------

                      ------------------------------------------------------------------------------
                      Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
                      -----------------------------+------------------------------------------------
                      Subject: Identity |
                      var(_cons) | 44194.5 2607.867 39367.68 49613.13
                      -----------------------------+------------------------------------------------
                      var(Residual) | 3886.333 129.5444 3640.547 4148.713
                      ------------------------------------------------------------------------------
                      LR test vs. linear model: chibar2(01) = 3733.50 Prob >= chibar2 = 0.0000

                      . estat ic

                      Akaike's information criterion and Bayesian information criterion

                      -----------------------------------------------------------------------------
                      Model | Obs ll(null) ll(model) df AIC BIC
                      -------------+---------------------------------------------------------------
                      . | 2,400 . -14475.47 27 29004.94 29161.09
                      -----------------------------------------------------------------------------
                      Note: N=Obs used in calculating BIC; see [R] BIC note.

                      .
                      end of do-file


                      .
                      Last edited by Dana Rose Garfin; 15 Aug 2018, 13:16.

                      • #12
                        Well, the model you are showing as an example of unstructured is different. There you are estimating heteroscedastic residuals and specifying an unstructured matrix for the covariances of the residuals at different times. That's not the same as specifying the -cov(unstructured)- option.
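
                        To make the distinction concrete, a hedged sketch of the two specifications (the first adds a linear random slope on time so that -covariance(unstructured)- has an off-diagonal term to estimate; the second is the residual-covariance model you actually fit):

                        Code:
                        * -covariance()- shapes the covariance matrix of the random effects
                        mixed mCD4 i.group##i.time c.D2_a_0 c.QOL c.AM7__0 c.Tot_MET c.ISscale c.FItotal_0 i.educ4 ///
                            || Subject: time, covariance(unstructured) reml

                        * -residuals()- shapes the within-subject residual covariance matrix instead
                        mixed mCD4 i.group##i.time c.D2_a_0 c.QOL c.AM7__0 c.Tot_MET c.ISscale c.FItotal_0 i.educ4 ///
                            || Subject:, residuals(unstructured, t(time)) reml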

                        And the difference you are seeing between cov(identity) in your third results and the default independent in your first results is not due to the way you specified the -cov()- option. Rather, in the first results you are estimating by restricted maximum likelihood, but in the third, by default, you are estimating by ordinary maximum likelihood.
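
                        In other words (a quick check, not something I have run): adding -reml- back to your third command should reproduce your first set of results exactly.

                        Code:
                        * same single-random-intercept model; with -reml- restored, the choice of
                        * covariance() is immaterial and this should match the first output above
                        xtmixed mCD4 i.group##i.time c.D2_a_0 c.QOL c.AM7__0 c.Tot_MET c.ISscale c.FItotal_0 i.educ4 ///
                            || Subject:, covariance(identity) var reml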

                        • #13
                          That makes sense. Clyde, do you have any thoughts about which would be the most appropriate method?

                          • #14
                            I don't have a deep understanding of the difference between -reml- and -ml- in this context. I know that the results are usually close enough that the difference between them is of no practical importance, at least in reasonably large samples, and I don't really know how one might choose between them. (In practice I have always just used -ml-, but I can't give you a good reason for that.)
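
                            For anyone who wants to see how small the difference usually is in their own data, a generic sketch (y, x, and id are placeholder names, not variables from this thread):

                            Code:
                            * fit the same two-level model by REML and by default ML, then compare
                            mixed y x || id:, reml
                            estimates store by_reml

                            mixed y x || id:
                            estimates store by_ml

                            * fixed effects and (log-scale) variance components side by side
                            estimates table by_reml by_ml, se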

                            Perhaps another Forum member with a better knowledge of this issue will respond.

                            • #15
                              Dana, please, in future posts, show commands and listings between [CODE] and [/CODE] delimiters, as requested in FAQ 12. The font you chose (Courier New) is quite faint, and although it is monospace, many columns don't line up.
                              Steve Samuels
                              Statistical Consulting
                              [email protected]

                              Stata 14.2
