command for multilevel mixed effect model?

marco toffoli

Join Date: Jan 2017

Posts: 4
#1

command for multilevel mixed effect model?

17 Jan 2017, 05:07

I have a dataset of 140 patients equally divided into 3 groups. The dependent variable is "moca" and can take integers between 0 and 30. It is a longitudinal study with a total of 4 time-points (variable time is "timepoint"). The group defining variable is "status". The independent variables are "age at recruitment", "education", "sex" and "timepoint". I am interested in differences in the behaviour of moca across time in the 3 groups. The variable defining different patients is "recordID"

This is the command that I used:

mixed moca i.status i.sex education ageatrecruitment timepoint || RecordID: timepoint

So my questions are:

1- does the command I am using make sense?

2- with this model the lines I get for each group are parallel (i.e. have the same slope and differ only for the intercept). I suspect they might have different slopes though, how could I test for that? My guess would be to run the same command separately for each group (using the option by), but then how could I compare them with a test (the suest command doesn't work with the mixed command).

3- my dependent variable is limited to integers between 0 and 30 and it might be that my regression line tends to an asymptote at 30, how could I test for that/implement this in my model?

I apologise if my question is unclear/too long.
Tags: None
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

17 Jan 2017, 05:28

Marco:
welcome to the list.
As per your code, you seem to have already imposed a random slope, too.
You may want to test whether the random slope is statistically significant or not:

Code:

mixed moca i.status i.sex education ageatrecruitment timepoint || RecordID: estimates store randint mixed moca i.status i.sex education ageatrecruitment timepoint || RecordID: timepoint estimates store randslope lrtest randint randslope

For more on -mixed- I would point you out to the related entry in Stata .pdf manual.
For the future, posting what you typed and what Stata gave you back, too (as per FAQ), dramtically increases your chances of getting positive replies.

Kind regards,
Carlo
(Stata 19.0)
Comment

marco toffoli

Join Date: Jan 2017
Posts: 4

17 Jan 2017, 07:26

Dear Carlo,

thank you very much for your answer and apologise for not following the forum etiquette.

my dataset looks like this:

Code:

    +--------------------------------------------------------------------+
     | RecordID   status      sex   timepo~t   moca   educat~n   ageatr~t |
     |--------------------------------------------------------------------|
  1. |        5       HC     Male          0     28         18         59 |
  2. |        5       HC     Male          1     29         18         59 |
  3. |        5       HC     Male          2     28         18         59 |
  4. |        5       HC     Male          3     28         18         59 |
  5. |        8       HC   Female          0     28         12         72 |
     |--------------------------------------------------------------------|
  6. |        8       HC   Female          1     29         12         72 |
  7. |        8       HC   Female          2     30         12         72 |
  8. |        8       HC   Female          3     30         12         72 |

this is my output (data are partial, don't mind the low p values)

Code:

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log likelihood = -469.66663  
Iteration 1:   log likelihood = -469.40651  
Iteration 2:   log likelihood = -469.40476  
Iteration 3:   log likelihood = -469.40476  

Computing standard errors:

Mixed-effects ML regression                     Number of obs      =       222
Group variable: RecordID                        Number of groups   =        62

                                                Obs per group: min =         1
                                                               avg =       3.6
                                                               max =         4


                                                Wald chi2(6)       =     29.00
Log likelihood = -469.40476                     Prob > chi2        =    0.0001

-------------------------------------------------------------------------------
         moca |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
       status |
         GBA  |  -.9165106   .6539593    -1.40   0.161    -2.198247    .3652261
          GD  |  -1.110333   .5970305    -1.86   0.063    -2.280492     .059825
              |
          sex |
        Male  |   .0562517   .4587192     0.12   0.902    -.8428214    .9553248
    education |   .0142927   .0706304     0.20   0.840    -.1241403    .1527258
ageatrecrui~t |  -.0678237   .0245697    -2.76   0.006    -.1159794   -.0196681
    timepoint |   .3954907   .1033159     3.83   0.000     .1929953    .5979861
        _cons |   30.81542    2.27093    13.57   0.000     26.36447    35.26636
-------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
RecordID: Independent        |
               var(timepo~t) |   .0606655   .0955748      .0027665    1.330323
                  var(_cons) |   2.229686   .5688381      1.352337    3.676229
-----------------------------+------------------------------------------------
               var(Residual) |    2.64275   .3377034      2.057243    3.394898
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(2) =    51.09   Prob > chi2 = 0.0000

Originally posted by Carlo Lazzaro View Post

you seem to have already imposed a random slope, too.
You may want to test whether the random slope is statistically significant or not:

I have imposed a random slope on the single participant level (defined by "recordID"), while I am interested in finding differences in the group level (defined by the variable "status") which is in the fixed part of my model.

Last edited by marco toffoli; 17 Jan 2017, 07:41.

Comment

Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#4

17 Jan 2017, 08:42

Originally posted by marco toffoli View Post

...

I have imposed a random slope on the single participant level (defined by "recordID"), while I am interested in finding differences in the group level (defined by the variable "status") which is in the fixed part of my model.

Actually, what you've imposed a random slope on is the effect of time (i.e. timepoint). The fixed effect coefficient for time is the overall average unit change in MOCA per unit time. You can do the likelihood ratio test that Carlo suggested for whether or not adding the random intercept is justified. Looking at Stata's estimate of the variance of the random slope, the 95% CI doesn't cross zero, so the LR test would likely reject, i.e. the variance of the random slopes is not zero, so it may be justified to add the random slope. But, your random slopes explain very little of the total model variance, so you may be able to go without them. Anyway, this would accomplish what you were setting out to do in #2.

You might need to explain your point number 3 a bit more. You mentioned that MOCA, the DV, is a scale from 0 to 30. Is it the Montreal Cognitive Assessment? When you say "asymptote" at 30 points, it sounds like you're worried about there being a ceiling effect at 30, e.g. it's possible for people to have cognitive performance scores that exceed 30 points, but the test can't measure above 30. And you'd like to know how to account for that in your analysis. How many percent of your sample are at the ceiling?

Fundamentally, if too many people are at the ceiling, then perhaps MOCA wasn't the right instrument to measure cognitive performance (or whatever it is) in your sample. Not sure if you can test for that in the linear mixed effects model, specifically. If some but not too many people weren't at the ceiling at their baseline, the xttobit command, which implements a censored regression model for panel data, might be useful (either as a sensitivity analysis or as your main model) if you're willing to make the assumption that I laid out about ceiling effects, but it appears that it can't accommodate a random slope. I am less familiar with Tobit models, as they're more used in econometrics, but my impression is that the interpretation of the fixed effect coefficients is a bit more complicated, and you may be better off displaying the marginal effects (margins command).

Last edited by Weiwen Ng; 17 Jan 2017, 08:50.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#5

17 Jan 2017, 08:48

Marco:
you may want to try:

Code:

mixed moca i.status i.sex education ageatrecruitment timepoint || RecordID: R.status

PS: Crossed in the cyberspace with Weiwen's reply, who highlights interesting facets related to Marco's research goal.

Kind regards,
Carlo
(Stata 19.0)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#6

17 Jan 2017, 08:51

I am interested in differences in the behaviour of moca across time in the 3 groups.

It sounds like what you need here is not so much a random slope (which is OK if you also want that, but won't accomplish this purpose), but a term for interaction between group and time.

Code:

mixed moca i.status##c.timepoint i.sex education ageatrecruitment || RecordID: // OR IF YOU ALSO WANT THE RANDOM SLOPE YOU CAN KEEP IT mixed moca i.status##c.timepoint i.sex education ageatrecruitment || RecordID: timepoint // AND TO SEE ESTIMATED ADJUSTED MEAN SLOPES IN EACH GROUP YOU CAN FOLLOW THAT WITH margins status, dydx(timepoint)
1 like
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#7

17 Jan 2017, 13:34

Originally posted by Weiwen Ng View Post

...

Fundamentally, if too many people are at the ceiling, then perhaps MOCA wasn't the right instrument to measure cognitive performance (or whatever it is) in your sample. Not sure if you can test for that in the linear mixed effects model, specifically. If some but not too many people weren't at the ceiling at their baseline, the xttobit command, which implements a censored regression model for panel data, might be useful (either as a sensitivity analysis or as your main model) if you're willing to make the assumption that I laid out about ceiling effects, but it appears that it can't accommodate a random slope. I am less familiar with Tobit models, as they're more used in econometrics, but my impression is that the interpretation of the fixed effect coefficients is a bit more complicated, and you may be better off displaying the marginal effects (margins command).

I realize I need to build a bit off my previous comment.

If MOCA is cognition, then presumably you are interested in studying cognitive decline. And presumably, your respondents' cognitive ability is going to be flat or declining over the course of the study. Unless many of them were at the ceiling at their last observation, or they had many observations throughout the study where they were at the maximum score, then perhaps you don't need to worry about a ceiling effect. What you would want to worry about is a floor effect, if there was one.

I re-read your initial post. It does sound like you are looking for slopes that differ across groups, and hence Clyde's coding will accomplish what you described.

If you want to go and fit a Tobit model, to the data anyway, then this code should accomplish it:

xtset RecordID timepoint
xttobit moca i.status##c.timepoint i.sex education ageatrecruitment, ul(30)

To be clear, the option ul(30) specifies that the scale's upper limit is 30. You could include a lower limit, e.g. ll(0), if you thought it was justified as well.

The XT series of commands will fit a model with (as I understand it) a random intercept by the participant ID. In xtset, you declare the panel ID (i.e. the participant ID), then the unit of time. The XT commands don't appear to be able to accommodate random slopes, but if your real data are like what you showed us, it may not be a huge impediment. If the variance of the random slopes is pretty low, then leaving the random slope out won't improve the precision of the main effect for time by a lot.

I've been doing a bit of reading, and it appears that GSEM (the generalized structual equation modeling command) may actually enable you to fit a censored regression model that includes random intercept AND slope. Again, I'm not convinced that you actually need a censored regression model, but I'm including some info here just in case. The first link is a description of Stata's commands dealing with censored outcomes, and it actually has an example of a Tobit random intercept model. The second and third links show how to fit Tobit models and random effects models in GSEM.

http://www.stata.com/features/overvi...ored-outcomes/
http://www.stata.com/manuals14/semex...#semexample43g
http://www.stata.com/manuals14/semex...#semexample38g

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
marco toffoli

Join Date: Jan 2017

Posts: 4
#8

18 Jan 2017, 04:47

Dear Claudio, Weiwen and Clyde,

thank you for your answers, they helped A LOT!
I think the model that best fits my needs is the one suggested by Clyde (I deem appropriate to keep the random slope):

Code:

mixed moca i.status##c.timepoint i.sex education ageatrecruitment || RecordID: timepoint

I would like to add a thought to what Weiwen said:

Originally posted by Weiwen Ng View Post

When you say "asymptote" at 30 points, it sounds like you're worried about there being a ceiling effect at 30, e.g. it's possible for people to have cognitive performance scores that exceed 30 points, but the test can't measure above 30. And you'd like to know how to account for that in your analysis. How many percent of your sample are at the ceiling?

Moca is indeed the Montreal Cognititive Assessment.
The comments about ceiling and floor effects are usefull and I tried to implement them in the model, but since xttobit would not allow me to use the mixed command I decided to abandon it (and anyway very few participants scored 30 at anytimepoint anyway). Moreover, when I talked about asymptotes what I was trying to say is that I am worried that as the moca score approaches 30, it is less likely to improve (or more likely to get worse). In other words, a patient that scored 15 at baseline will be more likely to do better at successive timepoints than a patient that scored 28 at baseline, simply because he has more space for improvement. The solution I came up with overnight is to add to my model a new variant: moca score at baseline. What do you think?

Last edited by marco toffoli; 18 Jan 2017, 05:18.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#9

18 Jan 2017, 09:56

The solution I came up with overnight is to add to my model a new variant: moca score at baseline. What do you think?

If you do this, be sure to exclude the baseline (which I imagine is timepoint = 0) observations from the analysis. It is an ill-formed model to have the baseline value of the outcome included as a covariate and also include the baseline observation. Even following this precaution, it's not clear to me that including the baseline value as a covariate solves this particular problem.

The perfect solution would be a random-slopes tobit model, but I'm not aware of any software to do that in Stata or any other package. So unless you want to write a program that does that (which I imagine is a major undertaking), something has to give. You said in #1 that your main goal is to contrast the time-trajectories of the scores in three groups. So for your purposes, unless there are only a negligible number of scores at or near 30, I think that the censoring of the MOCA score is a more serious problem than the presence of individual-level variation in the slope. I think I would follow Weiwen's advice to use -xttobit- and just forget about modeling the random slopes. Failing to model the random slopes may endow your results with more apparent precision than is warranted, but failing to account for the censoring will likely introduce very serious bias (unless there are just a handful of censored or near-censored observations).
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#10

18 Jan 2017, 10:47

Originally posted by marco toffoli View Post

Moca is indeed the Montreal Cognititive Assessment.
The comments about ceiling and floor effects are usefull and I tried to implement them in the model, but since xttobit would not allow me to use the mixed command I decided to abandon it (and anyway very few participants scored 30 at anytimepoint anyway). Moreover, when I talked about asymptotes what I was trying to say is that I am worried that as the moca score approaches 30, it is less likely to improve (or more likely to get worse). In other words, a patient that scored 15 at baseline will be more likely to do better at successive timepoints than a patient that scored 28 at baseline, simply because he has more space for improvement. The solution I came up with overnight is to add to my model a new variant: moca score at baseline. What do you think?

Marco, glad the discussion was helpful! Here are some thoughts on the paragraph you just added.

1) If few participants scored 30 or 0 at any time point, then I would not worry about ceiling effects, and so I would just skip the censored regression.

2) If you want to incorporate the baseline score into the statistical model, then I think it would make sense to delete the baseline observation from your data, which I'm assuming are in long format. Not certain if it's absolutely necessary, but it would make sense to me.

You may think that the effect of the treatment or the effect of status varies by baseline MOCA score. If you do, I think the way to find out would be to fit an additional interaction term after the main model. Would appreciate if others could give opinions here, but what I was taught was that you want to fit the main model, then the interaction model.

The syntax could look something like:

Code:

*Generate baseline MOCA score for all observations sort RecordID timepoint by RecordID: egen baseline_moca = moca[1] Code: mixed moca i.status##c.timepoint i.sex education ageatrecruitment baseline_moca || RecordID: timepoint *If you're assuming heterogeneous treatment effects by baseline MOCA, this should accomplish it: mixed moca i.status##c.timepoint##c.baseline_moca i.sex education ageatrecruitment baseline_moca || RecordID: timepoint *or mixed moca i.status##c.baseline_moca c.timepoint i.sex education ageatrecruitment baseline_moca || RecordID: timepoint

A negative slope for time * baseline MOCA should mean that the higher the baseline MOCA score, the lower the treatment effect. I think that this is what you are trying to find out.

3) If I understood you right, it sounds a bit like you think the treatment effect might be heterogeneous by baseline MOCA score. If this is right, then I think that yes, adding baseline MOCA score as a variable in the model and interacting it with the treatment makes sense.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#11

18 Jan 2017, 11:17

I think Weiwen's advice in #10 is excellent. Just including baseline MOCA as a covariate won't solve your problem, but also including a baseline MOCA # group # time interaction might do the trick. It is at least worth a try. My only reservation about this approach is that it models the inhibiting effect of a high baseline MOCA on subsequent increase in MOCA as being linear in the baseline MOCA. It isn't obvious whether that's a good model for ceiling effect or not in the general case. But if it produces a model that fits your data well, I would go with it.

So I would go with Weiwen's second -mixed- command. But do remember to first drop the baseline observations:

Code:

by RecordID timepoint, sort: drop if _n == 1
Comment
marco toffoli

Join Date: Jan 2017

Posts: 4
#12

19 Jan 2017, 09:35

Even if I knew this forum has a good reputation, the feedback I am having goes beyond my expectations. Thank you very much Clyde and Weiwen.

The 2 solutions we came up with are:
(after dropping MOCA at baseline)

Code:

mixed moca i.status##c.timepoint##c.baseline_moca i.sex education ageatrecruitment baseline_moca || RecordID: timepoint

Pros: it keeps the mixed effect model

Cons: it tries to deal with the ceiling effect by adjusting for status*timepoint*baseline moca, which is not obviously correct.

Code:

xtset RecordID timepoint xttobit moca i.status##c.timepoint i.sex education ageatrecruitment, ul(30)

Pros: it is the proper way of dealing with the ceiling effect.

Cons: it only models a random intercept and slope, while I would be more interested in a fixed effect of status (which could be achieved with the mixed command).

To sum things up, the perfect option does not exist, as it would be to use a mixed effect model with ceiling effect. I will try to use both options listed above and meditate on the results to figure out which better fits my model. For anybody interested in the discussion I will post the final results when the study ends (it will be in 45 days).

Also, I am copying below the STATA manual description for the xttobit result, which I find relevant to the discussion over ceiling effects and mixed effect models.

xttobit fits a random-effects tobit models. There is no command for a parametric conditional fixed-effects model, as there does not exist a sufficient
statistic allowing the fixed effects to be conditioned out of the likelihood. Honoré (1992) has developed a semiparametric estimator for fixed-effect
tobit models. Unconditional fixed-effects tobit models may be fit with the tobit command with indicator variables for the panels; the indicators can
be created with the factor-variable syntax described in [U] 11.4.3 Factor variables. However, unconditional fixed-effects estimates are biased.
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#13

19 Jan 2017, 16:14

Originally posted by marco toffoli View Post

The 2 solutions we came up with are:
(after dropping MOCA at baseline)

Code:

mixed moca i.status##c.timepoint##c.baseline_moca i.sex education ageatrecruitment baseline_moca || RecordID: timepoint

Pros: it keeps the mixed effect model

Cons: it tries to deal with the ceiling effect by adjusting for status*timepoint*baseline moca, which is not obviously correct.

Code:

xtset RecordID timepoint xttobit moca i.status##c.timepoint i.sex education ageatrecruitment, ul(30)

Pros: it is the proper way of dealing with the ceiling effect.

Cons: it only models a random intercept and slope, while I would be more interested in a fixed effect of status (which could be achieved with the mixed command).

Marco, I just wanted to clarify a few things:

1) The baseline_moca at the end of your model in number 1 is redundant. Don't forget to delete it, else the model won't run. When you put two ##s consecutively, you already tell Stata to fit a model with a main effect and an interaction effect.

2) The Tobit model does not incorporate a random slope. It does effectively have a random intercept alone, I believe; in this case, a panel is an individual participant. When they say "random effects Tobit models", I am pretty sure that this means there is a random intercept for each person (i.e. each panel).

When they said "fixed effect Tobit model", they meant (as per my understanding) a Tobit model with a fixed effect by panel. For example,

Code:

tobit moca i.status##c.timepoint i.RecordID

Tobit models appear to be much more widely used in econometrics, and my training is in biostatistics. Hence, if I have misinterpreted the terminology, I hope someone will correct it.

3) You already mentioned that you have few observations at the ceiling at any time point. Hence, I am pretty sure that you will be fine with the linear mixed effect model. It has the limitation that Clyde observed in #11, but that is something you may have to live with. In particular, aside from fitting that interaction model and considering a censored regression model, I have no idea how else to work around the ceiling effect problem other than choosing a cognition measure without a ceiling effect and redoing the study. Obviously, that's not practical!

It might be helpful if you mention in your paper what percent of observations have a MOCA score of 30, and/or how many people had scores of 30 at any point. Or, at least have that information ready for reviewers.

I think I observed this earlier, but if you continue to have concerns about a ceiling effect and you also observe that the addition of the random slope accounts for relatively little variance in the outcome, you could then go and do the xttobit model as a sensitivity analysis. If the random slope accounts for a lot of the variance in the outcome, then this won't work as well.

Carlo, in #2, mentioned that you can do a likelihood ratio test for a model with random intercept alone vs random intercept + slope for time. This will show whether or not the model with a random slope fits better than the model with just the random intercept. Could be useful!

4) When you say you're interested in the fixed effect of status, I am pretty sure the xttobit command already provides that. Also, in a mixed model, anything to the left of the first || is a fixed effect.

5) I apologize if I am being pedantic here, but I want to clarify: I think you mean that the perfect model would be a Tobit model with random intercept and slope. And yes, that does not appear to exist as a native Stata command.

Glad to hear we could be helpful here! I would be very interested to hear the results.

Last edited by Weiwen Ng; 19 Jan 2017, 16:18.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment

Announcement