Mixed models and association between 2 variables over time

Laura Myles

Join Date: Jun 2018

Posts: 153
#1

09 Oct 2020, 09:52

Hi Listers,

I have data collected at 5 time points: baseline, 3 weeks, 6 weeks, 9 weeks, and 12 weeks. At each point, well-being scores (continuous measure) are being recorded as well as hospitalization status (hospitalized or not). I am interested in assessing whether well-being scores predict hospitalization at the next time point (e.g. well-being reported at baseline predicts hospitalization at 3 weeks).

I could run separate logistic regressions for hospitalization at weeks 3 to 12, but I would like to be able to run one model. I am wondering if I could use a mixed model to determine whether there is an association between well-being and hospitalization rates overall. I am not sure how best to set it up using xtlogit.

I reshaped the data (extract below) so that baseline well-being scores are coded as 3 weeks, scores at week 6 as week 3 so that they correspond to the hospitalization timepoint of interest.

I then set up the xtlogit command as below, including time as a covariate. Does this model apply only a random intercept, or a random slope as well?

xtlogit hospital wellbeing i.time, i(id)

I would appreciate any feedback on this approach and/or whether I should be looking at something different.

Thanks in advance!

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int id byte time double(wellbeing hospital)
1 0 2 1
1 3 2 1
1 6 6 1
1 9 3 1
2 0 7 1
2 3 3 1
2 6 3 .
2 9 . 1
3 0 7 1
3 3 7 1
3 6 5 0
3 9 . 0
4 0 5 .
4 3 . .
4 6 . .
4 9 . .
5 0 7 1
5 3 5 1
5 6 2 0
5 9 . .
6 0 4 1
6 3 7 0
6 6 . 1
6 9 5 1
7 0 3 1
7 3 4 0
7 6 . 0
7 9 . 0
end
label values hospital FU_12mo_still_smoking_coded
label def FU_12mo_still_smoking_coded 1 "yes", modify
Clyde Schechter

Join Date: Apr 2014

Posts: 30075
#2

09 Oct 2020, 15:40

This is a reasonable approach.

I would not have re-organized the data the way you did. I would have left the baseline wellbeing scores in the time 0 observations, the 3 week well-being scores in the 3 week observations, etc., and gotten Stata to use the lagged values by taking advantage of the lag operator:

Code:

xtset id time
xtlogit hospital L1.wellbeing i.time

See -help tsvarlist- for details on the use of the lag and other time-series operators.
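A minimal sketch of how the lag operator lines up the values, assuming time is stored in weeks (0, 3, 6, 9) as in the -dataex- extract. Under that assumption the period length has to be declared with delta(3); with the default unit spacing, L1. would look for an observation one week earlier and find none. If time is recoded to consecutive integers, delta() is not needed. The variable name wellbeing_lag is hypothetical and is created only to inspect the lag.

```stata
clear
input int id byte time double(wellbeing hospital)
1 0 2 1
1 3 2 1
1 6 6 1
1 9 3 1
2 0 7 1
2 3 3 1
2 6 3 .
2 9 . 1
end

* time is in weeks, spaced 3 apart, so declare the period length (assumption)
xtset id time, delta(3)

* materialize the lag only to check the alignment; a model can use
* L1.wellbeing directly without creating this variable
generate double wellbeing_lag = L1.wellbeing
list id time wellbeing wellbeing_lag hospital, sepby(id)
```

In the listing, each row's wellbeing_lag should hold the same person's well-being score from the previous visit, and the baseline rows should be missing.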

Nevertheless, what you've done is OK. Just hope that if you come back to this data at a later time you remember that the time variable is, in effect, lying about the well-being variable.

With -xtlogit- you get only random intercepts, no random slopes. If you need random slopes, you have to use -melogit- instead.
Laura Myles

Join Date: Jun 2018

Posts: 153
#3

11 Oct 2020, 10:55

Thanks Clyde,

I am now using your suggestion and specifying the lag. I have also recoded time to be 1,2,3 and 4 for timepoint 0, 3wk, 6wk, and 9wk.

I noticed that in the xtset command you specified both id and time - what's the advantage of specifying time as well?

Lastly, is there a way to determine whether random intercepts are sufficient or whether random slopes should also be included in the model? I am not sure if there is a way to compare the model fit from xtlogit with that of melogit (or if this makes sense at all).

I hope you can help with these queries too!
Clyde Schechter

Join Date: Apr 2014

Posts: 30075
#4

11 Oct 2020, 11:07

I noticed that in the xtset command you specified both id and time - what's the advantage of specifying time as well?

The advantage of specifying time in -xtset- is that it makes it possible to use time-series operators like lag, lead, difference, seasonal difference. If you just -xtset panelvar- you can't use those. The other thing you get from specifying time is that you can use autoregressive structure in those commands that support it. Since you're specifically interested in using lagged variables, this would be convenient for you and would save you the trouble of creating the unorthodox data structure you have.

Lastly, is there a way to determine whether random intercepts are sufficient or whether random slopes should also be included in the model? I am not sure if there is a way to compare the model fit from xtlogit with that of melogit (or if this makes sense at all).

You can't compare -xtlogit- to -melogit- directly. But what you can do is run your -xtlogit, re- model in -melogit- instead. Then you can also run it with random slopes and compare those two models with a likelihood ratio test. I can't get -melogit- to run on your example data, probably because it's just too small. But here's an illustration of this approach using one of StataCorp's example data sets:

Code:

clear*
webuse bangladesh
melogit c_use age i.urban || district:
estimates store intercepts_only
melogit c_use age i.urban || district: age
lrtest . intercepts_only

When you run a two-level model with no random slopes in -melogit- you are estimating the exact same model that -xtlogit, re- estimates. The numerical method for estimating the parameters is different, but the model itself is exactly the same. (And the results come out the same either way, except sometimes for tiny rounding errors in far-off decimal places). So you can use -melogit- results as a proxy for -xtlogit, re- results in this way.
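A rough way to see this for yourself is to fit both commands on the same data and put the coefficients side by side. The sketch below uses simulated data, so every variable name (id, time, x, y, u) and every parameter value in it is an assumption made for illustration, not something from the data in this thread.

```stata
clear
set seed 4321
set obs 300
generate id = _n
generate u = rnormal()                 // person-level random intercept
expand 4
bysort id: generate time = _n
generate x = rnormal()
generate y = runiform() < invlogit(-0.5 + 0.8*x + u)

xtset id time

* random-intercept logit via -xtlogit, re-
xtlogit y x i.time, re
scalar b_xt = _b[x]

* the same model written as a two-level -melogit- with no random slopes
melogit y x i.time || id:
scalar b_me = _b[x]

display "xtlogit: " b_xt "   melogit: " b_me
```

The two commands use different default integration settings, so small discrepancies in the later decimal places are to be expected.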
Laura Myles

Join Date: Jun 2018

Posts: 153
#5

12 Oct 2020, 03:29

Thanks Clyde,

I used your melogit approach and found that random slopes did not improve the model fit, so I opted for xtlogit:

xtset id time
xtlogit hospital l1.wellbeing i.time // I set the lag at l1 as the data points are coded as 0 (baseline), 1 (wk3), 2 (wk6), 3 (wk9) and 4 (wk12)

The OR = 0.088 with p = 0.008 shows that patients with higher well-being scores are less likely to be hospitalized at the follow-up session; this is overall at weeks 3-12. Is there any point in showing the association at each time point separately using polychoric correlation? The Wald test shows that time is not significant (testparm i.time, p = 0.51) - this is for the true time when using the lagged command, correct?
Clyde Schechter

Join Date: Apr 2014

Posts: 30075
#6

12 Oct 2020, 10:27

In terms of odds ratios, because your model does not include an interaction between time and (l1.)wellbeing, the odds ratio will be the same at each time. However, if you were to use the -margins- command to calculate probability differences, those would differ across time points due to the non-linearity of the logistic model. If the purpose of your analysis is to provide probability estimates for a decision analysis, then you ought to pursue that additional approach. If you are simply trying to answer a theoretical question about the association between well-being and hospitalization, then I think your current analysis and results are sufficient.
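A minimal sketch of what the -margins- route could look like, on simulated data. The data-generating values, the variable name wellbeing_lag, and the two well-being levels (3 and 7) are all assumptions chosen for illustration. The lag is stored in an explicit variable here so that it can be named in the at() option, and predict(pu0) requests probabilities with the random effect set to zero, which is only one of the available choices.

```stata
clear
set seed 12345
set obs 200
generate id = _n
generate u = rnormal()                        // person-level random intercept
expand 5
bysort id: generate time = _n - 1             // 0 = baseline, ..., 4 = last visit
generate wellbeing = 1 + 8*runiform()

xtset id time
generate wellbeing_lag = L1.wellbeing         // previous visit's score

* simulated outcome; undefined at baseline because there is no earlier score
generate hospital = runiform() < invlogit(1 - 0.4*wellbeing_lag + 0.2*time + u) if time > 0

xtlogit hospital wellbeing_lag i.time, re

* predicted probability at each time point for low and high prior well-being
margins time, at(wellbeing_lag = (3 7)) predict(pu0)
```

Although the model has a single odds ratio for wellbeing_lag, the gap between the two predicted probabilities will generally not be the same at every time point.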

I don't understand what you have in mind with regard to polychoric correlation.

Yes, the Wald test neither knows nor cares what the meaning of the other variables in the model is. It only cares about the covariance matrix and coefficient estimates as numbers.
Laura Myles

Join Date: Jun 2018

Posts: 153
#7

28 Oct 2020, 02:47

Hi Clyde, thanks again. The margins option is what I need!
Laura Myles

Join Date: Jun 2018

Posts: 153
#8

11 Aug 2021, 03:09

Hi Clyde Schechter - I have a follow-up question for this topic.

If I wanted to adjust the model above for demographic characteristics, such as sex, age (continuous) and socioeconomic status, it seems from previous posts that I should include the interaction term between each variable and time, as their effects may change over time.

My data were collected at 5 time points, and so far I have included time as a factor in the model. Is it OK to expand my model as:

xtset id time
xtlogit hospital L1.wellbeing i.time i.sex i.sex#i.time age age#i.time i.ses i.ses#i.time

I have seen some resources where time is treated as continuous in this type of analysis, but I am assuming that is more suited to longer time series. Can you advise?
Clyde Schechter

Join Date: Apr 2014

Posts: 30075
#9

11 Aug 2021, 13:46

it seems from previous posts that I should include the interaction term between each variable and time as their effect may change over time.

It depends on what you understand the real world data-generating process to be. If, in the real world, these effects change over time, then, yes, they should be interacted with the time variable to reflect this. But often effects like this do not vary over time--it really depends on what outcomes we are talking about. In that case, there is no reason to include the interactions. So you just have to get the best understanding you can of the real world dynamics and model accordingly. Consult the literature on the topic to see what has been found in the past.

You can simplify the code for your model somewhat:

Code:

xtset id time
xtlogit hospital L1.wellbeing i.time##(i.sex c.age i.ses)

Note: age#i.time, as written in your proposed code, will wreck your model. In interaction terms, variables with no prefix are assumed to be discrete. But you also have age by itself--and in non-interaction terms variables with no prefix are assumed to be continuous. So you have told Stata to inconsistently treat your age variable as continuous and as discrete. The results will be garbage. If you use the code above, the c. prefix will prevent this problem.
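One way to see how Stata reads the two spellings, without fitting anything, is to expand the interaction terms with -fvexpand- on a toy data set. The data below are made up purely for illustration, and the local macro names n_factor and n_cont are hypothetical.

```stata
clear
input byte(age time)
20 1
20 2
30 1
30 2
40 1
40 2
end

* unprefixed age inside an interaction is read as a factor variable:
* one term for every age-by-time cell
fvexpand age#i.time
display "`r(varlist)'"
local n_factor : word count `r(varlist)'

* with the c. prefix, age stays continuous: one age slope per time level
fvexpand c.age#i.time
display "`r(varlist)'"
local n_cont : word count `r(varlist)'

display "terms with age as factor: `n_factor'   with c.age: `n_cont'"
```

With a real age variable that takes many distinct values, the unprefixed version would expand into a very large number of indicator terms.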

Whether to treat time as a discrete or continuous variable, again, depends on your understanding of what happens to the outcome variable over time. If the log odds of hospital grows linearly with time, then it should be treated as continuous. Treating it as discrete instead treats time as a series of idiosyncratic shocks to hospital. You have to look at the real world dynamics and decide which is going on.
Laura Myles

Join Date: Jun 2018

Posts: 153
#10

12 Aug 2021, 03:00

Thanks Clyde Schechter for another helpful reply! A follow-up from your comment: how can I check if the log odds of hospital grows linearly with time?

Last edited by Laura Myles; 12 Aug 2021, 03:22.
Clyde Schechter

Join Date: Apr 2014

Posts: 30075
#11

12 Aug 2021, 10:15

Well, before you do anything with your data, it's a good idea to think about the dynamics of how your variable changes over time. If it is subject to consistent appreciable directional trends, then it is likely that a continuous time representation is appropriate. If, on the other hand, there are large year-on-year haphazard fluctuations, then a discrete representation will be more appropriate. You may want to review the literature in your area and see what others have found, to guide your thinking.

Then, within your own data, you can do a plot like -lowess hospital time, logit- and see how it looks. This would be a direct plot of log odds hospital against time. If it's just a jumble, then either no representation of time or a discrete representation is most appropriate. If it looks a lot like a straight line, you are dealing with a continuous time trend. If it's more or less curvilinear, then you probably want continuous time, but might want to consider some kind of transformation of the time variable.
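As a possible numerical complement to the plot (this is an additional technique, not something proposed above), the linear-time model is nested within the discrete-time model, so the two can be compared with a likelihood ratio test. The sketch below runs on simulated data; the variable names and parameter values are assumptions for illustration only.

```stata
clear
set seed 2021
set obs 300
generate id = _n
generate u = rnormal()                        // person-level random intercept
expand 5
bysort id: generate time = _n - 1

* simulated outcome with a linear trend in the log odds (assumption)
generate hospital = runiform() < invlogit(-1 + 0.3*time + u)

xtset id time

* time as a set of indicators (discrete representation)
xtlogit hospital i.time, re
estimates store time_factor

* time as a single linear term (continuous representation)
xtlogit hospital c.time, re
estimates store time_linear

* a small p-value would suggest the linear term misses structure
* that the indicators pick up
lrtest time_factor time_linear
```

A non-significant result does not prove linearity, so it is best read alongside the lowess plot and what is known about the process generating the data.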
Laura Myles

Join Date: Jun 2018

Posts: 153
#12

16 Aug 2021, 04:32

Thanks again Clyde Schechter