How to include interaction term in longitudinal mixed logistic regression model

Robert Azzopardi

Join Date: Jun 2023

Posts: 17
#1

20 Nov 2023, 00:23

Hi all,

I'm trying to create a mixed logistic regression model to look at the effect of having a disease on an outcome occurring over time.
Time in my data is represented by two visits - visit 1 and visit 2. The disease variable is either 0 (no disease) or 1 (disease present).
The outcome is either 0 (hasn't occurred yet) or 1 (has occurred).
Initially, I thought creating an interaction term as below would be the best method of finding the coefficient directly relating to this.

Code:

melogit outcome i.visit##i.disease || study_id: , or

Code:

lincom 1.disease + 1.visit#1.disease

The more I think about it, the more I believe that this lincom could be incorrect for what I want, as it is simply looking at the effect of having disease in all those at visit 1 rather than looking at the effect of having disease when looking at changes in outcome from visit 0 to visit 1?

I'm wondering if instead I should be looking at:

Code:

lincom 1.visit

-> the effect of increase in visit by 1 for those without the disease, compared to:

Code:

lincom 1.visit + 1.disease#1.visit

the effect of increase in visit by 1 for those with the disease

Could I have some advice on what is correct?

Any help is much appreciated!

Last edited by Robert Azzopardi; 20 Nov 2023, 00:28.
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#2

20 Nov 2023, 10:56

Your design does not make sense to me. If you want to study the effect of a disease on the development of a certain outcome, you would start with only people who have outcome = 0 at visit 1, some of whom have the disease and some of whom do not. You would then assess whether the outcome developed by visit 2 in all of these people, and that would enable you to calculate the outcome incidence in the diseased and non-diseased groups. You could compare that using a simple cross-tabulation for a crude analysis, and you could adjust for confounders by a simple -logistic outcome i.disease confounders- analysis.

With the design you describe there are some serious difficulties. For example, among the people who have outcome = 1 at visit 1, you do not know when that outcome developed, and you cannot say whether that preceded or followed the onset of the disease among those people who have disease = 1 at visit 1. This group of people is not informative about the association between disease and outcome; in fact, the statistics from that group might be considered misinformation.
Robert Azzopardi

Join Date: Jun 2023

Posts: 17
#3

20 Nov 2023, 23:42

Sorry Clyde, I think I've worded it poorly. In reality it is not an outcome but simply something that can develop. We want to see if the diseased group develops it at higher rates than the non-diseased group and whether an association exists.

An example of it is:
We have two groups of patients, one group with ankylosing spondylitis and one without. We want to see whether poor mobility (0 or 1) develops/occurs at higher rates in the ankylosing spondylitis group.
We check at baseline, then recheck at 4 years' time, for instance.
A minority of people have poor mobility in either group at baseline.
We want to account for confounders such as low muscle mass at baseline etc.

In this context would the regression model make more sense or still no?
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#4

21 Nov 2023, 09:25

The details allay some, but not all, of my concerns about the design. The disease in question is revealed to be one that is not going to go away between visit 1 and visit 2, and its incidence is low enough that there will be little or no crossover between the disease and non-disease groups. However, the problem remains that in the group who have both ankylosing spondylitis and poor mobility at visit 1, you cannot know whether the poor mobility preceded or followed the onset of ankylosing spondylitis, so that this subset of the patients is, at best, uninformative and, at worst, misleading.

If I were designing a study to answer this question, I would begin by excluding anybody with poor mobility at visit 1. Then, of those who remain, I would divide them into the groups with and without ankylosing spondylitis, follow them to assess the development of poor mobility at visit 2, and compare the incidence rates of poor mobility in the two groups. The analysis would be a simple logistic regression: -logistic poor_mobility i.ankylosing_spondilitis confounders-, assuming that the interval between visits is reasonably close to the four-year target for all participants. If the interval between visits varies to a considerable extent, then one has to do some analysis that takes that into account, and a -poisson- regression where the inter-visit interval is used as the -exposure()- variable comes to mind.
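
A minimal sketch of the two analyses just described, assuming long-form data with one row per participant per visit and borrowing study_id, visit and disease from post #1. The names poor_mobility, low_muscle_mass_bl (a baseline confounder carried on every row), at_risk and interval_years are hypothetical, and disease status is assumed not to change between visits.

Code:

* flag participants who did not have poor mobility at visit 1 (hypothetical variable names)
bysort study_id (visit): generate byte at_risk = (poor_mobility[1] == 0 & visit[1] == 1)

* poor mobility at visit 2 among those at risk, adjusted for a baseline confounder
logistic poor_mobility i.disease low_muscle_mass_bl if visit == 2 & at_risk == 1

* alternative when the interval between visits varies: interval used as the exposure
poisson poor_mobility i.disease low_muscle_mass_bl if visit == 2 & at_risk == 1, exposure(interval_years) irr

The at_risk flag is built once so that both models are fitted to the same participants.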
Robert Azzopardi

Join Date: Jun 2023

Posts: 17
#5

21 Nov 2023, 10:38

Thanks Clyde, that is very helpful. A follow-up question, if I may:

I'm struggling to understand in what situation you would use a mixed effect model instead when investigating longitudinal data?
From the examples given by StataCorp for instance: https://www.youtube.com/watch?v=rUWT_EWV6QI
They use it for data in which measurements were taken at year marks with regions used as a random effects component. Another video regarding this topic uses the subjects as a random effects component.
Another outcome I look at is continuous and taken at visit 1 and visit 2. In this instance would using a mixed effect linear model be suitable with individuals as the random effect?
Or alternatively, does it run into the same issues you've highlighted, so that it would be more appropriate to create a variable 'delta' = visit 2 measurement - visit 1 measurement and run a simple linear model?

Mixed models and longitudinal data analysis are new to me and I am still trying to grasp the concepts.

Last edited by Robert Azzopardi; 21 Nov 2023, 10:41.
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#6

21 Nov 2023, 13:29

[quote]Another outcome I look at is continuous and taken at visit 1 and visit 2. In this instance would using a mixed effect linear model be suitable with individuals as the random effect?[/quote]

So, this is a very different situation from the original question, because we are not investigating the incidence of an event. Here we are trying to understand the evolution of a continuous variable over time and how that is modified by the presence of ankylosing spondylitis. So this is a situation where the use of a mixed model would make sense.

I'd like to make a few comments about the situation where a continuous outcome is measured twice, once at baseline and again at follow-up. There are three popular approaches to analyzing such variables: 1) calculate the difference for each participant and then analyze that with a simple linear model, 2) use the second measure as the outcome variable in a simple linear regression with the baseline value as one of the covariates, and 3) use a multi-level model. [Approach 2) is often referred to as analysis of covariance.]
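
For orientation, approaches 1) and 2) might look like the sketch below. It assumes the data start in long form with hypothetical variables outcome_measure, time (coded 1 and 2), disease and participant_id, and that disease and the covariates are constant within participant (otherwise -reshape wide- will stop with an error). The name delta is also hypothetical, and covariates is a placeholder rather than a real variable.

Code:

* go to one row per participant: outcome_measure1 and outcome_measure2
reshape wide outcome_measure, i(participant_id) j(time)

* approach 1: analyze the change score
generate delta = outcome_measure2 - outcome_measure1
regress delta i.disease covariates

* approach 2: follow-up value as the outcome, baseline value as a covariate
regress outcome_measure2 c.outcome_measure1 i.disease covariates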

Despite the appeal of the clear simplicity of approach 1, it is very often a terrible idea. It depends on aspects of the variable's distribution. For details, see https://www.fharrell.com/post/errmed/#change, where Frank Harrell gives a long list of conditions which must all be met for this approach to work--in real life, it is hard to find variables that meet all of these conditions.

It is a little-known fact that approaches 2) and 3) are actually algebraic transforms of each other, and the results from either can be calculated from the results of the other if you know the formulas. Unfortunately, it is also not widely understood that the results from 3) are much more directly related to the actual effect of the regressor(s) on the evolution of the outcome, so 3) is generally preferable. But there is a big caveat: they are not statistically equivalent. The difference between them statistically lies in the fact that approach 2) treats the baseline measurement as a known constant and ignores any sources of variance (especially measurement error) in it, whereas approach 3) treats the baseline measurement as being subject to variation just the same as the follow-up measurement. Since, especially in clinical research, we are usually working with measurements that have substantial error, approach 2) is typically not very suitable.

One thing to be careful of in approach 3) is that sometimes the method by which the measurement is ascertained at baseline is different from the way it is ascertained at follow-up. For example, the baseline measurement is sometimes extracted from clinical records, whereas the follow-up measurement is obtained by the investigator using a systematic measurement protocol. In a situation like that, it is important to model this difference by estimating different residual error variances at the two time periods. (The -mixed- command can do this via the -residuals(, by(time))- option.)
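
As a sketch of that last point, with hypothetical variable names and covariates standing in as a placeholder, the residual structure can be written out in full. Here the residual type is spelled out as independent, which is an assumption about what the shorthand above intends:

Code:

* random intercept per participant, with a separate residual variance at each time point
mixed outcome_measure i.time##i.disease covariates || participant_id: , residuals(independent, by(time))

With only two measurements per participant, the random-intercept variance and the two residual variances are just identified, so convergence is worth checking.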

Long story short, I prefer approach 3), the mixed model, in almost every situation. The skeleton of the command is:

Code:

mixed outcome_measure i.time##i.disease covariates || participant_id:

The coefficient of the time#disease interaction term is your estimate of how the disease modifies the evolution of the outcome measure between the two time periods. You need look no further than that for the key result. If you also want the expected values of the outcome measure in each group at each time, you can get them by running -margins time#disease- after the regression.
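
Put together, the sequence might look like this sketch (same hypothetical variable names as the skeleton; the -marginsplot- and -dydx()- lines are optional extras, not part of the recommendation above):

Code:

mixed outcome_measure i.time##i.disease covariates || participant_id:

* expected outcome in each disease group at each time point
margins time#disease
marginsplot

* change from the base time point to follow-up, separately for each disease group
margins disease, dydx(time)

The difference between the two -dydx(time)- estimates corresponds to the interaction coefficient reported by -mixed-.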

[quote]Mixed models and longitudinal data analysis are new to me and I am still trying to grasp the concepts.[/quote]
Be patient. This is complicated material with a steep learning curve.