  • Predictors of cognitive decline: linear mixed models with repeated data, interactions, and backward stepwise method

    Hi! I am writing to see if you could help me urgently with major modifications to one of my papers!

    I've fit a linear mixed model to test the effectiveness of a specific care program for Alzheimer's patients on cognitive decline (using the MMSE scale). Patients come from 2 cohorts (one with the specific care, the other without). This is my model (adjusted for several potential confounders). Patients were followed up every 2 years (time: 0, 2, or 4 years).

    xi:xtmixed mmse cohort i.age_cl sex lifestyle i.comorbidities adl_dependant i.time || id : i.time

    The cohort#time interaction was significant, so I've provided estimates of cognitive decline according to the initial cohort (specific care or not).

    But I also have to identify predictors of cognitive decline regardless of cohort (regardless of the type of care).

    I am a little lost as to how to answer this question.

    I don't know if I have to:

    First possibility: 1) test potential confounding factors in bivariate analyses; 2) include those significant at the 0.2 level; 3) run a multivariable analysis with backward stepwise selection to see which factors will "stay" in my model; 4) test the interactions with time of only these retained factors to identify predictors of cognitive decline.

    Second possibility: 1) test potential confounding factors in bivariate analyses; 2) include in my multivariable analysis only those significant at the 0.2 level; 3) test each one's interaction with time while adjusting for the others; if the interaction is significant, the factor is a predictor of cognitive decline over time; 4) no backward stepwise selection, then.

    I know this question is not only about using Stata, but I sincerely need help: I have to resubmit my paper within 3 days...

    Could someone help me?

    Thank you so much,

    Kind regards,

    Laure Rouch, PhD student







  • #2
    There is much in your post that I find confusing. So I'm going to make a lot of assumptions here, which I hope will prove helpful and not lead you down a path that does not lead towards your goals.

    The first confusing point is that you show code
    Code:
    xi:xtmixed mmse cohort i.age_cl sex lifestyle i.comorbidities adl_dependant i.time || id : i.time
    and then comment that the cohort#time term was significant. But that code doesn't have any cohort#time term! So I'm just going to assume that this isn't really the code you ran. (And actually, I hope it isn't, because there are other ways in which it does not appear to make sense.)

    So I assume you have two cohorts distinguished by the occurrence of some exposure (perhaps a treatment) and they are followed at times 0, 2, and 4 for cognitive decline, as measured on the mini-mental status exam (mmse). For some reason, perhaps because the cohorts are not randomized, you have elected to adjust the analysis for some potential confounding variables, age_cl (I'm guessing that's some age group), sex, lifestyle (which you have somehow operationalized into either a continuous measure or a dichotomy), comorbidities, and dependency in activities of daily living (adl_dependant). Your analytic strategy is to determine whether the cohorts exhibit different trajectories of mmse over time.

    Assuming you are using the current version of Stata (14.1), -xtmixed- is now called -mixed-. More important, assuming you are using any recent version of Stata, you should abandon -xi:- and use factor variable notation. See -help fvvarlist- for details of how it works. Not only is -xi:- obsolete, using it prevents you from taking advantage of the very useful -margins- command after you estimate your model. And in this case, using it has fooled Stata into carrying out a probably meaningless analysis of random slopes on indicator variables for your time periods.

    So here is what I think your analysis should look like, more or less:

    Code:
    mixed mmse i.cohort##c.time i.age_cl i.sex lifestyle i.comorbidities i.adl_dependant || id: time
    If lifestyle is a dichotomous variable, then it would be best to use i.lifestyle instead of lifestyle.

    Now this analysis will give you an interaction term for cohort#time and that will be the estimate of your intervention effect. After that you can run -margins cohort, at(time = (0 2 4))- to get expected values of mmse (adjusted for the other variables) in each cohort at each time period. You can also get expected cohort-specific slopes with -margins, dydx(time) at(cohort = (0 1))- [assumes cohort is coded 0 1; substitute actual numeric codes used if that is not the case.]
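    In case it helps, here are the two -margins- calls above spelled out as code, to be run right after the -mixed- model (this assumes cohort is coded 0/1; substitute your actual codes if not):

    Code:
    margins cohort, at(time = (0 2 4))      // adjusted mean mmse by cohort at each time
    margins, dydx(time) at(cohort = (0 1))  // cohort-specific slopes of decline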

    Notice that in this analysis time is treated as a continuous variable, and the model includes random slopes.

    Now, if you really feel that assuming that the trajectories of mmse over time will be linear is simply too restrictive or wrong, you can get that, but without random slopes, as follows:

    Code:
    mixed mmse i.cohort##i.time i.age_cl i.sex lifestyle i.comorbidities i.adl_dependant || id:
    You cannot specify random slopes for this as id:i.time, because the random effects part of the -mixed- command does not accept factor variable notation. You can, from a purely syntactic perspective, specify it as id:R.time. Stata will accept that syntax--but the model will not converge because it is unidentified. The most you can do in this situation is specify id level random slopes on either the segment of the trajectory between times 0 and 2, or between times 2 and 4. Once you do that, the rest of the trajectory is determined (that is why the model is unidentified when you specify id:R.time). So you could, for example, do this:

    Code:
    // INDICATOR FOR TIME 2 PERIOD
    // FACTOR VARIABLES NOT ALLOWED IN RANDOM EFFECTS SPECIFICATION
    gen time2 = 2.time
    mixed mmse i.cohort##i.time i.age_cl i.sex lifestyle i.comorbidities i.adl_dependant || id: time2
    This specifies a random slope between times 0 and 2 (and then, implicitly a complementary random slope from time 2 to time 4).

    My guess is that treating time as a continuous variable is the better approach, but at least now you have the syntax to do it either way and get results.

    I haven't even gotten to the question you wanted answered yet. Here we go:

    You have been asked to identify variables (presumably other than the ones in your model) that are predictors of cognitive decline. It's unclear why this is being asked of you, but I will first guess that there is a concern that there are other confounding ("confusing")* variables that your advisors are concerned have been omitted from your model, resulting in possible bias. So I'll set out here an outline of the approach I recommend for this. Before getting into the details, let me just emphasize that variable selection is a thorny problem, and there is no useful automatic solution to it. It is necessary to apply scientific judgment, and reasonable people can disagree about those judgments. It is also important to remember that the best approach may depend on what your model is being used for. Variables that are important for understanding causal relationships may differ from those that are important for prediction on population or individual levels.

    Confounding by a variable X would occur if X has a different distribution in your two cohorts, and also is associated with the mmse outcome. The first thing to understand is that confounding is a sample-level phenomenon. Even if this confounding relationship exists in a population at large, if you did a study with a design that matched the cohorts on X, you would break the confounding and it would not be an issue for your analysis. So confounding is a phenomenon that exists (or not) in your sample. Which means that p-values are completely irrelevant to determining what is a confounding variable, because p-values are about making inferences about populations.

    So I would start by looking at how strongly each of your candidate variables is associated with mmse, using bivariate regression coefficients for continuous X variables, and differences in group means of mmse across categories of discrete X variables. There are no prior thresholds that can be used to decide how big a correlation or group mean difference must be to count as an indication of potential confounding. Rather, it is contingent on your specific research goals. The basic issue is: would including X appreciably change the results of my earlier modeling? So if, for example, your cohort#time interaction coefficient is some value, say E, a difference in mean mmse across groups of X of a magnitude comparable to E is quite important, whereas a magnitude that is minuscule by comparison to E is not.

    Once you decide which X's have substantial relationships to mmse in this sense, you can then go through them and see how different their distributions are in the two cohorts of your study. So again, mean differences across cohorts for continuous X, and differences in the probabilities of values of X across cohorts, are what matter. And again, the question is whether the difference is large enough to materially affect your earlier analysis. To give an example: if X is continuous, its bivariate regression coefficient with mmse is R, and the difference in its means between the two cohorts is D, the question is how big R*D is compared to E. If they are of comparable magnitude, X is a confounder that needs to be added to your analysis.
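    To make this rule of thumb concrete, here is a toy calculation with purely hypothetical numbers (in practice, E, R, and D would come from your own models and data):

    Code:
    * All numbers hypothetical -- illustrating the R*D versus E comparison
    scalar E = -0.50   // cohort#time coefficient from the main model
    scalar R =  0.30   // bivariate regression coefficient of mmse on X
    scalar D = -1.20   // difference in mean X between the two cohorts
    scalar bias = R*D  // potential confounding contribution: -0.36
    display "R*D = " bias "   |R*D/E| = " abs(bias/E)

    Here |R*D/E| is about 0.72, i.e., of the same order of magnitude as E itself, so this hypothetical X would be worth adding to the model; a ratio near zero would mean X cannot materially change E.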

    Now, it is possible that you are not being asked to identify predictors of cognitive decline for the purpose of refining your model to reduce confounding bias. Perhaps, my second guess, you are just being asked to do an exploratory data analysis, checking out new variables to see whether they appear to be related to cognitive decline or not. If this is your situation, it is important to bear in mind the limitations of this kind of work: it is not a suitable approach for testing hypotheses, only for generating hypotheses that can be tested in a new study.

    I would treat each candidate variable X separately, adding X##c.time to the earlier model. (Reminder: if X is a continuous variable you need to write this as c.X##c.time.) Your estimate of the effect of X on cognitive decline will be the coefficient of X#time. I would then consider doing a final model which includes all the X's (again entering X##c.time or c.X##c.time as the case may be) that exceeded my threshold when entered separately. For the purposes of hypothesis generation, threshold p-values of 0.10 or 0.20 are commonly used; I think 0.25 is reasonable as well.

    My approach not only begins there, it also ends there. I strongly recommend against stepwise backwards elimination. As this post is already rather long-winded, rather than explaining why I deprecate stepwise approaches, I refer you to http://www.stata.com/support/faqs/st...ems/index.html, where no less a luminary than Frank Harrell explains its numerous drawbacks.
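    The one-candidate-at-a-time screening step can be sketched as a loop; x1-x3 are hypothetical categorical candidate names (for a continuous candidate, use c.`x'##c.time and testparm c.`x'#c.time instead):

    Code:
    * Screen each candidate predictor separately (hypothetical variable names)
    foreach x of varlist x1 x2 x3 {
        mixed mmse i.cohort##c.time i.`x'##c.time i.age_cl i.sex ///
            lifestyle i.comorbidities i.adl_dependant || id: time
        testparm i.`x'#c.time    // joint test of the candidate-by-time interaction
    }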

    I hope these comments are helpful.

    *The standard terminology is confounding, not confusing. But confusing is an interesting substitute word in this context as found and fuse both trace back to the Latin fundere "to melt," the past participle of which is fusus. Both words refer to the melting together of distinct things (metaphorically in the present context).



    • #3
      Dear Clyde Schechter,
      Thank you so much for your answer and the time you have taken to reply. I really appreciate it.
      I acknowledge that my last message was not very clear. I've read your message with great attention, and I will treat time as a continuous variable. Indeed, I find it better.
      Regarding the potential predictors of cognitive decline, let me give you additional details. In my research paper, I compared survival and institutionalization at 4-year follow-up according to the care strategy (standardized, specific follow-up in memory centers versus usual care in real life) using Cox models. I also investigated factors associated with death and institutionalization.
      The reviewers asked me to do additional analyses and investigate the effectiveness of a standardized, specific follow-up in memory centers on cognitive decline and loss of autonomy over time. (I am not at all familiar with this kind of model…) Their comments are fairly unclear, but I have understood that they were also expecting (as I provided for death and institutionalization) factors associated with cognitive decline and dementia.
      This is why I want to investigate whether age, sex, living arrangement, financial income, level of education, BMI, ADL, IADL, comorbidities and treatments are associated with cognitive decline.
      But these factors are my potential confounding factors. So yes, it is a kind of exploratory data analysis, but with the potential confounding factors that I used in all my analyses (the only ones common to the 2 cohort studies).
      Actually, the model that I wrote to investigate the effectiveness of a specific follow-up in memory centers on cognitive decline was:
      xi:xtmixed mms cohort##time i.AgeP_cl sexe i.habit i.revenu_cl i.dipniv0_cl2 i.bmi0_cl2 i.CDRtot depiadl0 dependantADL meddem0 || id : i.time
      (These potential confounding factors are those significant at a 0.25 level in the bivariate analyses).
      (I will modify it using time as a continuous variable).
      If I understand the last paragraph of your message correctly, what I should do is:
      • Add each candidate separately, with ##c.time, to the previous model. I guess this is done while adjusting for the other potential confounding factors, but without the cohort##c.time interaction, like:
      mixed mms cohort AgeP_cl##c.time sexe i.habit i.revenu_cl i.dipniv0_cl2 i.bmi0_cl2 i.CDRtot depiadl0 dependantADL meddem0 || id : i.time
      mixed mms cohort i.AgeP_cl sexe##c.time i.habit i.revenu_cl i.dipniv0_cl2 i.bmi0_cl2 i.CDRtot depiadl0 dependantADL meddem0 || id : i.time
      • Then look at the p-value of the interaction term; if it is <0.2-0.25, select it.
      • And so on, testing all the potential interactions with time…
      • Then fit a final model including all the significant interaction terms just found, plus the other potential confounding factors (significant at the 0.25 level in the bivariate analyses), even if their interactions with time were not significant.
      • And declare as "factors associated with cognitive decline" those factors whose interaction term remains significant in this final model?
      Is that what you meant?
      Because it seems that when I do that, i.e., when I include in the final model all the significant interaction terms with time plus the other potential confounding factors, the majority of the interaction terms with time are no longer significant. That wouldn't be a problem except that my cohort#time interaction is no longer significant either.
      And in my first model
      (xi:xtmixed mms cohort##time i.AgeP_cl sexe i.habit i.revenu_cl i.dipniv0_cl2 i.bmi0_cl2 i.CDRtot depiadl0 dependantADL meddem0 || id : i.time),
      which I used to test the effectiveness of the specific care, it was significant… so I built all my discussion on this result.

      I would also like to ask you how I can test the global interaction when I use dummy variables.
      For example, when I have the interaction i.age##c.time in my model, with a significant p-value for the first level and a non-significant p-value for the other, I'd like to test the overall interaction to know whether I should keep it in my model.

      Once again, I’d like to thank you, very sincerely, for your help.
      I appreciate it so much.
      Kind regards,
      Laure ROUCH



      • #4
        Well, what you show in #3 accords with what I outlined as the way to do exploratory analysis for new predictors. It differs from the approach I outlined as a way to identify confounding variables. From the description of the reviewers' comments it isn't clear to me which they really want. It may be that your summary of the reviewers' comments has left out some clues that might shed light on this. Or the reviewers may not have been clear in communicating what they wanted (which wouldn't be the first time). For that matter, it wouldn't be the first time that reviewers aren't really clear in their own minds about what they want!

        If the previously significant interaction effect for treatment cohort disappears when you add in these additional terms, it typically means that the treatment effect you initially observed is, at least in part, attributable to the cohort differences in these other confounding variables. Before reaching that conclusion, you should verify that the number of observations analyzed in each model is the same: sometimes adding new variables results in sample attrition due to missing values of the newly included variables.

        That said, I think it is a mistake to focus on statistical significance as a yes/no result. First, the 0.05 cutoff that is usually used for statistical significance is arbitrary, a historical magic number. So if your p-value went from 0.04 to 0.06, it doesn't mean that anything important has changed. I think it is better to look at the effect size itself, which, in these models, is the coefficient of i.cohort#c.time.

        While adding potential confounding variables to the model reduces confounding bias, it sometimes has the effect of increasing residual variance, so you can end up with the same effect size being "significant" in the unadjusted model and not in the adjusted one, particularly if the effect of the confounding variable itself is small (whether "significant" or not). So in interpreting these findings, I would focus on how much the actual coefficient of i.cohort#c.time changes when these confounders are added. It may be little changed; the confidence intervals from the different models may be extensively overlapping.
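        One way to check that the adjusted and unadjusted models are estimated on the same observations is to flag the estimation sample of the larger model and refit the smaller one on it (a sketch; substitute your actual variable lists):

        Code:
        * Fit the fully adjusted model quietly and flag its estimation sample
        quietly mixed mmse i.cohort##c.time i.age_cl i.sex lifestyle ///
            i.comorbidities i.adl_dependant || id: time
        gen byte insample = e(sample)
        * Refit the simpler model on exactly those observations
        mixed mmse i.cohort##c.time if insample || id: time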

        Or not. If you find that your coefficient of i.cohort#c.time indeed shrinks substantially towards zero when the confounders are added, then it suggests that your initial effect of cohort was simply attributable to differences between the cohorts on these confounding variables, and not to the different treatments of the cohorts.

        Now the easier part: how to test for global interaction effects. Rather than indulging in the complications of a mixed model to explain a simple syntactic point, run this to see it in action on a simple example:

        Code:
        sysuse auto, clear
        regress price i.rep78##(c.mpg c.headroom)
        testparm i.rep78#c.headroom
        testparm mpg headroom i.rep78
        testparm i.rep78#c.mpg i.rep78#c.headroom
        Last edited by Clyde Schechter; 06 May 2016, 13:18. Reason: Simplify example for global interaction effects testing.



        • #5
          Dear Clyde Schechter,

          Even if it is a bit late, I wanted to kindly thank you again for your advice and your help while I was working on my manuscript.

          Thank you so much. I really appreciated your help.

          Best regards,

          Laure ROUCH
