Syntax check for MLM with continuous dependent and independent variable and assessing for 2nd degree interaction 1) by time, and 2) by group

Rosemay Remigio-Baker

Join Date: Jan 2019

Posts: 6
#1


17 Jan 2019, 12:01

I am interested in doing:

(1) MLM with a continuous dependent and independent variable,
(2) Assessing interaction by time,
(3) Assessing interaction of above by group

I wanted to make sure that I am proceeding correctly, both programmatically and interpretation-wise.

**********************
To start, I have the following syntax to evaluate (1) and (2) above:

xi: xtmixed sym sat time sat_time || ID: time, variance covar(un) mle

where sym (my continuous dependent variable) = disease symptom level (higher numbers mean more symptoms),
sat (my continuous independent variable) = satisfaction with healthcare (Likert scale of 0 strongly disagree to 4 strongly agree),
time = time variable (year 1-year 5), and
sat_time = interaction between satisfaction with healthcare and time
ID = identification number of participant

If this is correct, and if there is significant interaction by time (i.e., sat_time p<0.05), how would I interpret the results?

I want to know how satisfaction with healthcare relates to disease symptom level, and whether or not this varies by time. If it varies by time (i.e., sat_time p<0.05), what would the coefficient tell us? Is this the unit increase in disease symptom level over time per unit increase of satisfaction level with healthcare?

*****************
To assess (3), I extended the previous syntax as follows:

xi: xtmixed sym sat time grp sat_time sat_grp time_grp sat_time_grp || ID: time, variance covar(un) mle

where grp = intervention group (1 vs. 0),
sat_time = interaction between satisfaction and time
sat_grp = interaction between satisfaction and group
time_grp = interaction between time and group
sat_time_grp = interaction among satisfaction, time and group

If there is significant interaction by group (i.e., sat_time_grp p<0.05), how would I determine the direction of association for each group and whether each 'main effect' is significant for each group? Would this be the following syntax:

lincom sat_time /*group=0*/
lincom sat_time + sat_time_grp /*group=1*/

I assume the interpretation remains the same as above but tailored to each group where there is significant 'main effect', with a negative coefficient interpreted as a unit decrease in symptom level over time per unit increase in satisfaction level with healthcare. Your assistance and confirmation would be much appreciated.
Clyde Schechter

Join Date: Apr 2014

Posts: 30173
#2

17 Jan 2019, 14:21

First, on the assumption that you are not using an antique version of Stata (i.e. earlier than version 13) let's bring your syntax up to date, as the modern, simpler syntax will perhaps make things clearer, and will enable you to make use of the -margins- command in interpreting your results.

Code:

mixed sym c.sat##c.time || ID: time, covar(un)

The command -xtmixed- has been renamed -mixed-, and reporting random effects in the variance metric, and using maximum likelihood estimation are now the defaults. Note also that by using factor variable notation you do not need to hand-code an interaction variable: Stata does it for you "on the fly" and does it in a way that does not leave any traces in your data set.
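A minimal sketch of the "on the fly" point, run on hypothetical simulated data (60 made-up participants observed at 5 waves; the seed and data-generating values are assumptions, not the poster's data). It fits the model once with factor-variable notation and once with a hand-coded product term, and the interaction estimates should agree.

```stata
* Hypothetical simulated data (NOT the poster's data): 60 participants x 5 waves
clear
set seed 2019
set obs 60
generate ID  = _n
generate u0  = rnormal(0, 1)        // participant random intercept
generate u1  = rnormal(0, 0.3)      // participant random slope on time
expand 5
bysort ID: generate time = _n       // waves 1..5
generate sat = runiformint(0, 4)    // Likert-style 0-4; runiformint() needs Stata 14+
generate sym = 5 - 0.4*sat + 0.2*time + 0.1*sat*time + u0 + u1*time + rnormal()

* Factor-variable notation: Stata builds the sat x time term on the fly
mixed sym c.sat##c.time || ID: time, covariance(unstructured)
scalar b_fv = _b[c.sat#c.time]

* Same model with a hand-coded product term, as in the original -xtmixed- call
generate sat_time = sat*time
mixed sym sat time sat_time || ID: time, covariance(unstructured)
scalar b_hand = _b[sat_time]

display "factor-variable: " scalar(b_fv) "   hand-coded: " scalar(b_hand)
```

The advantage of the factor-variable version is that -margins- knows sat_time is built from sat and time, which it cannot know about a hand-made variable.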

The coefficient of sat#time in the output will represent the rate of change (could be increase or decrease, don't prejudge) per unit time in the marginal effect of sat on sym.
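To make "marginal effect of sat on sym" concrete: in this model the slope of sym on sat at time t is _b[sat] + t*_b[c.sat#c.time]. The sketch below uses hypothetical simulated data (an assumption, not the poster's data) and the arbitrary choice t = 3. It builds that slope with -lincom- and checks it against -margins-.

```stata
* Hypothetical simulated data (NOT the poster's data): 60 participants x 5 waves
clear
set seed 2019
set obs 60
generate ID  = _n
generate u0  = rnormal(0, 1)        // participant random intercept
generate u1  = rnormal(0, 0.3)      // participant random slope on time
expand 5
bysort ID: generate time = _n       // waves 1..5
generate sat = runiformint(0, 4)    // runiformint() needs Stata 14+
generate sym = 5 - 0.4*sat + 0.2*time + 0.1*sat*time + u0 + u1*time + rnormal()

mixed sym c.sat##c.time || ID: time, covariance(unstructured)

* Slope of sym on sat at time = 3, assembled from the coefficients
lincom _b[sat] + 3*_b[c.sat#c.time]
scalar me_lincom = r(estimate)

* The same quantity as a marginal effect from -margins-
margins, dydx(sat) at(time = 3)
matrix M = r(b)
scalar me_margins = M[1,1]
```

Moving from t = 3 to t = 4 changes this slope by exactly _b[c.sat#c.time], which is why that coefficient is a rate of change in the marginal effect.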

The second model would be

Code:

mixed sym i.grp##c.sat##c.time || ID:time, covar(un)

The three-way interaction coefficient will give you an estimate of the extent to which the sat:sym relationship follows different time trajectories in the two (or more) groups.
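Writing the second model out may help. The β labels below are hypothetical notation, not Stata's coefficient names. In Stata's output, β6 corresponds to c.sat#c.time and β7 to 1.grp#c.sat#c.time. The terms u0i and u1i are the random intercept and random time slope from || ID: time.

```latex
\begin{aligned}
\mathrm{sym}_{it} ={}& \beta_0 + \beta_1\,\mathrm{grp}_i + \beta_2\,\mathrm{sat}_{it} + \beta_3\,\mathrm{time}_{it}
  + \beta_4\,\mathrm{grp}_i\,\mathrm{sat}_{it} + \beta_5\,\mathrm{grp}_i\,\mathrm{time}_{it} \\
 &+ \beta_6\,\mathrm{sat}_{it}\,\mathrm{time}_{it} + \beta_7\,\mathrm{grp}_i\,\mathrm{sat}_{it}\,\mathrm{time}_{it}
  + u_{0i} + u_{1i}\,\mathrm{time}_{it} + \varepsilon_{it} \\[4pt]
\frac{\partial\,\mathrm{sym}_{it}}{\partial\,\mathrm{sat}_{it}} ={}& (\beta_2 + \beta_4\,\mathrm{grp}_i) + (\beta_6 + \beta_7\,\mathrm{grp}_i)\,\mathrm{time}_{it}
\end{aligned}
```

So the slope of sym on sat is β2 + β6·time in group 0 and (β2 + β4) + (β6 + β7)·time in group 1. β7 is therefore the between-group difference in how fast that slope changes per unit of time.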

It isn't clear to me what you mean by the "main effects" in this model. You might be interested in the marginal effect of sat on sym at different time points in each group. That could be obtained with:

Code:

margins grp, dydx(sat) at(time = (1(1)5))

and graphed with

Code:

marginsplot, xdimension(time)

The particular -lincom- commands you have shown estimate, separately in each group, the rate of change in the difference in the marginal effect of sat on sym per unit time. Perhaps that relates to your research goals, though it does not strike me as something of obvious interest.
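For reference, here is a sketch of how the two -lincom- statements from #1 translate into factor-variable notation once the model is fit with i.grp##c.sat##c.time. It uses hypothetical simulated data, with grp coded 0/1 and 0 as the base level. Each estimate is the per-unit-time change in the slope of sym on sat within one group.

```stata
* Hypothetical simulated data (NOT the poster's data): 60 participants x 5 waves
clear
set seed 2019
set obs 60
generate ID  = _n
generate grp = mod(_n, 2)           // hypothetical 0/1 intervention group
generate u0  = rnormal(0, 1)        // participant random intercept
generate u1  = rnormal(0, 0.3)      // participant random slope on time
expand 5
bysort ID: generate time = _n       // waves 1..5
generate sat = runiformint(0, 4)    // runiformint() needs Stata 14+
generate sym = 5 - 0.4*sat + 0.2*time + 0.1*sat*time + u0 + u1*time + rnormal()

mixed sym i.grp##c.sat##c.time || ID: time, covariance(unstructured)

* Group 0 (base level): counterpart of  lincom sat_time
lincom _b[c.sat#c.time]
scalar rate_g0 = r(estimate)

* Group 1: counterpart of  lincom sat_time + sat_time_grp
lincom _b[c.sat#c.time] + _b[1.grp#c.sat#c.time]
scalar rate_g1 = r(estimate)
```

The difference between the two estimates is the three-way interaction coefficient itself.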

The code you have shown is quite readable as you have done it. But in general, it is easier on the reader if you post Stata code or Stata output using code delimiters. If you are not familiar with code delimiters, please read Forum FAQ #12.
Rosemay Remigio-Baker

Join Date: Jan 2019

Posts: 6
#3

17 Jan 2019, 17:10

Thank you for this helpful information.

For the first part (2-way interaction), I am interested in assessing the relationship between satisfaction and symptoms over time which would be obtained from sat#time. An interpretation for, say, a value of 0.35 would be 'a unit increase in satisfaction level is associated with 0.35 unit increase in symptom level over time'. I think we are in agreement with that. Please correct me if I'm wrong.

For the second part (3-way interaction), I am interested in determining whether the relationship between satisfaction level and symptom level over time significantly varies by group, and, if it does, I would like to know the relationship within each group (an estimate and CI for that estimate). Each group would then have its own interpretation as above, given significance. I'm not really interested in estimates at each time point, but, rather, an estimate that takes into consideration the time trajectory (i.e., increase vs. decrease over time). Would the 'lincom' command still hold? What would this look like using the 'margins' command?

Thank you again.
Clyde Schechter

Join Date: Apr 2014

Posts: 30173
#4

17 Jan 2019, 18:07

Well, much hangs on what you mean by "the relationship between satisfaction and symptoms over time." In the first model you are using an interaction between sat and time. This means you are assuming that the slope of the symptom:satisfaction line depends, linearly, on time. So there isn't, in any sense I can perceive, any such thing as "the relationship between satisfaction and symptoms over time." There is a different relationship between satisfaction and symptoms at each point in time.

A coefficient of 0.35 for the sat#time interaction term means that with every unit of time that passes, the slope of the symptoms vs satisfaction line increases by 0.35 units of symptom intensity/unit of satisfaction. (If you prefer, you can interpret it the other way: with every unit increase in satisfaction, the slope of the symptoms vs time line increases by 0.35 units of symptom intensity/unit of time. These are equivalent in this model.) One way of thinking about it is to imagine you drew a regression line relating symptoms to satisfaction just for time 1. Let's imagine that the line has a negative slope, that is, it goes down to the right (so, with more satisfaction there are fewer symptoms.) Then on the same graph, imagine drawing another regression line relating symptoms to satisfaction just for time 2. The second line would be less steep than the first: the difference in their slopes would be 0.35. If you then added a third line for just time = 3, that would be still less steep. In fact, if you were to carry on long enough, at some sufficiently late time, the line would actually start to point up to the right (although perhaps that time might not be reached within your data.) To actually see this on the screen run:

Code:

mixed sym c.sat##c.time || ID: time, covar(un)
margins sym, at(time = (0(1)5) sat = (0(1)4))
marginsplot, xdimension(time)

That is what your model does, and the extent to which the lines rotate* as you move from one time to the next is what that interaction term tells you about.
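The rotation can also be seen with plain arithmetic. This sketch assumes a hypothetical coefficient of -1.2 for sat (the slope at time 0) together with the 0.35 interaction from the example above. It then prints the slope of the symptoms vs satisfaction line at each time. No data are needed.

```stata
scalar b_sat      = -1.2    // hypothetical coefficient of sat (slope at time = 0)
scalar b_sat_time = 0.35    // sat#time coefficient from the example above

* Slope of sym on sat at each time: b_sat + b_sat_time * time
forvalues t = 1/5 {
    display "time = `t':  slope of sym on sat = " %5.2f (scalar(b_sat) + scalar(b_sat_time)*`t')
}
```

With these assumed values the slopes are -0.85, -0.50, -0.15, 0.20 and 0.55 for times 1 to 5. Each line is 0.35 less steeply negative than the one before, and the line starts pointing up to the right between time 3 and time 4.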

When you add another dimension to the interaction by throwing in grp, you could envision two graphs like the one I just described, one for each group. The coefficient of the three-way interaction term quantifies how much the amount by which the lines rotate differs between those two graphs.

*I use the term rotate here because I want you to focus on the way the slope changes from one line to the next. In addition to changing direction as time marches on, however, the lines may also shift their starting points upward or downward (depending on the sign of the coefficient of the time variable). So what you see may not be pure rotation, but a combination of rotation and vertical shift.
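The "one graph per group" picture can be drawn directly. This sketch uses hypothetical simulated data, not the poster's. It runs -margins- over grp at each combination of time and sat. Then bydimension(grp) asks -marginsplot- for one panel per group, so the rotation of the lines can be compared across the two panels.

```stata
* Hypothetical simulated data (NOT the poster's data): 60 participants x 5 waves
clear
set seed 2019
set obs 60
generate ID  = _n
generate grp = mod(_n, 2)           // hypothetical 0/1 intervention group
generate u0  = rnormal(0, 1)        // participant random intercept
generate u1  = rnormal(0, 0.3)      // participant random slope on time
expand 5
bysort ID: generate time = _n       // waves 1..5
generate sat = runiformint(0, 4)    // runiformint() needs Stata 14+
generate sym = 5 - 0.4*sat + 0.2*time + 0.1*sat*time + u0 + u1*time + rnormal()

mixed sym i.grp##c.sat##c.time || ID: time, covariance(unstructured)

* Predicted sym at each combination of time and sat, separately for each group
margins grp, at(time = (1(1)5) sat = (0(1)4))

* One panel per group; within a panel, time on the x-axis and one line per value of sat
marginsplot, xdimension(time) bydimension(grp)
```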

Rosemay Remigio-Baker

Join Date: Jan 2019
Posts: 6
#5

22 Jan 2019, 16:57

Thank you again. This makes perfect sense.

For the syntax:

Code:

margins sym, at (time = (0(1)5) sat = (0(1)4))

If my outcome contains noninteger values, what command would you recommend that would be analogous to 'margins'?

Also, should the syntax for time above be

Code:

(time=(1(1)5))

for a time variable with values 1-5? If not, why use '0'?

If I have the following output:

Code:

. margins grp, dydx(sat) at(time = (1(1)5))

Average marginal effects                        Number of obs     =        300

Expression   : Linear prediction, fixed portion, predict()
dy/dx w.r.t. : sat

1._at        : time            =           1

2._at        : time            =           2

3._at        : time            =           3

4._at        : time            =           4

5._at        : time            =           5

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
sat          |
     _at#grp |
        1 0  |  -.0923235   .1002713    -0.92   0.357    -.2888516    .1042046
        1 1  |   .1383372   .1329871     1.04   0.298    -.1223127    .3989871
        2 0  |  -.1357978   .0989701    -1.37   0.170    -.3297757    .0581801
        2 1  |   .0210919   .1312542     0.16   0.872    -.2361616    .2783455
        3 0  |   -.179272   .1080221    -1.66   0.097    -.3909914    .0324473
        3 1  |  -.0961533   .1428796    -0.67   0.501    -.3761922    .1838857
        4 0  |  -.2227463   .1252013    -1.78   0.075    -.4681363    .0226437
        4 1  |  -.2133985   .1650646    -1.29   0.196    -.5369192    .1101222
        5 0  |  -.2662206   .1476986    -1.80   0.071    -.5557045    .0232634
        5 1  |  -.3306437   .1942238    -1.70   0.089    -.7113153    .0500278
------------------------------------------------------------------------------

How would you translate dy/dx? For example, if we look at the 2nd row of results, with dy/dx = 0.1383372 and a p-value of 0.298, is this interpreted as: for group 1 at time point 1, symptom level (sym) increases by 0.14 for every unit increase in satisfaction with healthcare (sat), although the estimate is not statistically significant?

If I were interested in how satisfaction with healthcare (sat) at baseline (i.e., only at the time of enrollment) is related to the symptom trajectory over time, I assume that the syntax described in this thread remains the same for this assessment, correct? In the data, instead of having a different satisfaction with healthcare value at each time point for each participant, there would be only one value repeated 5 times (once for each of the 5 time points). Is this correct?


Clyde Schechter

Join Date: Apr 2014

Posts: 30173
#6

22 Jan 2019, 17:18

If my outcome contains noninteger values, what command would you recommend that would be analogous to 'margins'?

-margins- computes an expected outcome. It doesn't matter whether the outcome contains non-integer values; in fact, in most applications it will.

Also, should the syntax for time above be
Code:
(time=(1(1)5))

for a time variable with values 1-5? If not, why use '0'?

Yes, the values you use should span the interesting values of time in your data. If 0 does not occur as a value of time in your data, there is no reason to include it in the list.

How would you translate dy/dx? For example, if we look at the 2nd row of results, with dy/dx = 0.1383372 and a p-value of 0.298, is this interpreted as: for group 1 at time point 1, symptom level (sym) increases by 0.14 for every unit increase in satisfaction with healthcare (sat), although the estimate is not statistically significant?

Correct.

If I were interested in how satisfaction with healthcare (sat) at baseline (i.e., only at the time of enrollment) is related to the symptom trajectory over time, I assume that the syntax described in this thread remains the same for this assessment, correct? In the data, instead of having a different satisfaction with healthcare value at each time point for each participant, there would be only one value repeated 5 times (once for each of the 5 time points). Is this correct?

Correct.
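Here is a sketch of one way to set this up. It assumes the data are in long form with one row per participant per wave, and that wave 1 (time = 1) is enrollment. The variable name sat0 and the simulated data are hypothetical.

```stata
* Hypothetical simulated data (NOT the poster's data): 60 participants x 5 waves
clear
set seed 2019
set obs 60
generate ID  = _n
generate u0  = rnormal(0, 1)        // participant random intercept
generate u1  = rnormal(0, 0.3)      // participant random slope on time
expand 5
bysort ID: generate time = _n       // waves 1..5
generate sat = runiformint(0, 4)    // runiformint() needs Stata 14+
generate sym = 5 - 0.4*sat + 0.2*time + 0.1*sat*time + u0 + u1*time + rnormal()

* Copy each participant's first-wave (baseline) satisfaction onto all of that participant's rows
bysort ID (time): generate sat0 = sat[1]

* Same model as before, with baseline satisfaction in place of time-varying satisfaction
mixed sym c.sat0##c.time || ID: time, covariance(unstructured)
```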
Rosemay Remigio-Baker

Join Date: Jan 2019

Posts: 6
#7

23 Jan 2019, 09:27

Thank you again.

When I run the margins command:

Code:

. margins sym, at(time = (1(1)5) sat = (0(1)4))

I receive the following error:

Code:

sym: factor variables may not contain noninteger values

My symptom level variable (sym) contains integers, fractions and negative values. Do you have any suggestion on how to get around this error? I would very much like to create the illustration you suggested:

Code:

marginsplot, xdimension(time)

I assume satisfaction with healthcare (sat) as an independent variable can contain integers, fractions and negative values as well. Is that correct?
Clyde Schechter

Join Date: Apr 2014

Posts: 30173
#8

23 Jan 2019, 09:48

Code:

margins sym, at(time = (1(1)5) sat = (0(1)4))

is incorrect. Remove sym; you cannot name the outcome variable in -margins-. That was my error in #4 and you have propagated it along. Sorry about that.

I assume satisfaction with healthcare (sat) as an independent variable can contain integers, fractions and negative values as well. Is that correct?

Yes. sat is specified in the model as c.sat, a continuous variable, so it is not subject to the non-negative integer constraint.
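With sym removed, the plotting sequence from #4 becomes the following. It is sketched on hypothetical simulated data, and time runs from 1 to 5 as discussed in #6.

```stata
* Hypothetical simulated data (NOT the poster's data): 60 participants x 5 waves
clear
set seed 2019
set obs 60
generate ID  = _n
generate u0  = rnormal(0, 1)        // participant random intercept
generate u1  = rnormal(0, 0.3)      // participant random slope on time
expand 5
bysort ID: generate time = _n       // waves 1..5
generate sat = runiformint(0, 4)    // runiformint() needs Stata 14+
generate sym = 5 - 0.4*sat + 0.2*time + 0.1*sat*time + u0 + u1*time + rnormal()

mixed sym c.sat##c.time || ID: time, covariance(unstructured)

* No outcome variable is named after -margins-; it uses the model's prediction of sym
margins, at(time = (1(1)5) sat = (0(1)4))
marginsplot, xdimension(time)
```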
Rosemay Remigio-Baker

Join Date: Jan 2019

Posts: 6
#9

24 Jan 2019, 11:34

Thank you. No apologies needed. You have been of tremendous help.

I assume that the addition of covariates within the mixed command would not change any of the syntax of the margins command or how results are interpreted - other than an addition of 'adjusted for covariates' in the explanation, correct?
Clyde Schechter

Join Date: Apr 2014

Posts: 30173
#10

24 Jan 2019, 15:00

Correct. The results, of course, may change. Possibly a lot if the covariates are strongly confounding. But the syntax and interpretation will be unchanged, other than "adjusted for covariates."
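As an illustration, here is a sketch that adds a hypothetical covariate, sex, to the model. It uses simulated data, not the poster's. Only the -mixed- command changes; the -margins- call is the same one used without covariates.

```stata
* Hypothetical simulated data (NOT the poster's data): 60 participants x 5 waves
clear
set seed 2019
set obs 60
generate ID  = _n
generate u0  = rnormal(0, 1)        // participant random intercept
generate u1  = rnormal(0, 0.3)      // participant random slope on time
expand 5
bysort ID: generate time = _n       // waves 1..5
generate sat = runiformint(0, 4)    // runiformint() needs Stata 14+
generate sym = 5 - 0.4*sat + 0.2*time + 0.1*sat*time + u0 + u1*time + rnormal()
generate sex = mod(ID, 3) == 0      // hypothetical time-invariant 0/1 covariate

* Covariate added to the fixed part of the model
mixed sym c.sat##c.time i.sex || ID: time, covariance(unstructured)

* Unchanged -margins- syntax; results are now adjusted for sex
margins, dydx(sat) at(time = (1(1)5))
```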
Rosemay Remigio-Baker

Join Date: Jan 2019

Posts: 6
#11

28 Jan 2019, 17:50

Thank you.

Just to clarify, in interpreting the graph obtained from:

Code:

mixed sym c.sat##c.time || ID: time, covar(un)
margins, at(time = (0(1)5) sat=(0(1)4))
marginsplot, xdimension(time)

is the y-axis simply a measure of outcome (i.e., is the y-axis title 'Linear prediction, fixed portion' automatically obtained from the syntax above equivalent to the title 'Level of disease symptom' in this example)?

Also, the graph is automatically titled as 'Adjusted predictions with 95% CIs'. What is meant by 'adjusted'?

************
If I am looking at a graph from the following syntax:

Code:

mixed sym c.sat##c.time || ID:time, covar(un)
margins , dydx(sat) at(time = (0(1)5))
marginsplot, xdimension(time)

the y-axis is automatically titled, 'Effects on linear prediction, fixed portion'. Is this equivalent to saying the 'Difference in the level of disease symptom', again using this example? Of course this would be per unit increase in satisfaction with health care over time.
Clyde Schechter

Join Date: Apr 2014

Posts: 30173
#12

28 Jan 2019, 18:08

is the y-axis simply a measure of outcome (i.e., is the y-axis title 'Linear prediction, fixed portion' automatically obtained from the syntax above equivalent to the title 'Level of disease symptom' in this example)?

Also, the graph is automatically titled as 'Adjusted predictions with 95% CIs'. What is meant by 'adjusted'?

Yes, it is a measure of the outcome. It is the model's predicted value of sym based on the values of sat and time, not including the random effects or residuals.

In this particular model, "adjusted" doesn't mean anything. If you had additional covariates in the model, however, it would mean that everything shown has been adjusted to the distribution of those covariates in the estimation sample. If, for example, you had included i.sex in the model, the calculations would be done at the specified values of time and sat, with any differences in the distribution of sex across times or across values of sat leveled out. That is, the calculations would be done as if the distribution of sex were the same for all values of time and sat.
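One way to see what "adjusted" means is to reproduce a -margins- prediction by hand. This sketch uses hypothetical simulated data and a hypothetical covariate, sex. First, time and sat are set to fixed values for every observation while each observation keeps its own value of sex. Then the fixed-portion linear predictions are averaged. The result should match -margins- at the same values.

```stata
* Hypothetical simulated data (NOT the poster's data): 60 participants x 5 waves
clear
set seed 2019
set obs 60
generate ID  = _n
generate u0  = rnormal(0, 1)        // participant random intercept
generate u1  = rnormal(0, 0.3)      // participant random slope on time
expand 5
bysort ID: generate time = _n       // waves 1..5
generate sat = runiformint(0, 4)    // runiformint() needs Stata 14+
generate sym = 5 - 0.4*sat + 0.2*time + 0.1*sat*time + u0 + u1*time + rnormal()
generate sex = mod(ID, 3) == 0      // hypothetical 0/1 covariate

mixed sym c.sat##c.time i.sex || ID: time, covariance(unstructured)

* Adjusted prediction from -margins- at time = 3, sat = 2
margins, at(time = 3 sat = 2)
matrix M = r(b)
scalar adj_margins = M[1,1]

* By hand: fix time and sat for everyone, keep each observation's own sex, average xb
preserve
replace time = 3
replace sat  = 2
predict xb_fixed, xb                // linear prediction, fixed portion only
summarize xb_fixed, meanonly
scalar adj_byhand = r(mean)
restore

display "margins: " scalar(adj_margins) "   by hand: " scalar(adj_byhand)
```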

the y-axis is automatically titled, 'Effects on linear prediction, fixed portion'. Is this equivalent to saying the 'Difference in the level of disease symptom', again using this example? Of course this would be per unit increase in satisfaction with health care over time.

Correct.

Do keep in mind that the -marginsplot- command accepts nearly all options available with -graph twoway-, so you can change the axis titles and other aspects of the graphs' appearance pretty much any way you like.
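For instance, here is a sketch of relabeling the marginal-effects graph. The data are hypothetical and simulated, and the title texts are only suggestions. ytitle(), xtitle() and title() are standard -graph twoway- options that -marginsplot- passes through.

```stata
* Hypothetical simulated data (NOT the poster's data): 60 participants x 5 waves
clear
set seed 2019
set obs 60
generate ID  = _n
generate u0  = rnormal(0, 1)        // participant random intercept
generate u1  = rnormal(0, 0.3)      // participant random slope on time
expand 5
bysort ID: generate time = _n       // waves 1..5
generate sat = runiformint(0, 4)    // runiformint() needs Stata 14+
generate sym = 5 - 0.4*sat + 0.2*time + 0.1*sat*time + u0 + u1*time + rnormal()

mixed sym c.sat##c.time || ID: time, covariance(unstructured)
margins, dydx(sat) at(time = (1(1)5))

* Replace the automatic axis and graph titles
marginsplot, xdimension(time)                                       ///
    ytitle("Change in symptom level per unit of satisfaction")      ///
    xtitle("Year") title("Marginal effect of satisfaction on symptoms")
```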