Interpreting mixed linear model with interaction output in STATA

Martin Mueller

Join Date: Jun 2017

Posts: 77
#1

Interpreting mixed linear model with interaction output in STATA

26 Jun 2017, 09:05

Dear all,

I fitted a mixed-effects models in stata for the longitudinal analysis of bmi (body weight index) after differnet type of surgery to compare the course of two different groups (case and control), with random intercepts and random slopes, after documenting, with a likelihood ratio test, that this model had better fit than simpler ones (see Figure below).

Regarding time, a quadratic term was added, to account for a non-linear association of the bmi course over time with 95% CI and predictor variables age, sex and type_of_surgery. I am interested on the interaction of time and beeing a case.

The command: mixed c.bmi c.time i.case c.time#c.time i.case#c.time c.age i.sex i.type_surgery || pid: time, stddev base

The STATA Output is:

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0: log likelihood = -4635.5813
Iteration 1: log likelihood = -4635.5812

Computing standard errors:

Mixed-effects ML regression, Number of obs = 1654
Group variable: pid, Number of groups = 277

Obs per group: min = 2
avg = 6.0
max = 7

Wald chi2(8) = 3109.11
Log likelihood = -4635.5812, Prob > chi2 = 0.0000

My questions is about the interpretation of the coefficients which I tried to transfer from this post https://www.statalist.org/forums/for...xtmixed-output :

time: The coefficient in the **controls** that describes the bmi course over time is different significantly different from zero, indicating that there was a change over time in the BMI of the controls independently of the course of the cases. (independantly of sex, type_of_surgery and age)

case: There is no evidence against the null hypothesis (p=0.569) of difference between the BMI of cases and controls **at baseline** (time = 0 months) independantly of sex, age, type_of_surgery.

case*time: There is an interaction effect between being a case and time in regard to bmi, e.g. after 24 months there is a difference in the bmi of cases compared to controls of **“0.3900 + 24 * 0.0902 = 2.554”** independantly for type_surgery and sex and age.

sex: at baseline (0 months) in the control group there is no difference in bmi of woman compared to man

age: at baseline (0 months) there is some evidence that the bmi in the control group reduces about 0.0608 kg/m^2 per 1 increase of age i.e. 10 years: 0.608 smaller BMI.

type_surgery: At baseline in the control group people having a smaller mean BMI in the Bypass group compared to the Banding group (-5.3)

**My questions:**

1. Are these interpretations correct?
2. One further question was, if it is necessary to use "reml" (restricted maximum likelihood option) or if "mle" (default, maximum likelihood estimation) is fine. The advantage of "mle" is the use of likelihoodratio test to justify the used model over others (e.g. the quadratic term for time) as far I understood. Using the one or the other does not change much.
3. Is the Figure adequate and shows that cases have a different course than controls?

Hope you can help me. Thank you in advance!
Kind regards
Martin
Tags: None

1 like
Clyde Schechter

Join Date: Apr 2014

Posts: 30163
#2

26 Jun 2017, 09:20

The model is mis-specified and you should not interpret it. It is incorrect because you have interacted case with the linear time variable, but not with the quadratic term. Similarly, it is a mis-specification to have linear time, but not quadratic time in the random slopes. Now, you have unfortunately stumbled on one of the few situations where Stata does not support factor variable notation. So you have to redo this as:

Code:

gen timesq = time*time mixed c.bmi i.case##c.(time timesq) c.age i.sex i.type_surgery || pid: time timesq, stddev base

Once you have done that, bear in mind that no interpretation whatsoever can be attached to the coefficient of time by itself (nor i.case#time). You must instead look jointly at time and timesq, (and case#time and case#timesq). Hence, in the hypothesis testing mode, you would do things like -test time timesq- and look at the joint significance. Neither time nor timesq by itself has any meaningful interpretation.

Your interpretation of the case variable's coefficient would be essentially correct after fixing the model. Ditto for sex, age, and type_surgery.
Comment
Martin Mueller

Join Date: Jun 2017

Posts: 77
#3

26 Jun 2017, 19:36

Dear Clyde,

thank you so much for your super fast response and your very valuable help by fixing my model! I appeciate that!

I used your code now, but still strugeling with some things:

Code:

mixed c.bmi i.case##c.(time timesq) c.age i.sex i.type_surgery || pid: time timesq, stddev base

The output then is:

If I run afterwards the following code for graphical output of the model:

Code:

margins i.case, at(time=(0(6)36)) marginsplot, xdim(time) recastci(rarea)

The predictive margins are two straight lnes (case and control), but not cubic functions as before and although there should be a quadratic approximation, no?

My questions:
1. Is the command margins/marginplot not useful any more with this syntax?
2. Furthermore, I dont't understand the use of "test time timesqr".
3. And at the end: Is the following statement/interpretation of the model wrong?:
"There is strong evidence against the hypothesis of no difference in the course of bmi between cases and control over time (Is there a single p-value I can state?).
After 24 months for instance the predicted/modeled difference of the bmi between a case and a control is 0.07 + 24 * 0.157 -0.0018 * 24^2 = 2.8 independantly of sex, age and type_surgery."

I am looking forward to your response!

Best wishes from Bern, Switerland.
Martin
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17730
#4

26 Jun 2017, 23:57

Martin:
q1) -margins- is stil useful. It simply reflects the changes in your model specification;
q2) -test time timesq- checks whether both the linear and the squared terms of -time- differ from zero0;
q3) before any interpretation, I would run your code without the squared term for i.case##c.(time timesq) and see if -margins- return a quadratic relationship between -bmi- and -time-.

Kind regards,
Carlo
(Stata 19.0)
Comment
Martin Mueller

Join Date: Jun 2017

Posts: 77
#5

27 Jun 2017, 03:42

Dear Carlo,

thank you very much for your responds!

If I run the mixed model without the interaction term i.case##c.(time timesq), -margins- and -marginplot- afterwords give me a quadratic function (like in my first post), only if I use an interaction c.time#c.time. If I replace the interaction term with timesq as Clyde suggested no cubic function is returned (see my last post):

Code:

gen timesq = time*time mixed c.bmi i.case c.time#c.time c.time c.age i.sex i.type_surgery || pid: time timesq, stddev base // margin and marginplot return a square function mixed c.bmi i.case c.timesq c.time c.age i.sex i.type_surgery || pid: time timesq, stddev base // margin and margin plot return a linear function

If I use the following workarounds to get to my clyde suggested models

Code:

mixed c.bmi i.case c.time#c.time c.time c.timesq#i.case c.time#i.case c.age i.sex i.type_surgery || pid: time timesq, stddev base

or

Code:

mixed c.bmi i.case c.time#c.time c.time c.(time timesq)#i.case c.age i.sex i.type_surgery || pid: time timesq, stddev base

the margin command is returning empty values only and the marginplot is not drawing anything.

I think it is weired, no?

Thank you for your help!
Martin
Comment
Martin Mueller

Join Date: Jun 2017

Posts: 77
#6

27 Jun 2017, 04:01

Appendix:

If I run one of my workarounds the following message appeared:

Code:

note: 1.case#c.timesq omitted because of collinearity

So, if i then run the model again with manually put out the collinear interaction term case#c.timesq

Code:

mixed c.bmi i.case c.time#c.time c.time c.time#i.case c.age i.sex i.type_surgery || pid: time timesq, stddev base margins i.case, at(time=(0(6)36)) marginsplot, xdim(time) recastci(rarea)

the model works fine and the output with - margin - and - marginplot - is cubic. My most important question: Can I use this as my final model?

I have a probable explanation for the two things, also I am not sure about them.

1. If I use c.timesq insead of c.time#c.time the margin command must be changed (because the coeff. output of the model are totally the same)
2. although STATA is stating that i.case#c.timesq omitted because of collinearity, it's somehow in the margins command, so I need to tun the model as I did.

What do you guys think?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30163
#7

30 Jun 2017, 09:13

Returning to this thread after several days away, let me go all the way back to #3.

1. Is the command margins/marginplot not useful any more with this syntax?
2. Furthermore, I dont't understand the use of "test time timesqr".
3. And at the end: Is the following statement/interpretation of the model wrong?:
"There is strong evidence against the hypothesis of no difference in the course of bmi between cases and control over time (Is there a single p-value I can state?).
After 24 months for instance the predicted/modeled difference of the bmi between a case and a control is 0.07 + 24 * 0.157 -0.0018 * 24^2 = 2.8 independantly of sex, age and type_surgery."

1. No, you can't use -margins- and -marginsplot- to get quadratics unless the quadratic is represented using factor variable notation. -margins- does not know that timesq is supposed to be the square of time. It only knows that c.time#c.time is the square of time. That's why your -margins- output is wrong. The only reason I told you to use the timesq approach is because you also wanted to have random "slopes" on time, and, unfortunately, you cannot use the c.time#c.time notation in the random portion of -mixed-. (They haven't started a wish list for Stata 16 yet, but when somebody does, my first post to it will be about this issue.) So you are, unfortunately, forced to choose between being able to use -margins- and being able to get random slopes with quadratic models.

2. -test time timesq- tests the joint null hypothesis that the coefficients of time and timesq are both zero. In a quadratic model this is the appropriate test for a time effect. Because if either time or timesq has a non-zero coefficient, then there is a time effect. Testing either one alone doesn't do it.

3. I can't answer that from what you've shown. The answer to that question is the output from -test time timesq-. And if you want to quote a single statistic (though I can't think of any good reason to do that), you can quote the result from that. (Well, sort of: p-values aren't really measures of strength of evidence, not in this context, nor in any other.)

Moving ahead to #5, those "workarounds" don't work. You can't use c.time##c.time (or, equivalently, c.time c.time#c.time) in the fixed portion and then use time timesq in the random portion of the model: Stata will not realize that timesq corresponds to timesq, and consequently your random effects for timesq will be incorrectly estimated. So, no, you can't use this as your final model: it's actually wrong.

As for the colinearity message you got in #6, I don't really know what to make of that. It's hard for me to figure out how that happened. In any case, as the model underlying it is incorrect, I don't think either of us should spend much time puzzling over that problem.

So, here's how you can "have your cake and eat it, too." Run your model the way I suggested in #2. That gives you your desired results with regard to random slopes. Test your time effect with the command -test time timesq-. But don't run -margins- after that. Now, rerun your model as:

Code:

mixed c.bmi i.case##c.time##c.time c.age i.sex i.type_surgery || pid:, stddev base

This uses factor variable notation, but omits the random slopes. Now run -margins- and -marginsplot- on this. For the statistics you are calculating with -margins- and graphing with -marginsplot-, the random slopes play no role anyway. You will have your correct quadratic plots.
Comment
Martin Mueller

Join Date: Jun 2017

Posts: 77
#8

03 Jul 2017, 06:25

Dear Clyde,

thank you so much! You are a really great help! Guess, I owe you something.

I ran now my final model with appropriate test of a time effect

Code:

mixed c.bmi i.case##c.(time timesq) c.age i.sex i.type_surgery || pid: time timesq, stddev base test time timesq

The further output was "( 1) [bmi]time = 0 , ( 2) [bmi]timesq = 0, chi2( 2) = 1837.66, Prob > chi2 = 0.0000".

After that I followed your instructions and used the following commands to get to my graphical output:

Code:

mixed c.bmi i.case##c.time##c.time c.age i.sex i.type_surgery || pid:, stddev base margins i.case, at(time=(0(6)36)) marginsplot, xdim(time) recastci(rarea)

This worked fine, but I still have some remaining questions:

1.) Interpeting the output of the command "test time timesq", I can state that there is strong evidence for a time effect in the course of the bmi in the control group (p<0.0001), right? (or irrespecitve of case or control group?).

2.) As you mentioned above, I cannot use the coefficient for "case#time" or "case#time_sq" to say that there is a significant difference between cases and controls in the bmi course over time or can I state this because of a significant coefficient of interaction between case and time in the model (respectively do I need to describe the margins output of the other model and check if the 95% CI of the BMI overlap?)

3.) Is it valid to say "For instance after 24 months the predicted/modeled difference of the mean bmi between cases and controls is 0.17 + 24 * 0.09 -0.0008 * 24^2 = 1.9 (or can't I take the outpuf of my final model?) independantly of sex, age and type_surgery (or can I argue with the graphical output and that there is no overlapping of the areas after 24 months)?

4.) The coefficients of the model I used to get to my marginplots (without random slopes) are different from the final model (e.g. coefficient for case 0.24 vs. 0.17). I am wondering if that is correct because you said the random slopes play no role anyway. I just wanted to confirm that the interpretation of the graph is (despite of the difference of the coefficients of the mixed model command) the same as the one of the final model would be?

5.) Do you think the model is fine taking into account that you have many outliers of the sample mean+/- 95% CI (black bars) compared to the model predictions area?

Thank you so much! I really appeciate it!

Best wishes
Martin
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30163
#9

03 Jul 2017, 09:58

1.) Interpeting the output of the command "test time timesq", I can state that there is strong evidence for a time effect in the course of the bmi in the control group (p<0.0001), right? (or irrespecitve of case or control group?).

This is evidence of a time effect in the control group.

2.) As you mentioned above, I cannot use the coefficient for "case#time" or "case#time_sq" to say that there is a significant difference between cases and controls in the bmi course over time or can I state this because of a significant coefficient of interaction between case and time in the model (respectively do I need to describe the margins output of the other model and check if the 95% CI of the BMI overlap?)

If you want to test the null hypothesis that the time course is the same in both groups you can do that with -test case#time case#timesq-.

3.) Is it valid to say "For instance after 24 months the predicted/modeled difference of the mean bmi between cases and controls is 0.17 + 24 * 0.09 -0.0008 * 24^2 = 1.9 (or can't I take the outpuf of my final model?) independantly of sex, age and type_surgery (or can I argue with the graphical output and that there is no overlapping of the areas after 24 months)?

Yes. I would say "adjusted for" rather than "independently of." By the way if you would like a confidence interval around this expected difference, instead of calculating it by hand you can run -lincom 1.case + 24*1.case#c.time + (24^2)*1.case#c.timesq-

4.) The coefficients of the model I used to get to my marginplots (without random slopes) are different from the final model (e.g. coefficient for case 0.24 vs. 0.17). I am wondering if that is correct because you said the random slopes play no role anyway. I just wanted to confirm that the interpretation of the graph is (despite of the difference of the coefficients of the mixed model command) the same as the one of the final model would be?

Sorry, I was wrong to say that you would get the same results. If there were no other variables in the model besides case, time (and time squared), I believe this would be correct. But because there are other variables and they are not independent of time or case, the inclusion or exclusion of the random slopes does make a slight difference. I would not be comfortable presenting the graphs generated by -marginsplot- given that they do represent different results from your actual model. So I think to get a correct graph you need to emulate what -margins- does. So that would look something like this:

Code:

// GET A POSTFILE TO HOLD RESULTS FOR GRAPHS capture postutil clear tempfile tograph postfile handle byte case float time fitted using `tograph' // PRESERVE ORIGINAL VALUES OF KEY VARIABLES clonevar case_original = case clonevar time_original = time clonevar timesq_original = timesq // EMULATE WHAT -MARGINS- DOES forvalues c = 0/1 { foreach t of numlist 0(6)36) { replace case = `c' replace time = `t' replace timesq = `t'*`t' predict fitted, fitted predict rr*, reffects summ fitted, meanonly post handle (`c') (`t') (`r(mean)') drop fitted rr* } } postclose handle // RESTORE ORIGINAL VALUES OF KEY VARIABLES replace case = case_original replace time = time_original replace timesq = timesq_original preserve // BRING IN THE GRAPHING DATA AND GRAPH IT use `tograph', clear reshape wide fitted, i(time) j(case) graph twoway line fitted* time, sort // CUSTOMIZE WITH OPTIONS AS YOU LIKE

Note: Not tested. Beware of typos.

Now, this graph will only give you the central curves, not the confidence intervals around them. I don't actually know how to calculate those confidence intervals here. -predict- does have a stdp option, but it would not take into account the uncertainty in the random slopes.

5.) Do you think the model is fine taking into account that you have many outliers of the sample mean+/- 95% CI (black bars) compared to the model predictions area?

Well, you have to be the judge of that. Are the differences between the predicted and observed values large enough to matter for practical purposes? That's more important than how the 95% CIs and shaded areas overlap.

I would say that, a priori, I would not have chosen a quadratic model for a process like this. After all, the implication of a quadratic model is that as time goes on, the BMI will not only turn around and start to increase, but it will eventually increase without bound and at an ever-accelerating rate. That doesn't seem biologically plausible. Of course, your data don't go out that far in time, so you don't confront that limitation of a quadratic model head on. But even just looking at your data points, it seems to me that a piecewise linear model would make more sense. So there would be an early phase where BMI declines linearly, and then at some point (which looks like it is somewhere around 15 months) it reverses course and starts to increase linearly, but at a more gradual rate than the original decline. And then, finally, at some point in time (which, judging visually, might be somewhere between 30 and 36 months) it levels off. So I would probably have used linear splines for this, rather than a quadratic. You would run into some technical issues with this (e.g., once again, you would have problems getting -margins- to do the right thing as it would not know about the relationships among the spline variables, and you probably don't have enough data to get a decent estimate of the slope in that final phase--though maybe it doesn't matter anyway).

But if you feel that your current model fits the data well enough for practical purposes, I would stick with it. All it really would need is a disclaimer that it would not be applicable beyond the time frame of the observed data (which is true of any model).
Comment
Martin Mueller

Join Date: Jun 2017

Posts: 77
#10

04 Jul 2017, 05:45

Dear Clyde,

thank you - once again - so much for your amazing posts! Some issues on your posts:

ad "case-effect":
I like to ensure that the correct command of an case-effect is - test (1.case#c.time_square 1.case#c.time).
The one you suggested (-test case#time...) lead into an error ("case#c.time_square not found"). I am confident it is right, but want to be sure.

ad "lincom-command":
Thank you so much for the lincom command which works perfect using the command: lincom 1.case + 24*1.case#c.time + (24*24)*1.case#c.time_sq
(Stata does not like "24^2"; the error message was a bit cryptic for a not daily users using "24^2" instead of "24*24": Error message: "not possible with test")

ad "graph":
I tried to correct all typos. I could find only one: an extra ")" in the line "foreach t of numlist 0(6)36 {".

Unfortunately, Stata runs into an error "BLUP calculation failed; estimation data have changed" in the loop starting with "forvalues c = 0/1.." you posted above and I could not solve the problem using your code, but it helped me getting an idea.
Using this lines:

Code:

mixed c.bmi i.case##c.(time time_sq) c.age i.sex i.type_surgery || pid: time time_sq, stddev base predict bmi_pred, fitted sort time by time: egen mean_bmi_cases_pred = mean(bmi_pred) if case ==1 by time: egen mean_bmi_controls_pred = mean(bmi_pred) if case ==0 twoway (line mean_bmi_controls_p time, connect(ascending) lc(blue)) (line mean_bmi_cases_p time, connect(ascending) lc(red))

I got the scatterplot. But I am not that confident that this workaround is correct (my feeling for correct workarounds seems to be quite awful).

Do you have any ideas where the mistake is in the code you posted or if my "workaround" is (surprisingly) correct this time?

ad "Your thoughts about my model":
Thank you so much for your thoughts and feedback on my model! I can follow your argumentation and advices perfectly! Your suggestion for a piecewise linear model (the course over time is a spline function then) sounds very interesting and models the known course of bmi in the literature perfectly - exactly like you have written in above, but as interesting it sounds, your thoughts about technical diffculties and interpretation problems especially for starters sounds as scary, too. :-)
Maybe I stick to the quadratic model. It models the known course ( quadratic shape) up to three years fine I guess.

And just once again: Your posts are really helpful!
Best input for a clinican like me trying to increase his statistical skills for a long time!

Best wishes
Martin
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30163
#11

04 Jul 2017, 09:22

ad "case-effect":
I like to ensure that the correct command of an case-effect is - test (1.case#c.time_square 1.case#c.time).
The one you suggested (-test case#time...) lead into an error ("case#c.time_square not found"). I am confident it is right, but want to be sure.

Yes, that's correct.

Do you have any ideas where the mistake is in the code you posted or if my "workaround" is (surprisingly) correct this time?

Well, that error message is probably coming from -predict rr*, reffects-, and looking back over the code, I don't think that line is needed in any case. When I started writing it, I had forgotten that after -mixed-, -predict- has a -fitted- option. So my first go at it predicted -xb- and -reffects- separately and then combined them to calculate the fitted value. I then remembered the -fitted- option, and in the course of simplifying the code to use it, I apparently forgot to remove the -predict rr*, reffects- command.

I would try removing that line and then re-trying my code. If you still get that error message, then put

Code:

set tracedepth 1 set trace on

just at the top of the inner loop. You will get a lot of output, but it will at least show you which command inside the loop is causing the problem, and then we can try to figure it out from there.

As for your workaround, it does not do quite the same thing. Your workaround does not adjust the results in a way that accounts for differences in the distributions of sex and surgery_type between cases and controls, or changes in those distributions over time. If case vs control was a randomized assignment and your study is large enough for the central limit theorem to do its wonders, then the results wouldn't differ by much--perhaps not even by a visible amount. But if not, then the two curves your workaround plots will reflect, in part, differences in distribution of sex and surgery_type, not just the case effect and time effects.

The key difference between your workaround and my code is that you are calculating your results separately for cases and controls. Notice that in my code, each iteration through the two nested loops uses all the observations, both cases and controls, in its calculations. Depending on the values of c and t, it treats each observation as if it were a case (when c = 1, or as a control when c = 0) at a given time t, and gets the predicted values you would see if everybody in the study actually had that case/control status and were at that time. This way, the expected values calculated are all adjusted to the same distribution of sex and surgery_type: namely the observed distribution of those variables in the entire estimation sample. (This is how -margins- works. Were it not for the quadratic term in the random slopes part of the model, we could use -margins- directly. My code is just an emulation of that.)

Your posts are really helpful!
Best input for a clinican like me trying to increase his statistical skills for a long time!

Thank you very much.
1 like
Comment

Martin Mueller

Join Date: Jun 2017
Posts: 77

#12

04 Jul 2017, 17:16

Dear Clyde,

thank you very much.

1.)
The problem seems to be the "predict" command after the whole sample is set to the same case and time status (or changed in one variable).
If I comment out the replace case ... replace time .... replace timesq the code works (scatter gives a straight line horizonzal line as expected).

Code:

 //    GET A POSTFILE TO HOLD RESULTS FOR GRAPHS
. capture postutil clear
. tempfile tograph
. postfile handle byte case float time float fitted using `tograph'
.
. //    PRESERVE ORIGINAL VALUES OF KEY VARIABLES
. clonevar case_original = case
. clonevar time_original = time
(542 missing values generated)
. clonevar timesq_original = timesq
(542 missing values generated)
.
. //    EMULATE WHAT -MARGINS- DOES
. forvalues c = 0/1 {
  2.     foreach t of numlist 0(6)36 {
  3.                 set tracedepth 1
  4.                 set trace on
  5.                 replace case = `c'
  6.         replace time = `t'
  7.         replace timesq = `t'*`t'
  8.         predict fitted, fitted
  9.         summ fitted, meanonly
 10.         post handle (`c') (`t') (`r(mean)')
 11.         drop fitted
 12.         }
 13. }


- replace case = `c'
= replace case = 0
(1160 real changes made)
- replace time = `t'
= replace time = 0
(2439 real changes made)
- replace timesq = `t'*`t'
= replace timesq = 0*0
(2439 real changes made)
- predict fitted, fitted
  --------------------------------------------------------------------------------------------------------- begin predict ---
  - version 8.2, missing
  - if "`e(cmd)'" == "rocreg" & "`e(predict)'" == "" {
  = if "mixed" == "rocreg" & "mixed_p" == "" {
    di as err "predict not allowed after nonparametric ROC"
    exit 198
    }
  - if "`e(mi)'"!="" & "`e(b)'"!="matrix" {
  = if ""!="" & "matrix"!="matrix" {
    error 321
    }
  - if _caller()<=5 | "`e(predict)'"=="" {
  = if _caller()<=5 | "mixed_p"=="" {
    _predict `0'
    }
  - else {
  - local v : display string(_caller())
  - version `v', missing
  = version 13.1, missing
  - `e(predict)' `0'
  = mixed_p fitted, fitted
BLUP calculation failed; estimation data have changed
    }
  ----------------------------------------------------------------------------------------------------------- end predict ---
  summ fitted, meanonly
  post handle (`c') (`t') (`r(mean)')
  drop fitted
  }
  }
r(459);

end of do-file

I found a website where there is a post "A useful trick for exploring the consequences of a model is that predict does not require the data to be the same as the data used during estimation" (http://maartenbuis.nl/wp/inter_quadr/inter_quadr.html) - Thus, I don't see why the error message appeared

2.)
Making two things at a time (remembering me of clinical work...), I like to ask how to model the above mentioned (cubic) spline mixed effect model in stata.
I saw an older post of you here: https://www.statalist.org/forums/for...nuity-at-knots, but I could not figure it out really how to transfer it to my model or if its really the same.

My first naive, blind guess was that I have to use the mkspline command somehow similar to

mkspline timespl = time, cubic knots(6 12 24 36) (creates only zero variables)
and then maybe
mixed c.bmi i.case##c.timespl* c.age i.sex i.type_surgery || pid: time, stddev base,

but surprisingly it does not work. :-) Or is it much more complicated than that. A google search did not show any useful sites unfortunately.

Any thoughts about my issues?

Thank you!
Kind regards
Martin

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30163
#13

04 Jul 2017, 17:56

OK. I think I know how to work around the -predict- problem, but I don't have a suitable data set handy to try it out on. So please post an example of your data for me to experiment with. (Use the -dataex- command; see FAQ #12 if you are not familiar with it.)

As for the cubic spline, I don't see anything wrong with your syntax. I've used syntax just like that fairly often and always gotten the needed variables. With regard to the -mixed- command, you don't say in what sense it doesn't "work." It looks syntactically correct. But it is a mis-specified model. Just as you must include random slopes on the quadratic terms for time, if you use a spline, you must include all of the time spline variables for random slopes. So it would look like this:

Code:

mixed bmi i.case##c.timespl* c.age i.sex i.type_surgery || pid: timespl*
Comment

Martin Mueller

Join Date: Jun 2017
Posts: 77

#14

05 Jul 2017, 02:26

Dear Clyde

I thought its not working because the mkspline command only produces zero vectors. But it turns out after running mixed it works.
Maybe your fake margin marginplot command will work for the graphical output,too?

See attached a fake dataset. Don't know if that is enough to play around. Let me know if you need more.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(time sex) double bmi float(case type_surgery timesq age)
30 1  33.59667203682475 0 0  900  40.69206
30 1                  . 0 0  900 32.788033
36 1 29.134841087460515 0 1 1296  45.23434
18 0 25.872660320764407 0 0  324  25.40038
24 0  22.32003763364628 0 0  576  31.26294
30 1                  . 0 1  900   44.2245
 0 0  42.21811418235302 0 1    0  48.40317
12 1 21.246004143031314 0 1  144  66.36579
36 1 29.917569989105687 0 1 1296  76.38893
24 0  36.03780333106406 0 1  576  29.67375
30 1                  . 0 1  900  51.09234
18 0  35.38010896090418 0 1  324  45.69635
30 0                  . 0 0  900  60.43335
 6 1  37.01888190838508 0 1   36  42.12396
30 0                  . 0 0  900   58.4866
36 1 39.266984128346664 0 1 1296 29.853605
18 1 38.895097464742136 0 1  324 11.173392
24 0  40.44387621260248 0 0  576 18.969889
 0 1  41.77331710327417 0 1    0  25.61165
30 0   33.2234626124613 0 0  900 18.977888
 0 1  70.74349452112801 0 1    0  35.89326
24 0  37.78121787454002 0 1  576 34.785397
30 1 38.770503374673424 0 0  900  15.00435
36 0 39.087136363470925 0 1 1296 17.661943
 6 0  55.29211731418036 0 1   36  44.04169
12 0 57.957448785053565 0 1  144  37.02226
18 1 47.305885058175775 0 1  324  47.07419
30 0  47.38815164929256 0 1  900 26.263924
36 1  51.18407642343082 0 0 1296  52.85219
 0 1  45.01971122128889 0 1    0  29.37575
24 0 30.331288445089015 0 1  576   46.3517
36 1 28.129403482424095 0 1 1296  43.34936
12 1   32.7947347721085 0 1  144  41.56652
24 1  34.22106263712048 0 1  576  41.63597
 0 1  44.85269301994704 0 1    0  49.49756
 6 0 39.756415348825975 0 1   36  50.85715
12 0 20.919013614486904 0 0  144 32.314503
24 1  29.77863027383573 0 0  576  29.09315
30 0 32.952078769821675 0 1  900  53.39727
 0 1   52.7217719622422 0 1    0 36.538155
 6 0                  . 0 1   36   23.9262
18 0                  . 0 1  324 27.074236
24 1  33.42261830563657 0 0  576  48.03236
36 1 33.505803146073596 0 1 1296   57.0962
30 0 31.318733864417297 0 1  900  46.50735
 0 0 58.240052175475284 0 0    0  22.68942
18 1  42.78355276226066 0 1  324 35.188362
36 1  45.20635781479068 0 1 1296  12.16433
 0 1  45.15412074676715 0 0    0  46.02279
12 1 34.793406513659285 0 1  144  60.02486
24 0 30.491825020639226 0 1  576  75.80413
30 0 31.755170467309654 0 0  900  77.51144
36 0 31.588659702939914 0 1 1296  55.98721
 6 0  32.86163203525357 0 1   36  26.20747
18 0 26.917731477878988 0 0  324  23.52655
36 0 29.676376982871446 0 1 1296  30.54711
12 0  33.89635044024326 0 1  144  55.07415
18 1  38.55181730161421 0 1  324  43.70815
24 1 37.842028146097434 0 1  576  32.79556
30 0    33.440515475953 0 1  900  44.53777
36 1 37.495686208829284 0 0 1296  51.98568
18 0 30.797498839069156 0 1  324  40.46217
36 1 34.961607184354214 0 0 1296  42.02447
24 1                  . 0 1  576 31.508215
30 0                  . 0 0  900  43.24716
36 1  34.41391650377773 0 0 1296  59.54494
 0 0  39.45931114703417 0 0    0 21.042213
 6 1 39.379544647829604 0 0   36  44.16336
18 1 30.775955859152603 0 0  324  22.42801
24 1  37.57931846915744 0 1  576  28.37788
24 0  39.80633567776531 0 0  576 15.811335
30 0                  . 0 1  900 -.5387288
 6 1 33.041529312869535 0 1   36  42.59256
12 0 29.247513240436092 0 1  144  35.45995
18 0 25.410081466333942 0 1  324  41.57531
24 0                  . 0 0  576  39.41916
30 1                  . 0 0  900  42.13078
 6 1   35.1190571537707 0 0   36  44.21338
12 1 20.603547780541703 0 0  144  55.74634
30 1  32.23354719160125 0 0  900  56.08887
36 0  29.49067072062753 0 0 1296  76.10957
 6 1  36.07276575034484 0 0   36 31.726765
12 0                  . 0 0  144  51.47675
24 1                  . 0 0  576 37.902584
30 0                  . 0 0  900  28.90389
36 0                  . 0 1 1296 28.562994
 6 0  36.48765929513611 0 1   36  63.04611
12 1 41.219086850527674 0 0  144  60.63246
24 0 45.630709814559665 0 0  576  56.77765
 0 0  44.07363574579358 0 0    0   41.8812
24 0 22.770334775047377 0 0  576  47.70867
30 1 28.892282480094583 0 1  900  37.71745
36 1 26.959640344791115 0 1 1296  42.46822
18 0  36.01342550218105 0 1  324  30.00846
30 1 29.846921959007158 0 1  900  44.34793
36 0  32.39810339999385 0 1 1296 34.835968
 0 0  71.28075210493989 0 1    0 17.452406
36 1   33.1081607255619 0 0 1296  32.72573
18 0   46.8228130929172 0 1  324  43.46414
24 1                  . 0 0  576 30.057096
end
label values sex sex_lab
label def sex_lab 0 "male", modify
label def sex_lab 1 "female", modify
label values case caseLab
label def caseLab 0 "control", modify
label values type_surgery type_surg_lab
label def type_surg_lab 0 "Banding", modify
label def type_surg_lab 1 "Sleeve", modify

Comment

Martin Mueller

Join Date: Jun 2017

Posts: 77
#15

05 Jul 2017, 06:05

Appendix:

One more question for the output of the cubic spline mixed effects model with knots basline (0 months), 6 months (initial phase), 12 months (bmi regain) and 36 months (linear constant/increase phase) after surgery to take into account the course of the bmi of the data:

Considering the output I posted above #14 was produced after

Code:

mkspline timespl = time, cubic knots(0 6 12 36) mixed bmi i.case##c.timespl* c.age i.sex i.type_surgery || pid: timespl*

1.) Is it right to say:
a) in the first 6 months after surgery there is a significant (case#timespl1: p=0.009) case-effect, i.e. cases have a higher bmi than controls which cannot be seen more time after surgery (p>0.05 in case#timespl2 etc.) suggesting that the initial phase is different between cases and controls but not further on.

OR

b) do i need - test timespl1 timespl2 timespl3 => p<0.0001)- to say that there is a time effect in the course of the bmi over time in the control group (p<0.0001) AND test (1.case#c.timespl1 1.case#c.timespl2 1.case#c.timespl3 => p=0.0015) to say there is a strong evidence for a difference (p=0.0015) in the course of the bmi over time between cases and controls.

If the latter (b)) is correct (what I think), is there a way to calculate the predicted difference of the bmi e.g. after 24 months with the lincom or predict command? And then the only ongoing challgenge (as you mentioned before) is to visualise the course of the predicted model.

2.) Is it a "bad" sign that for the random-effects Parameters the std.err and 95% CI cannot be calculated ? If I use the code above it cannot be calculated.

I really think I am learning a lot. Thank you!!
Comment

Announcement