
  • Repeated measures analysis with xtmixed and lincom

    Hi,

    I've been trying to use the lincom command after fitting the multilevel mixed-effects model below. Briefly, OUT is the dependent variable, GRP (1/0) and CONCEN (0-3) are explanatory variables, and id is a level-2 variable (the repeated measures being level 1). I have made repeated measures on each sample (id) at four concentrations/timepoints (CONCEN), and wish to know whether the predicted value of OUT at CONCEN = 3 is significantly different from baseline in GRP = 1. This follows on from my previous post "Help with interpreting xtmixed output" of 28 Nov.
    In reference to Clyde's answer regarding the two-equation approach, I think I've got it, but just to be sure, I've gone back to his original equations and worked through an example with the slightly more complicated GRP = 1 case below:

    Code:
    xtmixed OUT i.GRP##c.CONCEN || id: CONCEN, mle variance
    
    ------------------------------------------------------------------------------
             OUT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           1.GRP |   18.73228   8.295914     2.26   0.024     2.472583    34.99197
          CONCEN |  -1.495238   .5285768    -2.83   0.005     -2.53123   -.4592467
                 |
    GRP#c.CONCEN |
              1  |   2.635238   .7285928     3.62   0.000     1.207222    4.063254
                 |
           _cons |   38.20106   6.018489     6.35   0.000     26.40504    49.99708
    ------------------------------------------------------------------------------
    
    Group 0: OUT = b0 + b1*CONCEN + error
    Group 1: OUT = c0 + c1*CONCEN + error
    
    b0 = _b[_cons], b1 = _b[CONCEN]
    c0 = _b[_cons] + _b[1.GRP], c1 = _b[CONCEN] + _b[1.GRP#c.CONCEN]
    So, if I've understood correctly, this is where the coefficients in your equations come from.
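    To make the mapping concrete (a quick check of my own, not from Clyde's post), lincom can return c0 and c1 directly after fitting the model above:

    Code:
    * c0: the group-1 intercept, with its standard error
    lincom _cons + 1.GRP
    * c1: the group-1 slope
    lincom CONCEN + 1.GRP#c.CONCEN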

    As an example for GRP1 between CONCEN3 and CONCEN0:

    At CONCEN = 0:
    _cons + _b[1.GRP] + (_b[CONCEN] + _b[1.GRP#c.CONCEN])*0

    At CONCEN = 3:
    _cons + _b[1.GRP] + (_b[CONCEN] + _b[1.GRP#c.CONCEN])*3

    Difference:

    = (_b[CONCEN] + _b[1.GRP#c.CONCEN])*3

    = (-1.495238 + 2.635238) * 3

    = 3.42

    which is the same value I get with lincom below (the hand calculation, of course, coming without a p-value or CIs):

    Code:
    . lincom 3*(CONCEN + 1.GRP#c.CONCEN)
    
     ( 1)  3*[OUT]CONCEN + 3*[OUT]1.GRP#c.CONCEN = 0
    
    ------------------------------------------------------------------------------
             OUT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |       3.42   1.504356     2.27   0.023     .4715167    6.368483
    ------------------------------------------------------------------------------
    And, as the p-value for this is 0.023, this suggests there is a significant difference between CONCEN = 3 and CONCEN = 0 for GRP = 1, right?
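    As a further cross-check (my own sketch, assuming a margins-capable version of Stata), the same contrast should be recoverable with margins:

    Code:
    * adjusted predictions at CONCEN = 0 and 3, by group
    margins GRP, at(CONCEN=(0 3))
    * change from CONCEN = 0 to 3 within each group, with tests
    margins GRP, at(CONCEN=(0 3)) contrast(atcontrast(r._at) effects)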

    thanks

    Jem

  • #2
    Yes, I agree with what you have done above.



    • #3
      Excellent, thanks Clyde. Presumably this approach would also work if I wanted to include a quadratic term among the explanatory variables to approximate the data more closely, e.g. with

      Code:
      xtmixed OUT i.GRP##c.CONCEN##c.CONCEN || id: CONCEN, var
      I assume this is the best way to do it if you have two groups (i.e. GRP 0 or 1) whose curves diverge, as I do, rather than simply including a standalone CONCEN^2 term, as in:

      Code:
      * where CONCEN_2 has been generated beforehand as CONCEN^2
      xtmixed OUT i.GRP##c.CONCEN CONCEN_2 || id: CONCEN, var
      The interaction terms in the first command are required to allow the curves to diverge through the quadratic element, whereas the second allows quadratic curves but with similar trajectories across the groups - is that right?

      Assuming the first command is best, the output is:

      Code:
      xtmixed OUT i.GRP##c.CONCEN##c.CONCEN  || id: CONCEN, var
      
      -----------------------------------------------------------------------------------------
                         OUT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -----------------------+-----------------------------------------------------------------
                       1.GRP |   21.08528   8.402084     2.51   0.012     4.617499    37.55306
                      CONCEN |  -1.115873   1.637795    -0.68   0.496    -4.325892    2.094146
                             |
                GRP#c.CONCEN |
                          1  |  -1.738413   2.257544    -0.77   0.441    -6.163118    2.686293
                             |
           c.CONCEN#c.CONCEN |  -.2121212   .3784672    -0.56   0.575    -.9539034    .5296609
                             |
       GRP#c.CONCEN#c.CONCEN |
                          1  |   1.300433   .5216811     2.49   0.013     .2779568    2.322909
                             |
                       _cons |   38.06926   6.095513     6.25   0.000     26.12228    50.01625
      -----------------------------------------------------------------------------------------
      So, to check for a significant difference between CONCEN = 0 and 3 for GRP1:

      At CONCEN = 0 and 3 (coefficient names only written here, for clarity):
      OUT = _cons + 1.GRP + (CONCEN + 1.GRP#c.CONCEN)*CONCEN + (c.CONCEN#c.CONCEN + 1.GRP#c.CONCEN#c.CONCEN)*CONCEN^2

      Difference (now substituting the actual value of CONCEN, i.e. 3) =
      (CONCEN)*3 + (1.GRP#c.CONCEN)*3 + (c.CONCEN#c.CONCEN)*9 + (1.GRP#c.CONCEN#c.CONCEN)*9

      Therefore, am I right in thinking the command should be:
      Code:
      lincom 3*(CONCEN + 1.GRP#c.CONCEN) + 9*(c.CONCEN#c.CONCEN + 1.GRP#c.CONCEN#c.CONCEN)
      Have I understood correctly?
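      (As a sketch of mine rather than part of the original question: the same difference should also be recoverable with margins, fixing GRP at 1 inside at():)

      Code:
      margins, at(GRP=1 CONCEN=(0 3)) contrast(atcontrast(r._at) effects)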

      thanks,
      Jem



      • #4
        Again, I agree with everything in your #3 post. And, yes, you probably should include both the linear and quadratic terms. If you don't, you are implicitly constraining the coefficient of the linear term to zero, which is equivalent to forcing a fit to a parabola whose vertex is at CONCEN = 0. Now, if theory says that CONCEN = 0 must be the location of the parabola's vertex, then, of course, you can and should omit the linear term.
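        To spell out the algebra behind this (my addition): writing the quadratic for one group as OUT = b0 + b1*CONCEN + b2*CONCEN^2, the vertex is where d(OUT)/d(CONCEN) = b1 + 2*b2*CONCEN = 0, i.e. at CONCEN = -b1/(2*b2). Dropping the linear term forces b1 = 0, which puts the vertex at CONCEN = 0.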

        One other point: polynomial laws are usually derived as simple forms to fit some data over a limited range. They seldom have theoretical justification, and usually don't generalize well. If you are interested in capturing non-linearity in your relationships, you can consider alternative representations of the relationship such as linear splines, or other functional forms that might be more naturalistic in your scientific domain. Since your actual CONCEN variable, if I recall from your earlier thread, takes on 6 discrete values, you could even make CONCEN a discrete variable.
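        For concreteness (a sketch of mine, with a purely illustrative knot at CONCEN = 1), the spline and discrete-CONCEN alternatives might look something like:

        Code:
        * linear spline in CONCEN with one knot (knot placement illustrative)
        mkspline CONC1 1 CONC2 = CONCEN
        xtmixed OUT i.GRP##(c.CONC1 c.CONC2) || id: CONCEN, mle variance
        
        * or treat CONCEN as discrete in the fixed part
        xtmixed OUT i.GRP##i.CONCEN || id: CONCEN, mle variance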



        • #5
          Great, thanks. Yes, I see what you're saying about polynomial laws and the lack of theoretical justification. I will look into linear splines, but having tried out the double interaction including the quadratic (as described above), I found it fitted my data remarkably well. As for making CONCEN a discrete variable, would this not preclude it from being included in the random part of the model at the 'id' level, i.e. I would not be able to fit a random-slopes model?
          Jem



          • #6
            Well, in principle there is nothing that prevents you from having random coefficients for the CONCEN-indicator variables if you make CONCEN a discrete variable. That's what the R. notation is for. But, admittedly, a model like that would be very hard to understand and interpret, and probably also very lacking in any theoretical justification. So I probably wouldn't go there--when I suggested a discrete variable, I wasn't thinking about the random slopes part of the model.
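            For illustration only (my sketch, and, if I have the R. syntax right, not a model anyone here is recommending), such a specification might be written:

            Code:
            * random coefficients on the CONCEN indicators via the R. notation
            xtmixed OUT i.GRP##i.CONCEN || id: R.CONCEN, mle variance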



            • #7
              Hi Jem,

              I know it is not customary to reveal private messages, but I am posting the text you sent me here: I don't want you to lose the opportunity to receive valuable opinions from the other knowledgeable people on this forum, and the question relates to the same model we are discussing in this thread.

              I believe you have read Richard Williams's document on margins, as I suggested, and your next query after reading it was:

              As far as I can see though, the actual Margin values, or dy/dx or Contrast values are just derived from the fixed part of the multilevel model, and you could easily compute these values yourself by feeding in the appropriate values of explanatory variables.
              Any prediction is based on the fixed part unless you explicitly specify otherwise (we will see this below). Note that the random part contains estimates of the variance of unobserved variables whose nature/characteristics are unknown, because we have not measured them. A prediction that includes those random effects is therefore a confounded prediction rather than a precise model-based one (though someone may have valid reasons for predicting with the random effects included).

              Regarding feeding in "appropriate values of explanatory variables": that is right, but it depends on how you feed them in. margins computes marginal effects, which are averages of differences in predicted values for two hypothetical populations. In your case, the three steps for computing the average marginal effect would be: i) compute the predicted values for the observed data, assuming everyone belongs to group = 0; ii) compute the predicted values assuming everyone belongs to group = 1; iii) take the difference between the two; the average of those differences is the average marginal effect. Now, you may quarrel with the assumptions, i.e. treating group = 0 as group = 1 or vice versa, but in theory this is not counter-intuitive. If you believe your model is giving you scientific truth, then you must believe that if you re-sampled the population you would find the same difference, and that the difference is attributable only to the different treatment of the different groups, that is, to group assignment. Therefore, even if you treat your whole sample as group = 0 and then as group = 1, holding the effects of the other variables constant, you should still be able to recover the group difference if the truth (that the group difference is due to group allocation) is there.
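              Those three steps can also be carried out by hand after fitting the model (a sketch of mine, with the estimation sample in memory; margins, dydx(GRP) should give the same average):

              Code:
              preserve
              replace GRP = 0
              predict p0, xb   /*fixed-part prediction with everyone as group 0*/
              replace GRP = 1
              predict p1, xb   /*fixed-part prediction with everyone as group 1*/
              gen diff = p1 - p0
              summarize diff   /*the mean is the average marginal effect*/
              restore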

              So, it seems the main benefit of using Margins commands is that you get chi2 and p-values, enabling you to say whether the differences are significant, right?
              margins has many more benefits: customized predictions, post-estimation tests, tests of differences, and so on, depending on what you want. Please do take the time to read the original margins manual; it is worth reading and will make your life easier if you continue in research.


              And just coming back to the predict varname, fitted function - am I right in thinking this individualises the model to each sample by utilising the random part to obtain BLUPs?
              You are right: the BLUPs in a mixed model are known as empirical Bayes predictions, or sometimes estimated best linear unbiased predictors (EBLUPs). They can be obtained in two ways. The straightforward way is the one you mentioned; alternatively, you can obtain the fixed-part prediction, predict the random part for each cluster, and add them together, which gives the same result as the fitted option:

              Code:
              predict fixed, xb  /*obtain the fixed-part prediction*/
              predict re_slope re_cons, reffects  /*random slope and intercept for each cluster*/
              
              gen eblups = fixed + re_slope*CONCEN + re_cons
              
              tabstat eblups, stat(mean) by(GRP)  /*this should match the following:*/
              
              ***********
              
              predict varname, fitted
              
              tabstat varname, stat(mean) by(GRP)
              Hope it helps,

              Best
              Roman



              • #8
                Many thanks again, Clyde. As before, your explanation using the equations really helped me to understand the xtmixed output and, importantly, to understand both how more complicated models (e.g. with squared terms) work and how margins is used.

                Roman, thank you for taking the time to explain further. I have looked again at the margins section of the manual. You're right: it was too simplistic to say the main benefits of margins are the chi2 and p-values, though for my limited use of the command with my model, that is what I am most interested in. And thanks for explaining a bit more about predict varname, fitted and EBLUPs.
                I am really just after a working knowledge here, to give me some sense of what I am doing with the post-estimation commands, and you have helped clarify these things.

                Season's greetings to you both.

                Jem

