
  • LR test for comparison of multilevel mixed effects models (xtmixed)

    Hi,

    I am using Stata 12IC for Windows. I have been trying to model some repeated measures data using multilevel mixed effects models with xtmixed. Briefly, 'cv' is the dependent variable, and 'concen' (0,1,2,3) is the explanatory variable. 'id' represents the sample undergoing repeated measures (level 2 variable - observations being level 1). So, in short I have made repeated measures of 'cv' on each sample (id) at four concentrations/timepoints 'concen'.
    Further to the helpful comments I have received previously, I have been trying to model with a random slopes model, with and without an interaction term for 'concen', i.e. a quadratic component. Please see below the model specification for (1) the linear random slopes model, and (2) the quadratic random slopes model. I hope the graphs are displayed - as you can see, the marginsplot for model 2 (quadratic) fits the mean values from the original data more closely. However, the LR test is not significant.

    Having read the manual entry for lrtest, my three questions are:
    1. For my two models, would model 1 be considered restricted/constrained, and model 2 be unrestricted/unconstrained?
    2. Does the p-value from LR test for these two models only relate to the null hypothesis that c.concen#c.concen = 0?
    3. As the LR test is not significant, does this mean I should reject model 2 (quadratic), even though the marginsplot fits the original data better?

    thanks

    Jem

    Code:
    MODEL 1
    
    . xtmixed cv c.concen || id: concen, mle variance cov(uns)
    
                                                    Wald chi2(1)       =      0.52
    Log likelihood = -121.79785                     Prob > chi2        =    0.4718
    
    ------------------------------------------------------------------------------
              cv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          concen |        1.7   2.362601     0.72   0.472    -2.930612    6.330612
           _cons |       68.7   12.40436     5.54   0.000      44.3879     93.0121
    ------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    id: Unstructured             |
                     var(concen) |   5.463436          .             .           .
                      var(_cons) |   959.4433          .             .           .
               cov(concen,_cons) |  -72.40067          .             .           .
    -----------------------------+------------------------------------------------
                   var(Residual) |   168.0487          .             .           .
    ------------------------------------------------------------------------------
    LR test vs. linear regression:       chi2(3) =    27.23   Prob > chi2 = 0.0000
    
    Note: LR test is conservative and provided only for reference.
    
    
    MODEL 2
    
    . xtmixed cv c.concen##c.concen || id: concen, mle variance cov(uns)
    
                                                    Wald chi2(2)       =      4.25
    Log likelihood = -120.11398                     Prob > chi2        =    0.1197
    
    -----------------------------------------------------------------------------------
                   cv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------+----------------------------------------------------------------
               concen |  -11.26429   7.133662    -1.58   0.114    -25.24601    2.717435
                      |
    c.concen#c.concen |   4.321429   2.261078     1.91   0.056    -.1102031     8.75306
                      |
                _cons |   73.02143    12.5543     5.82   0.000     48.41546     97.6274
    -----------------------------------------------------------------------------------
    
    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    id: Unstructured             |
                     var(concen) |   5.508205   10.02448      .1555584    195.0414
                      var(_cons) |   967.2806   569.4238      305.1144    3066.495
               cov(concen,_cons) |  -72.99301   80.92262     -231.5984    85.61241
    -----------------------------+------------------------------------------------
                   var(Residual) |   143.1493   44.17727      78.18078    262.1068
    ------------------------------------------------------------------------------
    LR test vs. linear regression:       chi2(3) =    30.03   Prob > chi2 = 0.0000
    
    
    . lrtest A B
    
    Likelihood-ratio test                                 LR chi2(5)  =      3.37
    (Assumption: A nested in B)                           Prob > chi2 =    0.6435
    [Attached graphs: marginsplots for MODEL 1 and MODEL 2]
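    For reference, the -lrtest- statistic above can be reproduced by hand from the two log likelihoods printed in the outputs: LR = 2 × (log likelihood of the larger model − log likelihood of the smaller model). A quick arithmetic check (sketched in Python rather than Stata, purely to verify the numbers):

```python
# Likelihood-ratio statistic for "lrtest A B", recomputed from the
# log likelihoods shown in the two xtmixed outputs above.
ll_model1 = -121.79785  # Model 1: linear random-slopes model (A)
ll_model2 = -120.11398  # Model 2: quadratic random-slopes model (B)

lr = 2 * (ll_model2 - ll_model1)
print(round(lr, 2))  # 3.37, matching "LR chi2 = 3.37" in the lrtest output
```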

  • #2
    Jem:
    in model B the quadratic term is not significant (and neither, indeed, is the linear one).
    Hence, I fail to see the difference (in informative terms) between these two models.
    On top of that, Prob>chi2 is quite poor for both your models.
    Unfortunately, the graphs are not readable, so I cannot comment on the -margins- results.
    Besides, I would guess that nesting has to do with the levels of the model (pupils nested within classes, classes nested within schools, and so forth), not with whether or not a quadratic term is included.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Hi, Jem,

      Unfortunately, I cannot see the pictures, only the output.

      To my (very modest) knowledge: a) the Wald "omnibus" test relates directly to the significance of the fixed effects (excluding the intercept); b) the LR test you get from each model is also an "omnibus" test, but here fundamentally for the covariance parameters: as stated in the output, it tests the mixed model against a standard linear regression; c) finally, the LR test done after "estimates store" for two given models (say, A and B) tests whether there is a difference between a relatively more complex model (here, "B") and a simplified model (here, "A", nested in B).

      I won't dare to go further than these few details for now. Perhaps in the years to come, having learned and practiced much more.

      Hopefully that might be of some help.

      Best,

      Marcos
      Last edited by Marcos Almeida; 15 Jan 2015, 04:28.



      • #4
        Thanks Carlo and Marcos. I have tried a different way of attaching the images - hopefully you can see them this time.
        Carlo, I agree - neither the linear nor the quadratic terms are significant. But 'concen' is the sole explanatory variable, so I can't remove it. Do you mean that as the linear term in Model 1 is not significant, I should not have gone to a more complicated model with linear and quadratic terms (Model 2)? If so, interestingly the P>|z| for 'concen' is lower in Model 2, and the 'c.concen#c.concen' term approaches significance.
        Also, you mention that Prob>chi2 is quite poor for both models - are you referring to those values above the fixed effects tables (0.4718 for Model 1, and 0.1197 for Model 2)? I have not been able to find what these values refer to. Are they as Marcos suggests, the p-values for the Wald omnibus test for significance of fixed effects? And if so, as it is lower (though still >0.05) for Model 2, does this suggest Model 2 is an improvement?


        [Attached images: "Mean vs Linear Margins.png" (Model 1) and "Mean vs Quadratic Margins.png" (Model 2)]



        • #5
          Jem:
          as far as significance is concerned, you do not specify how many observations you have. It may well be that a true difference does not exist, or that your sample is too limited to reach statistical significance. However, I remind myself that "absence of evidence is not evidence of absence" (please see http://www.bmj.com/content/311/7003/485).
          Yes, you're right: I meant that Model 2 is more demanding but does not represent a substantive improvement over Model 1 (I would not consider the near-significance of the squared term in Model 2 relevant).
          I would not take a reduced Prob>chi2 for Model 2 as evidence of anything other than that both your models are no better than intercept-only models.
          Unfortunately, I cannot see your graphs.
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Just to pile on here, looking at the graphs in #4, while it is true that Model 2 is a somewhat better fit to the data, you can see that the confidence intervals around the model estimates are very wide: you could fit a lot of very different models that way, and there is nothing to say one is better than another. Whether the wide confidence intervals reflect a small sample size or high variability in your cv measure I cannot discern from the available information. Either (or both) is possible.



            • #7
              Jem:
              I can see the graphs now and second Clyde's diagnosis in full.
              Just out of curiosity (and without hoping for too much), I would take a look at -help anova- and the related manual entry as far as repeated measures are concerned.
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Hi Carlo and Clyde,

                thanks for your responses. There were 7 samples, i.e. 7 separate 'id's, contributing to the large confidence intervals (cv values are also quite variable).

                I have tried running an intercept only model (Model 3), and got the following results:
                Code:
                MODEL 3
                
                . xtmixed cv c.concen || id:, mle variance cov(uns)
                
                Note: single-variable random-effects specification in id equation;
                covariance structure set to identity
                
                                                                Wald chi2(1)       =      0.57
                Log likelihood = -122.34038                     Prob > chi2        =    0.4503
                
                ------------------------------------------------------------------------------
                          cv |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                      concen |        1.7   2.251862     0.75   0.450    -2.713569    6.113569
                       _cons |       68.7   11.18793     6.14   0.000     46.77206    90.62794
                ------------------------------------------------------------------------------
                
                ------------------------------------------------------------------------------
                  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                -----------------------------+------------------------------------------------
                id: Identity                 |
                                  var(_cons) |   751.9519   425.8724      247.8032    2281.777
                -----------------------------+------------------------------------------------
                               var(Residual) |   177.4809   54.77179      96.93155    324.9661
                ------------------------------------------------------------------------------
                LR test vs. linear regression: chibar2(01) =    26.15 Prob >= chibar2 = 0.0000
                
                
                . lrtest C B   // Model 3 vs Model 2
                
                Likelihood-ratio test                                 LR chi2(3)  =      4.45
                (Assumption: C nested in B)                           Prob > chi2 =    0.2165
                
                . lrtest C A   // Model 3 vs Model 1
                Mixed models are not nested
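                As a check, the statistic from -lrtest C B- again follows directly from the two log likelihoods, and the p-value comes from referring it to a chi-squared distribution with 3 degrees of freedom. A sketch in Python (not Stata), using the closed-form chi-squared survival function for 3 df:

```python
from math import erfc, exp, pi, sqrt

# lrtest C B: LR = 2 * (ll_B - ll_C), referred to chi2 with 3 df
ll_C = -122.34038  # Model 3: random intercept only
ll_B = -120.11398  # Model 2: random slope plus quadratic term

lr = 2 * (ll_B - ll_C)

# Survival function of a chi-squared variable with 3 df:
#   P(X > x) = erfc(sqrt(x/2)) + sqrt(2*x/pi) * exp(-x/2)
p = erfc(sqrt(lr / 2)) + sqrt(2 * lr / pi) * exp(-lr / 2)

print(round(lr, 2))  # 4.45, matching "LR chi2(3) = 4.45" in the lrtest output
print(p)             # close to the reported Prob > chi2 = 0.2165
```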

                Given the lack of significance of 'concen' or 'c.concen#c.concen' in any of the models (random slopes (Model 1/A), random slopes with quadratic term (Model 2/B), and random intercept only (Model 3/C)), is there any way to guide which one I should choose? My inclination would be to go with the model whose margins most closely fit the original data mean values (i.e. Model 2), although -lrtest- is not significant when comparing Model 2 vs 1 (or 3 vs 2!). Would that be wrong? Is -lrtest- simply testing the null hypothesis that the coefficient of the additional term in the model is 0, or is there more to it?

                thanks

                Jem



                • #9
                  Jem:
                  although Model 2 seems to be "better" than 1 and 3, it is still far from being informative, probably because your sample is too limited (or because a true difference does not exist).
                  Hence, in my humble opinion, the question is not mainly statistical but practical: what are you going to do with these analyses?
                  If the mixed models were set up for practicing on this topic, or for discussing the possible causes of their "poor" performance with your colleagues/teacher/supervisor, you can comment on all of them.
                  Things are probably different if you're engaged in a research project (though I find it difficult to believe that a research project would consider only one predictor): in that scenario, something was probably mistaken in the statistical plan, and it would be mandatory to collect more data (if still feasible), unless the lack of significance was expected as a possible result of your statistical analysis (for instance, due to the limited sample size).
                  Last edited by Carlo Lazzaro; 15 Jan 2015, 11:24.
                  Kind regards,
                  Carlo
                  (Stata 19.0)



                  • #10
                    Hello Jem,

                    I'm afraid there is no need to add more suggestions after Carlo's and Clyde's. They summed up the situation, so to speak.

                    But I wish to say that now I can see the graphs. And that's the positive point.

                    On the other hand, and now trying to say in other words what I indirectly reported in #3 about the tests, I gather your output strongly suggests that: the fixed-effects part (you showed only one variable) has no statistical significance; added complexity was not helpful in this case; and, apparently, the covariance parameters improved significantly in the mixed model compared to a linear regression (not a surprise, really).

                    On top of that, there are the very large CIs pointed out by Clyde and the paucity of variables underlined by Carlo.

                    If I haven't misunderstood the problem, most aspects point to a pessimistic scenario, I'm sorry to say.


                    Best,

                    Marcos



                    • #11
                      Dear Carlo and Marcos,

                      Thanks. The data are real - in my line of work in biological science, it is not uncommon to have small sample sizes like this. Admittedly, this is far from ideal, and I accept that the conclusions that can be drawn after modelling are likely to be limited. The experiments are measuring the response (cv) in tissue samples (id) at different concentrations of a drug (concen) - so yes, just one explanatory variable. But I agree Carlo, there is a practical issue here as much as a statistical one.
                      As a junior researcher, when I began the project my main concern was with setting up the experiments and ensuring I obtained data, rather than with how to analyse them. Once I had the data, I was faced with having to find the best means of analysing them - some form of analysis is required, and each has its pros and cons.

                      From what I have read about repeated measures analysis, whether with repeated-measures ANOVA or multilevel mixed effects models, both are likely to have the same weaknesses with small sample sizes. On balance though, having read quite a bit and spoken to statisticians about multilevel models, these offer several advantages and are preferred - particularly as in another analysis I am comparing repeated measures on two groups (my previous posts).

                      Finally, while the small sample size limits the conclusions that can be drawn from the model (whichever is chosen), the process of understanding the steps involved, and how models are built and refined, will hopefully be of value for future work involving larger sample sizes - hence my desire to understand as fully as possible. If you can recommend a better approach to analyse these repeated measures data, to see if there are changes in 'cv' with different values of 'concen', please let me know.

                      thanks

                      Jem



                      • #12
                        Well, I think we can summarize your situation as being the absence of evidence of an effect of CONCEN on cv. That is not, of course, the same thing as evidence of absence of an effect. A more detailed way of saying it is that the amount of variation in the cv measurement is too large, compared to whatever effect CONCEN might have, to draw any conclusion about what form the CONCEN-cv relationship takes, or even if there is any relationship at all.

                        I do not think there is any other approach that will lead you to a different conclusion from this data.

                        And I sympathize about the small N. Although I am an epidemiologist and most of my work involves studies with hundreds or thousands of subjects, I have done some work with biomedical researchers as well, and I know that the expense and effort that goes into each sample can be enormous, and small n's are very much the rule.



                        • #13
                          Jem:
                          as Clyde says, collecting real-world data is (very) costly and requires a lot of red-tape work, too.
                          The only remark I would make about your last post is that collecting the data first and then deciding which statistical analysis to perform is not the right way to approach a quantitative study.
                          Instead, a statistical plan conceived ahead of the study should detail what we are going to do after the data have been collected (in terms of the effect to be measured, statistical analyses, type I and II errors, missing-value replacement, and so on).
                          However, small samples are often a matter of fact, and the absence of evidence of the measured effect is their logical consequence.
                          Last edited by Carlo Lazzaro; 15 Jan 2015, 23:50.
                          Kind regards,
                          Carlo
                          (Stata 19.0)



                          • #14
                            Dear Clyde and Carlo,

                            Once again, many thanks - I am most grateful for your inputs. I take on board everything you have said, and were it not for time constraints, I would endeavour to increase my sample sizes.

                            As you say Carlo, forward planning about statistical analysis is important, and something I am in a better position to do now.
                            To wrap this up, so that I know the right way to proceed for future analyses (with larger n’s!), it would be really helpful if you could let me know whether I have understood the following three aspects of multilevel mixed effects models:


                            1. The Prob > chi2 value on top of the fixed effects table of xtmixed output (just beneath Wald chi2) is, as Marcos suggested, an omnibus test for the coefficients of the fixed effects.
                            2. -lrtest- compares two models using their log likelihoods to produce a p-value, which relates to the null hypothesis that the coefficient(s) of the additional explanatory variable(s) in the more complex of the two models is/are 0. So if p<0.05, it suggests the new coefficients are not all equal to 0.
                            3. You should only move from one model to a more complex one (e.g. involving an extra explanatory variable) if the -lrtest- for the two produces p<0.05.

                            Jem



                            • #15
                              Jem:
                              Q1) Prob>chi2 focuses on the fixed part of -xtmixed- and conveys the same information as in -regress- (i.e., echoing Marcos' explanation, and assuming the usual significance level: if it is >0.05, you cannot rule out that all your coefficients = 0; hence, your model is no different from an intercept-only model), as you can see from the following code in Stata 12.1/SE:
                              Code:
                              use http://www.stata-press.com/data/r12/pig.dta 
                              reg week weight, cluster(id) 
                              xtmixed week weight, vce(cluster id)
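                              As a purely arithmetic illustration of the same point: when there is a single fixed-effect coefficient besides the constant, the Wald chi2(1) at the top of the output is simply the square of that coefficient's z statistic. Checking this against Model 1's figures (sketched in Python rather than Stata, purely to verify the numbers):

```python
# Wald chi2(1) for Model 1 equals the squared z statistic of 'concen',
# taken from the fixed-effects table in the output above.
coef, se = 1.7, 2.362601   # coefficient and std. err. of 'concen' (Model 1)
z = coef / se

print(round(z, 2))      # 0.72, the reported z
print(round(z * z, 2))  # 0.52, the reported Wald chi2(1)
```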
                              Q2) The LR test at the foot of the table reporting the random part of your model tests -xtmixed- vs -regress-; if it fails to reach statistical significance, -xtmixed- is a sort of "much ado about nothing" procedure.
                              The -lrtest- that compares two nested models tells you, if statistically significant, that the more complex model explains your depvar better than the simpler one. (Say, Model 1 (one level): a random sample of 8-year-old pupils and their maths marks during the last year; Model 2 (two levels): Model 1 nested in a random sample of classrooms of the same school. You can expand the number of levels further, but the higher the number, the trickier the explanation of your results, as always.) Hence, when it comes to -xtmixed-, it may be a matter not only of more predictors but also of more levels.
                              Q3) Correct.

                              As a non-substantive closing remark, the -xtmixed- command has been superseded by -mixed- as of Stata 13.
                              Kind regards,
                              Carlo
                              (Stata 19.0)

