Testing nonlinearity in a MLM with xtmixed

Lee Star

Join Date: Mar 2016

Posts: 8
#1


22 Mar 2016, 17:12

I have been searching high and low for a straightforward answer to this question: I am using xtmixed to run a three-level regression. I suspect my data might be non-linear. How can I test whether there is a non-linear effect in this case? If there IS a non-linear effect, how do I proceed?
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#2

22 Mar 2016, 17:45

"I suspect my data might be non-linear" can mean different things that would call for different approaches. Tell us more, and give a good description of the data itself as well.
Lee Star

Join Date: Mar 2016

Posts: 8
#3

23 Mar 2016, 10:32

Thank you for your response. There are theoretical reasons to believe that the relationship might be non-linear: I am testing whether exposure to violence (0-11 continuous scale, grand-mean centered in the model) predicts mental health problems and victimization by peers (each on a 0-3 continuous scale). For example, above certain thresholds of violence, there may be a plateau in mental health problems. I would like to test whether this is the case with my data.

Additionally, my data is quite censored (Most students report 0-2 acts of violence). A residual plot shows sharp curves at either tail for my two outcome measures, which are both on a 0-3 scale (mental health problems and peer victimization). The qqplots of each outcome are attached. Log-transforming the violence measure does not improve the residual plots. I understand that I can use the robust estimator to adjust the SEs, but I am mostly concerned with testing the linearity of the relationship for theoretical reasons.

2 Photos
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#4

23 Mar 2016, 10:59

Well, even aside from considerations of non-linearity, I have questions about your mental health problems and victimization variables on a 0-3 continuous scale. Really continuous? As in there can be values like 0.17, and 2.43, etc.? If your outcome variables are actually just 0, 1, 2, 3 and you are treating them as if they were continuous, then it wouldn't surprise me all that much to see poor fit, and I would recommend using a different approach altogether: ordered logistic regression (-ologit- command).

If these are truly continuous variables, the plots you are showing are not really informative either way. What they show are departures of the residual distribution from normality--but that is a separate issue from whether the linear model is a mis-specification. What would be to the point is graphical exploration of the relationships between those outcomes and the violence variable (scatterplots or lowess). Or, if the model is complicated and contains numerous covariates that may obscure the situation, plots of residuals vs estimated and residuals vs predictor. (See -help rvfplot- and -help rvpplot-.) The patterns seen in such graphs might suggest appropriate variable transformations, or they might be sufficiently linear that you feel more confident in the model as is.
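A minimal sketch of that graphical exploration, assuming the data are in memory and using the variable names the original poster gives in post #5 (MentalHealth, PeerVictimization, violenceGMC). It ignores the covariates and the clustering, so it is only a first look, not a test:

```stata
* Run from a do-file (the /// line continuations do not work interactively).
* Scatterplot of each outcome against the violence score, with a lowess smooth.
twoway (scatter MentalHealth violenceGMC, msize(tiny)) ///
       (lowess MentalHealth violenceGMC), name(mh, replace)

twoway (scatter PeerVictimization violenceGMC, msize(tiny)) ///
       (lowess PeerVictimization violenceGMC), name(pv, replace)
```

A lowess curve that bends or flattens over the range where most observations lie would hint at a plateau; a roughly straight curve is consistent with the linear specification.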
Lee Star

Join Date: Mar 2016

Posts: 8
#5

23 Mar 2016, 11:36

The outcome variables are composed of averages, and so there are values such as .083, .727, 2.91, etc.

There are, indeed, covariates in the model. The two models are essentially:

xtmixed MentalHealth violenceGMC c.level2cov i.level3cov1 i.level3cov2 || cluster: || school:, reml var
xtmixed PeerVictimization violenceGMC c.level2cov i.level3cov1 i.level3cov2 || cluster: || school:, reml var

It was my understanding that rvfplot and rvpplot cannot be used with xtmixed? Thus, I generated the attached graphs of standardized residuals vs. predicted:
predict stand, rstandard
predict fit, fitted
twoway (scatter stand fit), yline(0)

and a graph of standardized residuals vs estimated:
* stand was already created above, so it is not predicted again
predict est
twoway (scatter stand est), yline(0)

Last edited by Lee Star; 23 Mar 2016, 11:39.
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#6

23 Mar 2016, 11:40

You are correct that rvfplot and rvpplot can only be used after -regress-. I forgot that you said you were using -xtmixed-.
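For readers who want the equivalent plots after -xtmixed-, here is a hedged sketch of building them by hand. It assumes the model from post #5 has just been fitted; the new variable names rstd and yhat are made up for illustration, and violenceGMC is the poster's predictor:

```stata
* Run from a do-file immediately after fitting the -xtmixed- model.
predict double rstd, rstandard     // standardized residuals
predict double yhat, fitted        // fixed portion plus predicted random effects

* Analogue of rvfplot: residuals vs fitted values
twoway (scatter rstd yhat, msize(tiny)) (lowess rstd yhat), ///
       yline(0) name(rvf, replace)

* Analogue of rvpplot: residuals vs the predictor of interest
twoway (scatter rstd violenceGMC, msize(tiny)) (lowess rstd violenceGMC), ///
       yline(0) name(rvp, replace)
```

The lowess overlay is an addition to what rvfplot and rvpplot draw; it makes a systematic bend in the residuals easier to see in a dense cloud of points.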

There was no graph attached to post #5, so I can't say anything more about that.

By the way, in current Stata, the name for -xtmixed- has been changed to -mixed-.
Lee Star

Join Date: Mar 2016

Posts: 8
#7

23 Mar 2016, 11:43

Apologies, I accidentally posted it without attaching, and you are too fast for me! Attaching them again.

4 Photos
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#8

23 Mar 2016, 12:06

Nothing in these graphs leads me to worry about the appropriateness of a linear specification.

The plots all demonstrate a floor and ceiling effect from your outcome variable. If this were non-hierarchical data I might look into using a -tobit- model; and if there are only two levels, -xttobit-. I'm not aware of -tobit-like analysis for 3 or more levels, though.

The two residual vs fitted plots show a coarseness of the predictor variables and outcome, but not enough, in my view, to worry about.
Lee Star

Join Date: Mar 2016

Posts: 8
#9

23 Mar 2016, 12:10

THANK YOU! So is it safe to use these graphs to state that there is not evidence of linear misspecification and thus no reason to believe that the relationship between violence and the outcomes is quadratic (or otherwise)?
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#10

23 Mar 2016, 12:35

Yes, the graphs do not provide evidence to support the use of a quadratic term.

Another approach you can use to bolster your confidence in the linear model is to run a new model with a quadratic term and see if you get a meaningful quadratic coefficient. I'm not talking about statistically significant here: from the density of points on the graphs you showed it looks like you have a very large sample, so small meaningless effects can be "statistically significant." Rather I would just look at whether the magnitude of the quadratic term coefficient is large enough to matter in a practical sense. If it's not, you're done. If it is, there's still another step: locate the vertex of the parabola, at x = -_b[linear_term]/(2*_b[quadratic_term]). If the vertex is located safely beyond the range of values of the predictor (on either side), then some mild curvilinearity is suggested but might well be ignorable.

For several reasons, if you decide to go down that road, I would center the predictor variables and use the centered versions for this analysis.
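A sketch of that check, assuming the model and variable names from post #5 (with the colon after cluster that the random-effects syntax requires, and the poster's nesting order left as written). The label "vertex" is arbitrary:

```stata
* Run from a do-file. Refit with a quadratic term using factor-variable notation;
* c.violenceGMC##c.violenceGMC expands to the linear and the squared term.
xtmixed MentalHealth c.violenceGMC##c.violenceGMC c.level2cov ///
    i.level3cov1 i.level3cov2 || cluster: || school:, reml var

* Vertex of the fitted parabola, with a delta-method standard error
nlcom (vertex: -_b[violenceGMC] / (2 * _b[c.violenceGMC#c.violenceGMC]))

* Observed range of the (centered) predictor, to compare with the vertex
summarize violenceGMC
```

If the estimated vertex falls well outside the minimum and maximum reported by -summarize-, the fitted curve is monotone over the observed data. When the quadratic coefficient is near zero the vertex estimate becomes unstable, so a very wide confidence interval from -nlcom- is itself a sign that there is little curvature to locate.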
Lee Star

Join Date: Mar 2016

Posts: 8
#11

23 Mar 2016, 12:51

The coefficient of a quadratic term (which was then grand-mean centered) is very small (b = .0004), and so I feel confident that the relationship is linear. I really, really, genuinely appreciate you taking the time to help me.
Lee Star

Join Date: Mar 2016

Posts: 8
#12

23 Mar 2016, 13:22

Just one more question, if you will: Can you please conceptually explain the difference between the "predict est" and "predict fitted" commands? I can only find that est produces the "linear prediction from fitted model" and fitted produces "fitted values from regression" and the difference between these is not clear to me.
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#13

23 Mar 2016, 13:30

So your model is y = Xb + u + e, where Xb is the linear predictor (sum of coefficients * variables), u is the higher level random effect, and e is the observation level residual. (There may be more levels in your actual model, with multiple higher level random effects u, v, w, ... etc.)

The command -predict est- with no options specified is taken by default to mean -predict est, xb-, which gives you the value of Xb (the raw linear predictor, with no random effects) for each observation. The command -predict fit, fitted- gives you Xb + u, which includes the random effects (but not the observation level residual).
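The distinction can be seen directly in the data. The following is a sketch, assuming it is run right after the three-level model from post #5, which has two random intercepts; the variable names xb_only, xb_plus_u, u1, u2, gap, and u_sum are made up for illustration:

```stata
predict double xb_only, xb           // Xb: fixed portion only
predict double xb_plus_u, fitted     // Xb + u: adds the predicted random effects
predict double u*, reffects          // one new variable per random-effects term (u1, u2 here)

* The gap between the two predictions should match the sum of the random effects
generate double gap   = xb_plus_u - xb_only
generate double u_sum = u1 + u2
summarize gap u_sum
```

Under these assumptions gap and u_sum should agree up to rounding, and both are constant within the lowest-level grouping unit, because the random effects shift the prediction for a whole cluster rather than for individual observations.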

I agree that the language in the help file and manual is not as clear as it could be on this.
Lee Star

Join Date: Mar 2016

Posts: 8
#14

23 Mar 2016, 13:39

Thank you again!