Log transform for the normality of the residuals of linear mixed effect models

Caitlin Loh

Join Date: Jul 2025

Posts: 3
#1

16 Jul 2025, 01:02

Hi,

I would like to get some advice on the steps for checking the normality of the residuals for my linear mixed effect models.

The linear mixed-effects models I came up with look at the effect of genetics on BP, the effect of diet on BP, and the combined effects of genetics and diet on BP.

However, I got stuck on one of the models because the normality test on its residuals was significant. Hence, I log-transformed both the outcome and the predictor in that model. After log-transforming both variables, the normality test on the residuals was still significant. I also tried a square-root transformation and a robust test, as suggested by Copilot, but the results were still significant. Could you please give me some advice on how I should go about this?

I also have another question regarding log transformation. In another model, I log-transformed the predictor but not the outcome. Does that mean the predictor should be log-transformed in every other model I fit as well, or does the transformation apply only to that one model?

Please let me know if you need more details regarding these models. Looking forward to hearing from you!

Thanks,
Cait
Maarten Buis

Join Date: Mar 2014

Posts: 3467
#2

16 Jul 2025, 02:47

Models are by definition simplified versions of reality. Simplification is just another word for "wrong in some useful way". So models are by definition wrong (but some of them are useful). So when diagnosing a model you do not want to know whether the assumptions are true: the assumptions are what make the model a simplification. So assumptions are never true and are supposed to be never true. If the assumptions were true, then the model would no longer simplify reality and would thus cease to be a model.

The question you are asking when diagnosing models is whether the assumptions are close enough for the model to be useful. Significance testing cannot answer that question. So you cannot use statistical tests as "There is a significant deviation from the assumption, therefore I need to change the model". The (limited) value of statistical tests is that they can guard us against the human "skill" of seeing patterns in random noise (think of the ink-blot or Rorschach tests https://en.wikipedia.org/wiki/Rorschach_test ).

So a significant test result is the beginning of your analysis, not the end: you can now start looking at the residuals and start the decision process on whether the deviations are too large or not. So you look at your model, figure out how robust it is against the deviations you observed, and you make a decision.

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Nick Cox

Join Date: Mar 2014

Posts: 35782
#3

16 Jul 2025, 08:13

#1 is hard to follow in detail, but what I think is being said is that for each model tried a Shapiro-Wilk test rejected the hypothesis of normally distributed residuals. A common problem with such tests is that rejection at conventional significance levels can just mean that the sample size is large enough to detect non-normality that may not matter. True or not, like many other people I find normal quantile plots immensely more helpful than any such test. qnorm is a dedicated official command.
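A minimal sketch of that graphical check, not taken from any post in this thread: the variable names are borrowed from the model posted in #4 (trimmed to a few terms), while rstd and u0 are hypothetical names for new variables. It assumes a single random intercept for study_id.

```stata
* Sketch only: fit a trimmed version of the model from #4
mixed diastolic_b_mean c.adjusted_dietary_sodium i.treatment_num i.period ///
    || study_id:, reml

* Standardized residuals, then a normal quantile plot of them
predict double rstd, rstandard
qnorm rstd

* Predicted random intercepts (one per study_id), inspected the same way
predict double u0, reffects
qnorm u0
```

Points lying close to the reference line suggest that any departure from normality is small; systematic curvature in the tails is what to look for. The `///` line continuation works in a do-file, not at the interactive prompt.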

The deeper and more general point is precisely that raised by Maarten Buis. Realisation that a model is not ideal in any sense doesn't help much (if at all) in identifying any model that is better.

Here BP perhaps means systolic blood pressure and, if so, it does seem possible to me that working on a logarithmic scale might help here.
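One way to judge whether a logarithmic scale helps is to compare quantile plots of the residuals on both scales. The following is a sketch under assumptions: the outcome and predictors are borrowed from the model posted in #4 (trimmed), the outcome is strictly positive, and ln_dbp, res_raw, res_ln, q_raw and q_ln are hypothetical names.

```stata
* Sketch only: same model on the raw and on the log scale
mixed diastolic_b_mean c.adjusted_dietary_sodium i.treatment_num i.period ///
    || study_id:, reml
predict double res_raw, rstandard
qnorm res_raw, name(q_raw, replace) title("Raw scale")

generate double ln_dbp = ln(diastolic_b_mean)   // requires BP > 0
mixed ln_dbp c.adjusted_dietary_sodium i.treatment_num i.period ///
    || study_id:, reml
predict double res_ln, rstandard
qnorm res_ln, name(q_ln, replace) title("Log scale")

* Show the two plots side by side
graph combine q_raw q_ln
```

If the two plots look much alike, the transformation is probably not buying anything, and the raw scale keeps the coefficients easier to interpret.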

Your various predictors are to do with diet and genetics. I can't see any detail on which of those might benefit from a logarithmic scale.
Caitlin Loh

Join Date: Jul 2025

Posts: 3
#4

17 Jul 2025, 05:21

Hi All,

Thanks for the quick answers and the explanations. They were a big help.

I was also advised to compute least-squares means after fitting the linear mixed-effects models. However, I have no idea how to do this after fitting the models. All I know is that the Stata command for least-squares means is "margins". Could you help me with this as well?

Below is an example of my Stata code for one of the mixed-effects models:
mixed diastolic_b_mean c.adjusted_dietary_sodium i.treatment_num i.period i.sequence_num i.trial_num i.sequence_num#i.trial_num i.treatment_num#i.trial_num i.trial_num#i.period || study_id:, reml

To compute the least-squares means, is this right?
margins diastolic_b_mean c.adjusted_dietary_sodium i.treatment_num i.period i.sequence_num i.trial_num i.sequence_num#i.trial_num i.treatment_num#i.trial_num i.trial_num#i.period || study_id:, reml

Thanks,
Cait
Maarten Buis

Join Date: Mar 2014

Posts: 3467
#5

17 Jul 2025, 09:19

Originally posted by Caitlin Loh

I was also recommended to conduct least square means after conducting linear mixed effect models.

I have no idea what that means. "Least squares means" sounds silly to me: means by definition minimize the sum of squared errors, so the "least squares" part of "least squares means" adds nothing. So all we have is means, whatever that may mean in this context. Maybe the person you talked to meant conditional means? This is getting too speculative for me to give you any meaningful advice. The best advice I can give you is to ask the person who gave you that advice to clarify what they meant.

George Ford

Join Date: Aug 2014

Posts: 3187
#6

17 Jul 2025, 09:36

Don't worry about the normality. That topic has been well covered here. Not sure why that is so heavily taught.

Here are the results from a simple simulation: 100 observations, 10,000 replications. Despite the residuals being highly non-normal (normality rejected by the Shapiro-Wilk test in 100% of replications), the estimates are essentially the same. Normality is not required for the estimator to be BLUE, only for exact testing, but that's a high bar not available for most models and data.

Code:

            |        RESIDUAL PROPERTIES        |          RESULTS
Type        |   Mean     Skew     Kurt   SW Rej |   Beta         SE   Rej b=1
------------+-----------------------------------+-----------------------------
Normal      | -0.000   -0.000    2.939    0.049 |  1.001   0.045434     0.053
Non-Normal  | -0.000    2.403   10.479    1.000 |  1.000   0.045429     0.053
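The code behind that table was not posted, so the following is only a sketch of how a simulation of this kind could be set up. Everything specific in it is an assumption: a simple regression with a standard normal predictor, a true slope of 1, centred chi-squared(1) errors for the non-normal case, a 5% level for both tests, and the hypothetical program name simresid. It uses 1,000 replications to keep the run short, rather than the 10,000 reported above.

```stata
clear all
set seed 12345

* One replication: simulate data, fit the model, store the results
program define simresid, rclass
    syntax [, nonnormal]
    drop _all
    set obs 100
    generate double x = rnormal()
    if "`nonnormal'" != "" {
        generate double e = rchi2(1) - 1     // skewed, mean-zero errors (assumed)
    }
    else {
        generate double e = rnormal()        // normal errors
    }
    generate double y = 1 + 1*x + e          // true slope is 1 (assumed)
    regress y x
    return scalar b  = _b[x]
    return scalar se = _se[x]
    test x = 1                               // test of the true slope
    return scalar rej_b1 = (r(p) < 0.05)
    predict double res, residuals
    swilk res                                // Shapiro-Wilk on the residuals
    return scalar rej_sw = (r(p) < 0.05)
end

* Normal errors
simulate b=r(b) se=r(se) rej_b1=r(rej_b1) rej_sw=r(rej_sw), ///
    reps(1000) nodots: simresid
summarize

* Non-normal errors
simulate b=r(b) se=r(se) rej_b1=r(rej_b1) rej_sw=r(rej_sw), ///
    reps(1000) nodots: simresid, nonnormal
summarize
```

The means of rej_sw and rej_b1 are the rejection rates for the Shapiro-Wilk test and for the test of the true slope. The numbers will not match the table above, since the data-generating process here is a guess.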
Bruce Weaver

Join Date: May 2014

Posts: 1139
#7

17 Jul 2025, 09:38

I suspect the person requesting "least squares means" was a SAS user.
https://documentation.sas.com/doc/en...m_syntax10.htm
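If that is what was meant, here is a hedged sketch of how such adjusted means might be requested in Stata for the model posted in #4. Note that margins is run after the model has been fitted and does not repeat the model specification. Choosing treatment_num as the factor of interest is an assumption, and so is the use of asbalanced, which, as I understand it, is the option that treats the other factors as balanced in the way SAS's LSMEANS does.

```stata
* Fit the model from #4 first
mixed diastolic_b_mean c.adjusted_dietary_sodium i.treatment_num i.period ///
    i.sequence_num i.trial_num i.sequence_num#i.trial_num ///
    i.treatment_num#i.trial_num i.trial_num#i.period || study_id:, reml

* Adjusted means of the outcome for each treatment level,
* treating the other factors as balanced
margins treatment_num, asbalanced

* Pairwise differences between those adjusted means
margins treatment_num, asbalanced pwcompare(effects)
```

If some combinations of the interacted factors have no observations, margins may report results as not estimable; its emptycells(reweight) option is one thing to look into in that case.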

--
Bruce Weaver
Version: Stata/MP 19.5 (Windows)
Nick Cox

Join Date: Mar 2014

Posts: 35782
#8

17 Jul 2025, 10:34

https://stats.stackexchange.com/ques...tical-nonsense is also helpful (despite the unpromising title).
Felix Bittmann

Join Date: Aug 2018

Posts: 751
#9

17 Jul 2025, 10:44

There is also a Stata FAQ on this; I wonder whether anyone has posted it before: https://www.stata.com/support/faqs/s...squares-means/

Best wishes

Stata 18.0 MP | ORCID | Google Scholar
Caitlin Loh

Join Date: Jul 2025

Posts: 3
#10

21 Jul 2025, 19:11

Hi All,

Sorry for getting back this late.

Thank you so much for all the resources sent! They have been very helpful and I managed to understand and conduct all the analyses for my study.

Thanks,
Cait