  • Log transform for the normality of the residuals of linear mixed effect models

    Hi,

    I would like to get some advice on the steps for checking the normality of the residuals for my linear mixed effect models.

    The linear mixed-effects models I came up with look at the effect of genetics on BP, the effect of diet on BP, and the combined effects of genetics and diet on BP.

    However, I got stuck on one of the models because the normality test on its residuals was significant. I therefore log-transformed both the outcome and the predictor in that model. After log-transforming both variables, the normality test on the residuals was still significant. I also tried a square-root transformation and a robust test, as suggested by Copilot, but the results were still significant. Could you please give me some advice on how I should go about this?

    I also have another question about log transformation. In another model, I log-transformed the predictor rather than the outcome. Does that mean the predictor should be log-transformed in every other model I fit, or does the transformation apply only to that one model?

    Please let me know if you need more details regarding these models. Looking forward to hearing from you!

    Thanks,
    Cait

  • #2
    Models are by definition simplified versions of reality. Simplification is just another word for "wrong in some useful way". So models are by definition wrong (but some of them are useful).

    So when diagnosing a model you do not want to know if the assumptions are true: the assumptions are what makes the model a simplification. Assumptions are never true and are not supposed to be true. If the assumptions were true, the model would no longer simplify reality and would thus cease to be a model. The question you are asking when diagnosing models is whether the assumptions are close enough to reality for the model to be useful.

    Significance testing cannot answer that question. So you cannot use statistical tests as "There is a significant deviation from the assumption, therefore I need to change the model". The (limited) value of statistical tests is that they can guard us against the human "skill" of seeing patterns in random noise (think of the ink-blot or Rorschach tests https://en.wikipedia.org/wiki/Rorschach_test ).

    So a significant test result is the beginning of your analysis, not the end: you can now start looking at the residuals and start the decision process on whether the deviations are too large or not. You look at your model, figure out how robust it is against the deviations you observed, and make a decision.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      #1 is hard to follow in detail, but what I think is being said is that for each model tried a Shapiro-Wilk test rejected the hypothesis of normally distributed residuals. A common problem with such tests is that rejection at conventional significance levels can just mean that the sample size is large enough to detect non-normality that may not matter. True or not, like many other people I find normal quantile plots immensely more helpful than any such test. qnorm is a dedicated official command.
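
      As a minimal sketch of the graphical check, assuming a simplified version of the model posted later in the thread (the variable names are borrowed from #4; res, rstd, u0 and first are hypothetical names introduced here), residuals and predicted random effects can be obtained after mixed and then passed to qnorm:

      ```stata
      * Sketch only: a cut-down version of the model in #4, not the poster's final model
      mixed diastolic_b_mean c.adjusted_dietary_sodium i.treatment_num i.period || study_id:, reml

      * Residuals: observed values minus fitted values (fixed part plus predicted random effects)
      predict double res, residuals

      * Standardized residuals, easier to read against a normal reference
      predict double rstd, rstandard

      * Normal quantile plot, plus a histogram with a normal density overlaid
      qnorm rstd
      histogram rstd, normal

      * The random intercepts are also assumed normal; plot one value per subject
      predict double u0, reffects
      egen byte first = tag(study_id)
      qnorm u0 if first
      ```

      Points that follow the reference line except for mild curvature in the tails are usually not a cause for concern; how much deviation is too much remains a judgment call.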

      The deeper and more general point is precisely that raised by Maarten Buis. Realisation that a model is not ideal in any sense doesn't help much (if at all) in identifying any model that is better.

      Here BP perhaps means systolic blood pressure, and if so it does seem possible to me that working on a logarithmic scale might help here.

      Your various predictors are to do with diet and genetics. I can't see any detail on which of those may benefit from a logarithmic scale.


      • #4
        Hi All,

        Thanks for the quick answers and the explanations. They were a big help.

        I was also advised to compute least squares means after fitting the linear mixed-effects models. However, I have no idea how to do this after the models. All I know is that the Stata command for least squares means is "margins". Could you help me with this as well?

        Below is an example of my Stata code for one of the mixed-effects models:
        mixed diastolic_b_mean c.adjusted_dietary_sodium i.treatment_num i.period i.sequence_num i.trial_num i.sequence_num#i.trial_num i.treatment_num#i.trial_num i.trial_num#i.period || study_id:, reml

        To conduct the least square means, is this right?:
        margins diastolic_b_mean c.adjusted_dietary_sodium i.treatment_num i.period i.sequence_num i.trial_num i.sequence_num#i.trial_num i.treatment_num#i.trial_num i.trial_num#i.period || study_id:, reml

        Thanks,
        Cait


        • #5
          Originally posted by Caitlin Loh
          I was also recommended to conduct least square means after conducting linear mixed effect models.
          I have no idea what that means. "Least square means" sounds silly to me: means by definition minimize the sum of squared errors, so the "least squares" part of "least squares means" adds nothing. So all we have is means, whatever that may mean in this context. Maybe the person you talked to meant conditional means? This is getting too speculative for me to give you any meaningful advice. The best advice I can give you is to ask the person who gave you that advice to clarify what they meant.


          • #6
            Don't worry about the normality. That topic has been well covered here. Not sure why that is so heavily taught.

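            Here are the results from a simple simulation: 100 observations, 10,000 replications. Although the non-normal residuals are highly skewed (Shapiro-Wilk rejects normality in 100% of replications), the estimates, standard errors, and rejection rates for the test of b = 1 are essentially the same as with normal residuals. Normality is not required for the estimator to be BLUE; it is needed only for exact finite-sample tests, and that is a high bar not available for most models and data.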

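            Code:
            --------------------------------------------------------------------------------
                            RESIDUAL PROPERTIES             |           RESULTS
            --------------------------------------------------------------------------------
            Type          Mean     Skew      Kurt    SW Rej |   Beta       SE       Rej b=1
            --------------------------------------------------------------------------------
            Normal       -0.000   -0.000     2.939    0.049 |   1.001   0.045434    0.053
            Non-Normal   -0.000    2.403    10.479    1.000 |   1.000   0.045429    0.053
            --------------------------------------------------------------------------------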
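
            The data-generating process behind the table is not given in the post, so the following is only a sketch of this kind of simulation under assumed settings: a single standard-normal predictor, a true slope of 1, and centred exponential errors as the skewed case. The program name simresid and all variable names are hypothetical, and the numbers it produces will not match the table exactly.

            ```stata
            clear all
            set seed 20240101

            * One replication: simulate data, fit OLS, record slope, SE and test decisions
            program define simresid, rclass
                args kind
                drop _all
                set obs 100
                generate double x = rnormal()
                if "`kind'" == "normal" {
                    generate double e = rnormal()
                }
                else {
                    * Assumed skewed errors: exponential(1) shifted to have mean zero
                    generate double e = -ln(runiform()) - 1
                }
                generate double y = x + e
                regress y x
                return scalar b  = _b[x]
                return scalar se = _se[x]
                * Test of the true value b = 1; rejections should occur about 5% of the time
                test x = 1
                return scalar rej_b1 = (r(p) < 0.05)
                * Shapiro-Wilk test on the residuals
                predict double res, residuals
                swilk res
                return scalar rej_sw = (r(p) < 0.05)
            end

            * Use fewer replications than the 10,000 in the post to keep the run short
            simulate b=r(b) se=r(se) rej_b1=r(rej_b1) rej_sw=r(rej_sw), reps(1000) nodots: simresid skewed
            summarize
            ```

            Replacing skewed with normal in the simulate call gives the comparison row. The means of rej_b1 and rej_sw are the rejection rates for the test of b = 1 and for Shapiro-Wilk, respectively.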


            • #7
              I suspect the person requesting "least squares means" was a SAS user.
              --
              Bruce Weaver
              Email: [email protected]
              Version: Stata/MP 19.5 (Windows)


              • #8
                https://stats.stackexchange.com/ques...tical-nonsense is also helpful (despite the unpromising title).


                • #9
                  There is also a Stata FAQ on this; I wonder if anyone has posted it before: https://www.stata.com/support/faqs/s...squares-means/
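
                  For the model in #4, a hedged sketch of what this could look like, assuming "least squares means" is meant in the SAS LSMEANS sense (adjusted means that weight the levels of the other factors equally) and that treatment_num is the factor of interest. Note that margins is run after the model and is given factor variables from it, not the model specification again:

                  ```stata
                  * Fit the model first (as posted in #4; /// continues a line in a do-file)
                  mixed diastolic_b_mean c.adjusted_dietary_sodium i.treatment_num i.period ///
                      i.sequence_num i.trial_num i.sequence_num#i.trial_num ///
                      i.treatment_num#i.trial_num i.trial_num#i.period || study_id:, reml

                  * Adjusted means by treatment, treating the other factors as balanced
                  margins treatment_num, asbalanced

                  * Pairwise differences between the adjusted treatment means
                  margins treatment_num, asbalanced pwcompare(effects)

                  * Adjusted means for each treatment-by-trial combination
                  margins treatment_num#trial_num, asbalanced
                  ```

                  Without asbalanced, margins averages over the observed distribution of the other covariates, which gives different numbers when the design is unbalanced. If some factor combinations are empty, some of these margins may be reported as not estimable.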
                  Best wishes

                  Stata 18.0 MP | ORCID | Google Scholar


                  • #10
                    Hi All,

                    Sorry for getting back this late.

                    Thank you so much for all the resources sent! They have been very helpful and I managed to understand and conduct all the analyses for my study.

                    Thanks,
                    Cait
