  • log transformation

    As I am not a statistician, I have a naive question. I have 50 dependent variables that I want to regress on one independent variable, adjusting for sex and BMI. Some of the DVs are normally distributed and others are not. My question is: do I have to log-transform all the variables, or only the ones that are not normally distributed?
    Thanks in advance.

    Stata 15.1 on Mac

  • #2
    You asked the same question at https://www.statalist.org/forums/for...transformation

    See advice at https://www.statalist.org/forums/help#adviceextras #1 which is pertinent including

    1.2 Repeating the same question

    Please don't! Posting exactly the same question again is strongly discouraged and is unlikely to increase your chance of getting a response.
    My guess is that your question wasn't answered because it raises so many questions about your understanding of regression, which seems to be

    1. the response variable should be normally distributed

    2. if it isn't, take logarithms.

    #1 is incorrect. It's not part of any assumption of regression that the response variable is normally distributed. An ideal condition is that the errors are normally distributed.

    That doesn't mean that logarithmic transformation can't be a good idea, but laying out why and when is too complicated to summarize briefly and helpfully.
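
    To make the distinction between the response and the errors concrete, here is a minimal sketch of how the residuals from a fitted model, and not the raw response, might be inspected. It assumes Stata's bundled auto dataset; the variables price, weight and foreign merely stand in for a response, a predictor and a covariate and are not taken from the original question.

    ```stata
    * Illustrative only: auto.dta stands in for the poster's data
    sysuse auto, clear
    regress price weight foreign
    * Residuals estimate the errors; inspect these, not the raw response
    predict double resid, residuals
    qnorm resid
    * Residuals against fitted values, to look for curvature or unequal spread
    rvfplot, yline(0)
    ```

    A quantile plot of the residuals that is roughly straight is reassuring; marked curvature in either plot is one of several possible reasons to consider a transformation.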

    (I am not a statistician either.)


    • #3
      Thanks, Nick, for your comments. You got it right: my understanding of regression is limited, but I am improving.
      I know that the response variable should be normally distributed. My question is whether it is better to log-transform all the DVs I have, even if they are already normally distributed. I guess that if I do, I will have the same scale for all the variables and the interpretation would be simpler.

      regards

      abdelilah

      Stata 15.1 on Mac


      • #4
        Originally posted by abdelilah arredouani
        I know that the response variable should be normally distributed.
        You might want to take a minute to re-read Nick's helpful reply.

        My question is whether it is better to log-transform all the DVs I have, even if they are already normally distributed.
        Well, for those that are already normally distributed, taking their logarithm will make them non-normally distributed, and if you think that the response variables must all be normally distributed, wouldn't that be defeating your purpose?
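
        A small simulation can illustrate this point. The sketch below is not from the thread: the seed, the sample size and the mean and SD are arbitrary assumptions, chosen so that the simulated values are positive and logarithms are defined.

        ```stata
        * Hypothetical simulated data, for illustration only
        clear
        set seed 2718
        set obs 1000
        * A positive, roughly normal variable (mean 10, SD 2)
        generate y = rnormal(10, 2)
        generate log_y = ln(y)
        * Compare skewness and kurtosis before and after taking logs
        summarize y log_y, detail
        qnorm log_y
        ```

        Under these assumptions, log_y would typically show negative skewness, because the logarithm compresses the upper tail more than the lower one.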

        I guess that if I do, I will have the same scale for all the variables and the interpretation would be simpler.
        If you want to re-scale your variables for interpretation, then re-scale them with a linear transformation, e.g., divide each by its range.
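
        As a rough sketch of such a linear rescaling, the loop below divides each variable by its range. The auto dataset and the variables price, mpg and weight are stand-ins for the response variables, and the _scaled suffix is just an illustrative naming choice.

        ```stata
        * Illustrative only: auto.dta stands in for the poster's data
        sysuse auto, clear
        foreach v of varlist price mpg weight {
            quietly summarize `v'
            * Divide by the range; r(max) and r(min) are left behind by summarize
            generate double `v'_scaled = `v' / (r(max) - r(min))
        }
        summarize price_scaled mpg_scaled weight_scaled
        ```

        Because the transformation is linear, it changes the units of the coefficients but not the shape of any distribution.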

        With 50 response variables, it comes across as if you're fishing. Unless you have information that they are associated with sex and BMI, why not forget the two covariates for the moment and, assuming that the predictor is categorical, scan through dotplots of the response variables and visualize their association (or lack thereof) with the one predictor of interest?
        Code:
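        * Enable pausing so that each graph can be inspected in turn
        pause on
        foreach response_variable in <list> {
            dotplot `response_variable', over(predictor) median center
            * Type q at the pause prompt to move on to the next variable
            pause
        }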
        If the predictor of interest is continuous, then do the analogous inspection with scatterplots.
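
        For a continuous predictor, the analogous loop might look like the sketch below. It again assumes the auto dataset, with weight standing in for the predictor and price and mpg for the responses; the lowess smooth is an optional addition to help the eye and is not part of the original suggestion.

        ```stata
        * Illustrative only: auto.dta stands in for the poster's data
        sysuse auto, clear
        pause on
        foreach v of varlist price mpg {
            * Scatterplot with a lowess smooth superimposed
            twoway (scatter `v' weight) (lowess `v' weight), ytitle("`v'")
            * Type q at the pause prompt to move on to the next variable
            pause
        }
        ```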


        • #5
          #3
          the response variable should be normally distributed
          Once again: not so. But it's not the job of Statalist to teach the statistics you should be studying or have studied.

          Joseph gives excellent advice.


          • #6
            Thanks, Nick. Message received.
