  • Log or not a skewed covariate

    Dear All,

    I have a GLM, and one of my independent variables (income) is skewed. Income generally has an approximately log-normal distribution.

    I was thinking of not logging it, as I want to calculate some marginal effects after the GLM, and if I log it the interpretation is a bit different. Given that the majority of the literature uses the log when income enters as a continuous variable, is it a mistake to avoid the logarithm?

    To my knowledge, there is no assumption on the distribution of the independent variables, so the only thing that changes is the magnitude of the coefficient. Do you agree?

    Thank you.


  • #2
    I agree, generally speaking. Under GLM, you may also try different models (a log link, for example). What is more, you may compare models with and without the log-transformed variable.
    Best regards,

    Marcos
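The comparison suggested in #2 can be sketched as follows. This is not from the thread: it is a minimal illustration in plain Python rather than Stata, on simulated data, with a hand-written IRLS fitter for a Gamma GLM with log link. The function name `fit_gamma_log_glm`, the sample size, and the true coefficients are all assumptions made for the example.

```python
import math
import random

def fit_gamma_log_glm(x, y, iters=50):
    """Fit E[y|x] = exp(b0 + b1*x) by IRLS; return (b0, b1, deviance)."""
    n = len(y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    eta = [math.log(yi) for yi in y]  # start at mu = y
    b0 = b1 = 0.0
    for _ in range(iters):
        mu = [math.exp(e) for e in eta]
        # Working response. For Gamma + log link the IRLS weights are all 1,
        # so each step is an ordinary least-squares fit of z on x.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        zbar = sum(z) / n
        b1 = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
        b0 = zbar - b1 * xbar
        eta = [b0 + b1 * xi for xi in x]
    mu = [math.exp(e) for e in eta]
    # Gamma deviance: 2 * sum((y - mu)/mu - log(y/mu))
    deviance = 2.0 * sum((yi - m) / m - math.log(yi / m) for yi, m in zip(y, mu))
    return b0, b1, deviance

rng = random.Random(2017)
# Simulated data (assumption): income in thousands, log-normal; the true
# mean of y is linear in log(income) on the link scale.
income = [rng.lognormvariate(3.0, 0.8) for _ in range(2000)]
y = []
for inc in income:
    mu_true = math.exp(1.0 + 0.5 * math.log(inc))
    y.append(rng.gammavariate(2.0, mu_true / 2.0))  # Gamma, mean mu_true, shape 2

log_income = [math.log(inc) for inc in income]
_, b_raw, dev_raw = fit_gamma_log_glm(income, y)
_, b_log, dev_log = fit_gamma_log_glm(log_income, y)
print(f"income:      slope={b_raw:.4f}  deviance={dev_raw:.1f}")
print(f"log(income): slope={b_log:.4f}  deviance={dev_log:.1f}")
```

Because the simulated truth is linear in log(income) by construction, the log(income) fit should show the lower deviance here. With real data either specification could fit better, which is the point of running both.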


    • #3
      Thank you Marcos. I am already using a log link with the Gamma family in the GLM. With a log link, should I log the skewed variable, or is it better to leave it as it is?


      • #4
        Nikos:
        as a general aside, in a log-linear regression model with income as -depvar-, income is logged to give a percentage-based interpretation of its change due to a 1-unit change in each predictor (adjusted for the remaining ones).
        Kind regards,
        Carlo
        (Stata 18.0 SE)
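The percentage interpretation mentioned in #4 can be written out as a short worked equation. This is a sketch under the assumption of a single predictor $x$ and a model for $\ln y$; the symbols $\beta_0$, $\beta_1$ and $\varepsilon$ are notation introduced here, not taken from the thread.

```latex
\ln y = \beta_0 + \beta_1 x + \varepsilon
\quad\Longrightarrow\quad
\frac{y(x+1)}{y(x)} = e^{\beta_1},
\qquad
\%\Delta y = 100\,\bigl(e^{\beta_1} - 1\bigr) \approx 100\,\beta_1
\ \text{ for small } \beta_1 .
```

For example, $\beta_1 = 0.05$ gives $100\,(e^{0.05} - 1) \approx 5.13\%$, close to the approximate value of $5\%$.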


        • #5
          Thank you for the comment, Carlo. Income is an independent variable in my case. What is your opinion on log-transforming income (or any skewed independent variable)? Can I avoid the log transformation, or will leaving it untransformed cause problems?


          • #6
            Nikos:
            - it depends on the skewness direction; if I'm not mistaken, log transformation worsens the distribution of a negatively skewed variable (but this should not be the case with -income-);
            - it depends on the interpretation that your research field considers in line with the methodological mainstream: as far as I know, linear-log models are not that widespread in the (health) economic literature, whereas log-linear models are quite frequent (for the reasons provided in my previous reply);
            - last but not least, it depends on the effort/feasibility of transforming the results back to the original metric (it may happen when you present the findings of your research to a non-technical audience).
            Kind regards,
            Carlo
            (Stata 18.0 SE)
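The first point in #6 (the effect of the log depends on the direction of skewness) can be checked numerically. The sketch below is an illustration on simulated data, not the poster's data: `income` is drawn as log-normal, and `left_skewed` is a hypothetical positive variable with a long left tail, built only for this example.

```python
import math
import random

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(42)

# Right-skewed, income-like variable (assumption: log-normal).
income = [rng.lognormvariate(10.0, 1.0) for _ in range(5000)]

# Hypothetical left-skewed variable; the constant 100 keeps every value
# positive for these parameters, so the log is defined.
left_skewed = [100.0 - rng.lognormvariate(2.0, 0.4) for _ in range(5000)]

skew_income = skewness(income)
skew_log_income = skewness([math.log(v) for v in income])
skew_left = skewness(left_skewed)
skew_log_left = skewness([math.log(v) for v in left_skewed])

print(f"income:      raw={skew_income:.2f}  logged={skew_log_income:.2f}")
print(f"left-skewed: raw={skew_left:.2f}  logged={skew_log_left:.2f}")
```

The log pulls the right-skewed variable toward symmetry and pushes the left-skewed one further left, because a concave transformation stretches the lower tail relative to the upper one.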


            • #7
              I'd say the most convincing reason for using log X rather than X as the covariate would be that, other stuff aside, the outcome has a more nearly linear relationship with log X. Skewness as such is less important, except if outliers are present.


              • #8
                Thank you Nick. I guess, this depends on the setting (the linear relationship).

                I would like to estimate some marginal effects afterwards, and I think it's better to use income as X, not log X.

                In the context of GLM with log link, do you think that using X (i.e. income) instead of logX is a major problem that might create additional problems?
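For reference, the marginal effect of income under a log link can be written down for both specifications. This is a sketch assuming a single covariate $x$ (income); the coefficient symbols $\beta$ and $\gamma$ are notation introduced here.

```latex
\text{Income in levels:}\quad
E[y \mid x] = \exp(\beta_0 + \beta_1 x)
\;\Longrightarrow\;
\frac{\partial E[y \mid x]}{\partial x} = \beta_1\, E[y \mid x]

\text{Income in logs:}\quad
E[y \mid x] = \exp(\gamma_0 + \gamma_1 \ln x) = e^{\gamma_0} x^{\gamma_1}
\;\Longrightarrow\;
\frac{\partial E[y \mid x]}{\partial x} = \gamma_1\, \frac{E[y \mid x]}{x}
```

In the levels specification $\beta_1$ is a semi-elasticity; in the log specification $\gamma_1$ is an elasticity. In both cases the effect of a change in income on the expected outcome can still be expressed in the original units of $x$, so the two models differ in functional form rather than in whether a marginal effect on the income scale exists.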


                • #9
                  Sorry; I am not an economist, so substantive comment I leave to others.


                  • #10
                    Nikos: In #8 you write "do you think that using X (i.e. income) instead of logX is a major problem that might create additional problems?"

                    This is hard to adjudicate without more context, but I might suggest that your baseline thinking could matter here.

                    That is, are you trying to argue yourself out of using a log-transformation (log-x is your default) or argue yourself into using a log-transformation (x is your default). If interpreting your results using x is more attractive to you (as hinted in #8), then perhaps consider adopting the latter mindset and see if you can rationalize using log-x based on some principles/criteria. If not, then my (admittedly personal) recommendation would be: Use x.
                    Last edited by John Mullahy; 04 May 2017, 12:27. Reason: typo


                    • #11
                      Thank you very much! The work I am currently doing is largely influenced by your paper, published in JHE in 1998. X is definitely the more attractive option, but most studies use log X, so I was quite confused about the validity of using X as an independent variable.

                      Quoting #10: "That is, are you trying to argue yourself out of using a log-transformation (log-x is your default) or argue yourself into using a log-transformation (x is your default). If interpreting your results using x is more attractive to you (as hinted in #8), then perhaps consider adopting the latter mindset and see if you can rationalize using log-x based on some principles/criteria. If not, then my (admittedly personal) recommendation would be: Use x."

                      If I understood correctly, do you mean that I can use X, and then try to validate my findings with a model that uses logX (kind of robustness checks)?






                      • #12
                        Nikos: Your proposal makes sense to me. I say: Go for it. Best, John


                        • #13
                          Thank you! Really useful advice
