  • When to use logarithmic variables

    Hello All,

    I want to run a regression. My dependent variable Y is a discrete score from 0 to 12. I have 6 independent variables. Four of them are percentages between 0.4% and 40%. The other two are not percentages: one measures years and ranges from 1.8 to 67; the other measures size and ranges from 0.008 to 5,117 (very high).

    I don't want to winsorize, which is why I decided to use logarithms. My question now: is it a good idea to log the two independent variables that are not percentages, because they have high values?

    That means I would use: reg Y x1 x2 x3 x4 logx5 logx6
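
    A minimal sketch of how the logged variables in that command could be created in Stata, assuming x5 (years) and x6 (size) are the raw variable names; both names are placeholders, not taken from the actual dataset. ln() requires strictly positive values, which the stated minimums of 1.8 and 0.008 satisfy.

    ```stata
    * x5 and x6 are assumed names for the years and size variables
    generate logx5 = ln(x5)
    generate logx6 = ln(x6)

    * the regression as written in the post
    regress Y x1 x2 x3 x4 logx5 logx6
    ```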

    The reason I would like to use logs is that I want to guard against outliers. I don't really have outliers, but I think it's better to use logs for comparison reasons.

    Or should I only use logs when the distribution of some variables is skewed, meaning that logging lowers the skewness and is therefore appropriate?

    Thank you in advance.
    Lukas

  • #2
    Logging predictors is most likely to be helpful when the resulting data cloud is closer to the functional form you're postulating in the first place. That often goes hand in hand with pulling in outliers, but there is no assumption in regression that predictors have even a symmetric distribution.
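
    One informal way to check the functional-form point is to compare how the raw and the logged predictor relate to the outcome. This is only a sketch: it assumes x6 is the raw size variable (a placeholder name) and that logx5 and logx6 already exist as ln(x5) and ln(x6).

    ```stata
    * compare the skewness of the raw and logged predictor
    summarize x6 logx6, detail

    * smoothed relationship between the outcome and each version
    lowess Y x6
    lowess Y logx6

    * after fitting, plot residuals against the logged predictor
    regress Y x1 x2 x3 x4 logx5 logx6
    rvpplot logx6
    ```

    If the smooth against logx6 looks closer to a straight line than the smooth against x6, and the residual plot shows no clear pattern, that would support using the logged version.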

    "High values" are neither here nor there except possibly that precision can be compromised if different predictors are on very different scales, but there is nothing the least bit worrying in your post on that.
