Log transformation of bound regressor

Jonas Wereld

Join Date: Feb 2017

Posts: 23
#1

14 Apr 2017, 07:30

One of my independent variables has a maximum possible value of 1 (the percentage of respondents in a particular country who said "agree" to a particular survey question), and its distribution is somewhat skewed. I thought about log-transforming this variable, but would this not be problematic given the upper bound? And even if I do not transform it, do I need to take the boundedness into account somehow?

Thanks!
Nick Cox

Join Date: Mar 2014

Posts: 36053
#2

14 Apr 2017, 08:37

A fraction between 0 and 1 is best called a proportion or a fraction, not a percentage.

That said, 1 as an upper bound is no problem, as log 1 is 0. A value of 0 would be a problem, as log 0 is undefined. But a logarithm is unlikely to be a good idea unless the variable is right-skewed and the transformation is consistent with other ideal conditions for whatever model you have in mind.
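To make the point about bounds concrete, here is a minimal Stata sketch with a few made-up proportions (hypothetical values, not the poster's data). It shows that ln() maps values in (0, 1] to values at or below 0, with ln(1) = 0, while ln(0) is undefined, so Stata returns missing.

```stata
* Hypothetical proportions, chosen only to illustrate the bounds
clear
input double prop
1
0.5
0.1
0
end

* ln() maps (0, 1] onto (-infinity, 0]: ln(1) is exactly 0,
* smaller proportions give increasingly negative values,
* and ln(0) is undefined, so Stata stores a missing value
generate double lnprop = ln(prop)
list prop lnprop
```

generate reports a missing value for the observation where prop is 0. That is the practical problem with a lower bound of 0, whereas the upper bound of 1 causes no difficulty.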
Jonas Wereld

Join Date: Feb 2017

Posts: 23
#3

14 Apr 2017, 09:08

Thanks for the clarifications, Nick.

So far, I have used -sktest- and -histogram- to assess whether I need a log transformation. Luckily, neither the variable mentioned nor the others under consideration take the value 0. My variables are rather left-skewed; why would this be different from right skew? And what exactly do you mean by "ideal conditions"? I would like to estimate first a logit and then an xtlogit regression, and afterwards a linear fixed-effects regression with a different dependent variable, which (or so I thought) should also be log-transformed. What would speak against using the logged variables?
Nick Cox

Join Date: Mar 2014

Posts: 36053
#4

14 Apr 2017, 09:25

Left and right skewness are often called negative and positive skewness. Log transformation of a left-skewed variable can only make the skewness worse. It seems that you need to study a basic account of transformation; one such account is downloadable as a help file from

Code:

ssc inst transint

See any good text on regression for an account of the conditions ideal for regression, often rather misleadingly called assumptions.

In fact skewness of any variable in a regression model is not in itself a problem but is in practice often associated with situations that benefit from transformation for other reasons.
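You can check the claim that logging a left-skewed variable makes the skewness worse with a short simulation. The sketch below uses a hypothetical variable drawn from a Beta(5, 1) distribution, which lies on (0, 1) like a proportion and is left-skewed. The seed and sample size are arbitrary choices.

```stata
* Hypothetical simulated data: a left-skewed variable on (0, 1)
clear
set seed 2017
set obs 10000
generate double x = rbeta(5, 1)

* Moment-based skewness of the raw variable
summarize x, detail
scalar skew_x = r(skewness)

* ln() stretches values near 0, i.e. the long left tail,
* so the skewness becomes more negative, not less
generate double lnx = ln(x)
summarize lnx, detail
scalar skew_lnx = r(skewness)

display "skewness of x:     " %6.3f scalar(skew_x)
display "skewness of ln(x): " %6.3f scalar(skew_lnx)
```

For this distribution the theory agrees. Beta(5, 1) has skewness of about -1.18. If x is Beta(5, 1), then -ln(x) is exponential, so ln(x) has skewness -2.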
Jonas Wereld

Join Date: Feb 2017

Posts: 23
#5

14 Apr 2017, 10:37

Thanks again, -transint- has proved very informative. I have now used either log or square transformations of my variables, depending on the direction of skewness. However, comparing via -sktest- and -hist-, it seems that my distributions have not become normal at all. So I was wondering whether there is a test for analyzing the distributions of the untransformed variables, or what the best general way of proceeding would be. So far I am just judging by eye from the histogram how the skewness can be described, plus relying on -sktest-. How can I tell whether a distribution like this:

[first histogram, not reproduced]

or this:

[second histogram, not reproduced]

can really be described as right- or left-skewed, or in some other way?

Also, do you know of a source that succinctly summarizes under what circumstances the non-use of transformation despite skewness and/or kurtosis can be justified?
Nick Cox

Join Date: Mar 2014

Posts: 36053
#6

15 Apr 2017, 01:28

I am travelling at present and will reply in a day or two.
Nick Cox

Join Date: Mar 2014

Posts: 36053
#7

17 Apr 2017, 03:27

Working backwards:

do you know of a source that succinctly summarizes under what circumstances the non-use of transformation despite skewness and/or kurtosis can be justified?

There is some confusion here, or rather I am not clear what you are asking. Kurtosis is a measurable property, not a yes-or-no condition. Perhaps you mean kurtosis indicative of non-normal distributions. But your context is regression, and there is no assumption in regression that any variable has a marginal distribution that is normal. So perhaps you are asking for literature that advises that assumptions that don't apply do not in fact apply. I am not clear about that, but it would be true of any good regression text, though often by implication rather than direct statement. Not many good regression texts waste their time debunking myths that somehow survive in the weaker literature, such as this myth that a normal distribution is required for regression.
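A hypothetical simulation illustrates the point that regression assumes nothing about the marginal distribution of a predictor. Here the predictor is strongly right-skewed and the errors are normal by construction. The variable names and the true coefficients (1 and 2) are illustrative assumptions, not taken from the thread.

```stata
* Hypothetical simulated data: skewed predictor, normal errors
clear
set seed 2017
set obs 10000
* Exponential(1) draws via the inverse-CDF method (skewness about 2)
generate double x = -ln(1 - runiform())
generate double y = 1 + 2*x + rnormal()

* Fit the regression on the untransformed predictor
regress y x
predict double resid, residuals

* The predictor is far from normal, but the residuals are close to symmetric
summarize x, detail
scalar skew_x = r(skewness)
summarize resid, detail
scalar skew_resid = r(skewness)
display "skewness of x: " %6.3f scalar(skew_x) "   skewness of residuals: " %6.3f scalar(skew_resid)
```

The estimated slope should land close to 2 even though x is nowhere near normal. Any check of distributional shape belongs on the residuals, not on the predictor taken on its own.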

Your two histograms could serve as basic examples in any introductory course of, respectively, a strongly right-skewed distribution and an approximately symmetric distribution. You could measure skewness using the standard moment-based measure or in several other ways, e.g. (mean - median) / SD. No single measure of skewness is guaranteed to capture all the information, however, which is why graphs remain important.
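Both measures mentioned can be read off the results that summarize, detail saves. The sketch below uses the auto data shipped with Stata. price is chosen only because it is a conveniently right-skewed example; it is not the poster's variable.

```stata
* Example data shipped with Stata
sysuse auto, clear
summarize price, detail

* Standard moment-based skewness, as saved by summarize, detail
scalar skew_moment = r(skewness)

* Alternative measure: (mean - median) / SD, from the same saved results
scalar skew_meanmed = (r(mean) - r(p50)) / r(sd)

display "moment-based skewness: " %6.3f scalar(skew_moment)
display "(mean - median) / SD:  " %6.3f scalar(skew_meanmed)
```

The two numbers are on different scales. (mean - median) / SD always lies between -1 and 1, because the mean and median never differ by more than one standard deviation, so only the signs and rough sizes are comparable. A graph such as -histogram price- or -qnorm price- remains a useful companion check.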
Nick Cox

Join Date: Mar 2014

Posts: 36053
#8

17 Apr 2017, 05:51

Your question seems close to https://stats.stackexchange.com/ques...se-of-skewness

Please note our policy on cross-posting, which is that you should tell us about it. http://www.statalist.org/forums/help#crossposting

In fact, we've already given that advice:

http://www.statalist.org/forums/foru...ith-panel-data

Last edited by Nick Cox; 17 Apr 2017, 06:01.