  • Log transformation of bound regressor

    One of my independent variables has a maximum possible value of 1 (the percentage of respondents in a particular country who answered "agree" to a particular survey question), and its distribution is somewhat skewed. I thought about log-transforming this variable, but would that not be problematic given the upper bound? And even if I leave it untransformed, do I need to take the boundedness into account somehow?

    Thanks!

  • #2
    A fraction between 0 and 1 is best called a proportion or a fraction, not a percentage.

    That said, an upper bound of 1 is no problem, as log 1 is 0. A lower bound of 0 would be a problem, as log 0 is undefined. But a logarithm is unlikely to be a good idea unless the variable is right-skewed and the transformation is consistent with the other ideal conditions for whatever model you have in mind.
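
    The behaviour at the two bounds can be checked directly in Stata. This is a minimal sketch on made-up values: the variable names p and log_p and the four proportions are assumptions for illustration, not data from this thread.

    ```stata
    * Hypothetical proportions covering both bounds: 0, 1/3, 2/3, 1
    clear
    set obs 4
    generate double p = (_n - 1) / 3

    * ln() returns missing (.) for 0, so that observation would drop out of any model
    generate double log_p = ln(p)
    list p log_p
    count if missing(log_p)
    ```

    Proportions strictly between 0 and 1 get negative logs, 1 maps to 0, and 0 maps to missing.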

    • #3
      Thanks for the clarifications, Nick.

      So far, I have used -sktest- and -histogram- to assess whether I needed a log transformation. Luckily, neither the variable I mentioned nor the others under consideration take the value 0. My variables are rather left-skewed; why would that be a different situation from right-skewed? And what exactly do you mean by "ideal conditions"? I would like to estimate a logit regression first and an xtlogit regression second, followed by a linear fixed-effects regression with a different dependent variable, which (or so I thought) should also be log-transformed. What would speak against using the logged variables?

      • #4
        Left and right skewness are often called negative and positive skewness. Log transformation of a left-skewed variable can only make the skewness worse, because logarithms compress high values and stretch out low ones. It seems that you need to study a basic account of transformations: one such is downloadable as a help file from

        Code:
        ssc inst transint
        See any good text on regression for an account of the conditions ideal for regression, often rather misleadingly called assumptions.

        In fact, skewness of any variable in a regression model is not in itself a problem, but in practice it is often associated with situations that benefit from transformation for other reasons.
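
        A small simulation illustrates what logs do to a left-skewed variable. This is only a sketch: the Beta(5, 2) draws, the seed, the sample size, and the names p, log_p, skew_raw and skew_log are assumptions chosen for illustration, not anything from the data discussed here.

        ```stata
        clear
        set seed 2017
        set obs 1000

        * Hypothetical left-skewed proportion inside (0, 1)
        generate double p = rbeta(5, 2)
        generate double log_p = ln(p)

        * Moment-based skewness before and after taking logs
        quietly summarize p, detail
        scalar skew_raw = r(skewness)
        quietly summarize log_p, detail
        scalar skew_log = r(skewness)

        display "skewness of p:     " %6.3f skew_raw
        display "skewness of ln(p): " %6.3f skew_log
        ```

        Both values should come out negative, with the logged variable more strongly negative than the original.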

        • #5
          Thanks again; -transint- proved very informative. I have now applied either log or square transformations to my variables, depending on the direction of skewness. However, comparing the results with -sktest- and -hist-, it seems that my distributions have not become normal at all. So I was wondering whether there is a test for analyzing the distributions of the untransformed variables, or what the best general way of proceeding would be. So far I am just crudely judging the skewness by eye from the histogram, plus relying on -sktest-. How can I tell whether a distribution like this:
          [Image: 1.png]

          or this:
          [Image: 2.png]

          can really be described as right- or left-skewed, or as having some other shape?

          Also, do you know of a source that succinctly summarizes under what circumstances the non-use of transformation despite skewness and/or kurtosis can be justified?

          • #6
            I am travelling at present and will reply in a day or two.

            • #7
              Working backwards:

              "do you know of a source that succinctly summarizes under what circumstances the non-use of transformation despite skewness and/or kurtosis can be justified?"
              There is some confusion here, or rather I am not clear what you are asking. Kurtosis is a measurable property, not a yes-or-no condition. Perhaps you mean kurtosis indicative of a non-normal distribution. But your context is regression, and there is no assumption in regression that any variable has a normal marginal distribution. So perhaps you are asking for literature advising that assumptions that don't apply do not in fact apply. I am not clear about that, but any good regression text would make the point, though often by implication rather than by direct statement. Not many good regression texts waste their time debunking myths that somehow survive in the weaker literature, such as this myth that a normal distribution is required for regression.
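
              The point about marginal distributions can be illustrated by simulation. This is a sketch under assumed values: the chi-squared predictor, the coefficients 1 and 2, the seed, and the variable names x, y and e are all made up for illustration.

              ```stata
              clear
              set seed 2017
              set obs 1000

              * Strongly right-skewed predictor; errors are normal by construction
              generate double x = rchi2(2)
              generate double y = 1 + 2*x + rnormal()

              regress y x
              predict double e, residuals

              * x is far from normal, yet the residuals are roughly symmetric
              summarize x e, detail
              ```

              The estimated slope should land close to 2 even though x is strongly skewed, because the model's ideal conditions concern the errors, not the marginal distribution of the predictor.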

              Your two histograms could be basic examples in any introductory course of a strongly right-skewed distribution and an approximately symmetric distribution respectively. You could measure skewness using the standard moment-based measure or in several other ways, e.g. (mean - median) / SD. No single measure of skewness is guaranteed to capture all the information, however, which is why graphs remain important.
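
              Both measures can be computed from the results that -summarize, detail- leaves behind. This is a minimal sketch using the auto dataset shipped with Stata, with its variable price purely as a stand-in for your own variable; the scalar names are assumptions for illustration.

              ```stata
              sysuse auto, clear

              * Moment-based skewness, plus the pieces for (mean - median) / SD
              quietly summarize price, detail
              scalar skew_moment = r(skewness)
              scalar skew_median = (r(mean) - r(p50)) / r(sd)

              display "moment-based skewness: " %6.3f skew_moment
              display "(mean - median) / SD:  " %6.3f skew_median

              * A graph remains the essential check
              histogram price
              ```

              The two measures are on different scales, so compare their signs and rough sizes rather than expecting them to agree numerically.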

              • #8
                Your question seems close to https://stats.stackexchange.com/ques...se-of-skewness

                Please note our policy on cross-posting, which is that you should tell us about it. http://www.statalist.org/forums/help#crossposting

                In fact, we've already given that advice:

                http://www.statalist.org/forums/foru...ith-panel-data
                Last edited by Nick Cox; 17 Apr 2017, 06:01.
