  • Can ratio data be logarithmically transformed?

    Hello Statalists,

    People often log-transform positively skewed variables to make their distributions closer to normal. But if such a skewed variable is a ratio (here I mean only a ratio between 0 and 1), is a logarithmic transformation technically appropriate? In many empirical studies, ratio data are not log-transformed, but some studies do transform them.

    Sorry that my question is perhaps not very relevant to the Stata software.

    Thank you very much
    Last edited by Alex Mai; 22 Oct 2016, 13:54.

  • #2
    If the variable takes on values between 0 and 1, but not including 0, then, yes, you can logarithmically transform it. But with the original variable in that range, the effect of log transforming it will be to increase its skewness.

    But I think that this question deserves the same response you got earlier from Nick Cox about transforming variables. Why do you want to do this? What purpose does it serve in your analysis? A generic answer about transformations will not be satisfactory here: you need to consider what you are doing specifically in relation to your research goals and how and whether different transformations will help you achieve them.

    Do bear in mind that the notion that variables must have normal distributions in order to be used in regression analysis is a myth, a widespread piece of misinformation. If that is what you are thinking, stop right here and go back to the drawing board.
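
A quick way to check what a log transformation does to the skewness of a variable on (0, 1) is to compute the skewness before and after. The sketch below is only an illustration: the evenly spaced `ratio` values are made-up data, and `skewness` is a hypothetical helper using the plain moment formula. How large the change is, and in which direction, depends on the actual data.

```python
import math

def skewness(values):
    """Moment-based skewness m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Assumed example data: 99 evenly spaced values strictly between 0 and 1.
ratio = [i / 100 for i in range(1, 100)]

# The log is defined here only because no value equals 0.
log_ratio = [math.log(r) for r in ratio]

skew_raw = skewness(ratio)
skew_log = skewness(log_ratio)
print(f"skewness of ratio:      {skew_raw:.3f}")
print(f"skewness of log(ratio): {skew_log:.3f}")
```

In this symmetric example the raw variable has essentially no skewness, while its log has a long left tail, because the log stretches values near 0 far more than values near 1.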



    • #3
      Originally posted by Clyde Schechter
      If the variable takes on values between 0 and 1, but not including 0, then, yes, you can logarithmically transform it. But with the original variable in that range, the effect of log transforming it will be to increase its skewness.

      But I think that this question deserves the same response you got earlier from Nick Cox about transforming variables. Why do you want to do this? What purpose does it serve in your analysis? A generic answer about transformations will not be satisfactory here: you need to consider what you are doing specifically in relation to your research goals and how and whether different transformations will help you achieve them.

      Do bear in mind that the notion that variables must have normal distributions in order to be used in regression analysis is a myth, a widespread piece of misinformation. If that is what you are thinking, stop right here and go back to the drawing board.
      Dear Clyde,

      Thank you very much! My original concern was indeed the normality of the RHS variables. Now I know that it is not required! I also learned similar information from another Statalist post. But elsewhere people always emphasise the so-called normality of RHS variables.



      • #4
        Alex:
        as you have agreed, it's a sort of (everlasting) popular misconception.
        When it comes to OLS, normality is a requirement for the residuals only.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Originally posted by Carlo Lazzaro
          Alex:
          as you have agreed, it's a sort of (everlasting) popular misconception.
          When it comes to OLS, normality is a requirement for the residuals only.
          Even for the residuals you often do not need the normality assumption. Unbiasedness and consistency of the OLS estimator do not rely on it. It matters only for hypothesis testing in small samples, where it is needed for the test statistics to have exact t- or F-distributions. For large sample sizes, the asymptotic distributions of the test statistics (where normality follows from the central limit theorem and does not need to be assumed in the first place) can be safely used as approximations.
          https://www.kripfganz.de/stata/
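
The point about unbiasedness and the central limit theorem can be checked with a small simulation. This is a minimal sketch under assumed settings, not anything taken from the posts above: the true coefficients, the sample size of 50, the 2000 replications, and the choice of centred exponential errors (mean zero, strongly right-skewed) are all made up for illustration, and the standard error uses the known error variance of 1 rather than an estimate.

```python
import math
import random

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

random.seed(12345)
TRUE_INTERCEPT, TRUE_SLOPE = 1.0, 2.0   # assumed values
x = [i / 10 for i in range(50)]         # fixed regressor, n = 50

# With error variance 1, the slope's standard error is sqrt(1 / Sxx).
mean_x = sum(x) / len(x)
se_slope = math.sqrt(1.0 / sum((xi - mean_x) ** 2 for xi in x))

slopes = []
for _ in range(2000):
    # Non-normal errors: exponential(1) minus 1 has mean 0 and variance 1.
    errors = [random.expovariate(1.0) - 1.0 for _ in x]
    y = [TRUE_INTERCEPT + TRUE_SLOPE * xi + e for xi, e in zip(x, errors)]
    slopes.append(ols_slope(x, y))

mean_slope = sum(slopes) / len(slopes)
# Share of estimates within 1.96 standard errors of the true slope.
coverage = sum(abs(s - TRUE_SLOPE) <= 1.96 * se_slope for s in slopes) / len(slopes)

print(f"true slope = {TRUE_SLOPE}, average estimate = {mean_slope:.3f}")
print(f"share within 1.96 SE of the truth = {coverage:.3f}")
```

If the normal approximation is adequate at this sample size, the reported share should be close to 0.95 even though the errors themselves are far from normal.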



          • #6
            Sebastian is correct.
            However, as often happens in the quantitative world, it is difficult to say when a sample is large enough to rely on the central limit theorem.
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              There is another issue that is not being considered here: the nonlinearity in the relationship between the explained variable, given the way it is defined, and the explanatory variables. A fractional response variable can be thought of as a proportion or a probability. The question is whether increasing the value of an explanatory variable should affect this probability in a linear fashion or not. Notice that a linear relationship can produce predicted values beyond the two logical boundaries, 0 and 1, so we normally prefer a nonlinear relationship. Many practitioners attempt to achieve this by log-transforming the explained variable. This can help at the upper boundary (the 1), but clearly not at the lower boundary (the 0), where the log is undefined. I personally believe that, if you insist on using OLS, a quadratic or even cubic relationship with the explanatory variables is a better way to capture the nonlinearities than transforming the explained variable into its natural logarithm. I also personally believe that using a probit or logit estimator is much better in these cases, but some people may not agree with such a parametrization.

              A further comment, to add to Clyde Schechter's (as usual) great comments: the linear probability model (i.e. using OLS to predict either a binary or a fractional response variable) suffers from heteroskedasticity, and this is not resolved by log-transforming the explained variable.
              Last edited by Alfonso Sánchez-Peñalver; 23 Oct 2016, 11:36.
              Alfonso Sanchez-Penalver
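
The boundary problem described above can be illustrated in a few lines. The coefficients in `linear_prediction` and `logistic_prediction` are arbitrary assumptions chosen for the illustration, not estimates from any data; the sketch only contrasts a linear index, which can leave [0, 1], with a logistic (logit-type) mean function, which cannot, and shows that the log of an exact zero is undefined.

```python
import math

def linear_prediction(x, intercept=0.1, slope=0.2):
    """Linear index; nothing keeps it inside [0, 1]. Coefficients are assumed."""
    return intercept + slope * x

def logistic_prediction(x, intercept=-2.0, slope=0.8):
    """Logistic mean function; always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

xs = [-2, 0, 2, 4, 6]
linear = [linear_prediction(x) for x in xs]
logistic = [logistic_prediction(x) for x in xs]

for x, lin, logi in zip(xs, linear, logistic):
    print(f"x = {x:>2}: linear = {lin:6.3f}, logistic = {logi:6.3f}")

# A log transformation cannot handle a fractional response of exactly 0.
try:
    math.log(0.0)
    log_zero_defined = True
except ValueError:
    log_zero_defined = False
print("log(0) defined:", log_zero_defined)
```

In Stata itself, fractional responses of this kind are usually handled with a dedicated estimator rather than by hand; the sketch is only meant to make the boundary argument concrete.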



              • #8
                In my opinion, transforming variables brings about another nuisance: finding the best way to back-transform to the original metric when the results of the statistical analysis have to be presented to an audience with only a smattering of statistics.
                Kind regards,
                Carlo
                (Stata 19.0)
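
One reason back-transformation is awkward is that exponentiating the mean of the logs does not return the mean on the original scale; it returns the geometric mean, which is never larger than the arithmetic mean. A minimal sketch with made-up values `y` (an assumption for illustration only):

```python
import math

# Assumed example values of a ratio variable, all strictly positive.
y = [0.1, 0.2, 0.4, 0.8]

arithmetic_mean = sum(y) / len(y)

# Average on the log scale, then naively back-transform with exp().
mean_log = sum(math.log(v) for v in y) / len(y)
back_transformed = math.exp(mean_log)   # this is the geometric mean

print(f"arithmetic mean of y:    {arithmetic_mean:.4f}")
print(f"exp(mean of log y):      {back_transformed:.4f}")
```

The gap between the two numbers is what has to be explained, or corrected for, when results from a log-scale analysis are reported in the original metric.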



                • #9
                  I agree, Carlo. The convenience of explaining everything in terms of OLS, because everybody is very comfortable with OLS and its properties, makes us try to do everything we can to use OLS rather than nonlinear estimators.
                  Alfonso Sanchez-Penalver



                  • #10
                    Some discussion that may add extra points is at http://stats.stackexchange.com/quest...y-are-an-indep
