  • Transforming Likert Scaled Variables (Right Skewed) to Normal Distributions

    Dear All,

    I am writing to ask how 5-point Likert-scaled variables (right skewed, with lots of 5s) can be transformed so that their distribution becomes normal and they can be used with parametric tests, in this case instrumental variable regression and selection models.

    I have tried squaring the variables and log transformation thus far, but am at a loss about how to proceed.

    Any advice would be greatly appreciated.

    Thanks,

    Gary


  • #2
    Squaring 1 to 5 could only make right skewness worse. I don't think logarithms of Likert scales make any substantive sense, not that squares do either.
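
    A minimal sketch of this point, in Python, using invented frequencies (the thread gives no data, so both samples below are hypothetical). It computes moment-based skewness before and after squaring and taking logs, once for a 5-point item piled up at 1 and once for one piled up at 5. Whichever end the responses pile up at, the transformed item still takes only five distinct values.

```python
import math

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def expand(counts):
    """Turn a {score: frequency} table into a flat list of responses."""
    return [score for score, freq in counts.items() for _ in range(freq)]

# Hypothetical frequency tables, invented for illustration only.
right_skewed = expand({1: 60, 2: 25, 3: 10, 4: 3, 5: 2})  # piled up at 1
piled_at_5 = expand({1: 2, 2: 3, 3: 10, 4: 25, 5: 60})    # piled up at 5

for name, data in [("right_skewed", right_skewed), ("piled_at_5", piled_at_5)]:
    squared = [v ** 2 for v in data]
    logged = [math.log(v) for v in data]
    print(
        name,
        "raw:", round(skewness(data), 2),
        "squared:", round(skewness(squared), 2),
        "log:", round(skewness(logged), 2),
        "distinct values after squaring:", len(set(squared)),
    )
```

    Squaring stretches the upper end of the scale, so it increases the skewness of the sample piled up at 1; taking logs compresses the upper end, so it pushes the sample piled up at 5 further from symmetry. Neither can turn five spikes into a bell curve.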

    If you wish to use such variables in regression, there is absolutely no requirement that any variables in regression have normal marginal distributions. Treating a Likert variable as the response could be problematic, but not because of lack of normality.

    I can't speak for the specifics of selection models.



    • #3
      Gary Chapman: "don't do it" is the best advice I could offer. Many of the transformations that you are thinking about carry an underlying assumption about the measurement scale of the variable, which typically amounts to the variable needing to be measured on an interval or ratio scale. Polytomous response sets like the one you describe are ordinal in nature, and using a transformation like this imposes different measurement properties on your data (e.g., it fixes the distance between integers to be equal). If you have items that are highly skewed like this, chances are the item isn't very informative about the construct you are trying to measure for anything but a limited range of theta. If you provided a bit more context about your project, others could share other approaches and thoughts on the topic; with only the information available so far, it will be difficult to suggest different approaches you can try to move things forward.



      • #4
        Thanks Nick and Wbuchanan for your advice - I'm new to this so learning as I go.

        Yes, I can provide more information. The items measure innovation-related attitudes, with 1 = strongly disagree and 5 = strongly agree. Each construct has 4 measures, which are added together to get the construct score (0-20 score). However, they are heavily skewed in most cases, as the firms we're studying (i.e. government-supported firms) are quite innovative. I have run matching estimators on the data using the teffects psmatch command; however, I know this only controls for observables, so I also want to run an IV regression or selection model as a robustness check, given that they help control for unobservable differences. From my reading, these are both parametric tests and sensitive to violations of the normality assumption. Hence I was wondering whether anyone had come across ways to transform the scores to normal or, if I proceed with the tests as is, whether the results obtained are reliable.

        Any help is much appreciated.

        Thanks.



        • #5
          Which normality assumption are you referring to? Transformations of such variables make essentially no sense, would be unlikely to work anyway without bizarre side-effects, and are not needed for any assumption of a regression model. What is the outcome or response variable you are trying to predict?



          • #6
            I had read that the dependent variable should be normally distributed. Is this not correct? The Likert scale variables are the outcome variables: we are using a mixture of dummy, ordinal and continuous variables to predict innovation attitudes.



            • #7
              At most error terms should be normally distributed in classical linear regression; that is not an assumption about any marginal distribution and it's the least important assumption even when it's made. I don't know your field, but modern texts such as Jeff Wooldridge's introductory econometrics text make this clear. (We're pleased to have him as a member here.)
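
              A minimal sketch of the distinction between the marginal distribution of the response and the distribution of the errors, in Python with simulated data. Everything here is an assumption made for illustration: a single dummy predictor equal to 1 for about 90% of cases, a true intercept of 2, a true slope of 6, and standard normal errors. None of it comes from the data discussed in this thread.

```python
import random

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(12345)
n = 5000

# Hypothetical dummy predictor: 1 for roughly 90% of cases.
x = [1 if random.random() < 0.9 else 0 for _ in range(n)]
# Response = linear function of x plus normally distributed error.
y = [2.0 + 6.0 * xi + random.gauss(0.0, 1.0) for xi in x]

# Ordinary least squares for a single predictor.
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print("skewness of y:", round(skewness(y), 2))
print("skewness of residuals:", round(skewness(residuals), 2))
```

              In this simulation the response is strongly skewed, because most cases sit in the high group, yet the residuals are close to symmetric: the skewness of the response comes from the predictor, not from the errors.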

              But if you are trying to predict something that runs from 1 to 5, linear regression is the wrong place to look anyway: some kind of model for an ordinal response is a better starting point.
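
              To make "a model for an ordinal response" concrete, here is a minimal sketch in Python of how an ordered logit (proportional odds) model assigns probabilities to the five categories, using P(y <= k) = logistic(cut_k - xb). The cutpoints and linear predictor values are hypothetical, chosen only for illustration; in practice they would be estimated, for example with ologit or oprobit in Stata.

```python
import math

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probs(linear_predictor, cutpoints):
    """Category probabilities P(y = 1), ..., P(y = K) for K = len(cutpoints) + 1.

    Uses P(y <= k) = logistic(cut_k - linear_predictor); the cutpoints
    must be in increasing order.
    """
    cumulative = [logistic(c - linear_predictor) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        probs.append(cum - previous)  # P(y = k) = P(y <= k) - P(y <= k - 1)
        previous = cum
    return probs

# Hypothetical cutpoints for a 5-point item (four thresholds).
cutpoints = [-3.0, -2.0, -1.0, 0.5]

for xb in (0.0, 1.5):
    probs = ordered_logit_probs(xb, cutpoints)
    print("xb =", xb, [round(p, 3) for p in probs])
```

              A larger linear predictor shifts probability toward the higher categories, and the model never has to assume that the steps between 1, 2, 3, 4 and 5 are equally spaced.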



              • #8
                There are a couple of quick thoughts I would look into after looking at the responses briefly. If you know how the items load on different dimensions you can fit a CFA to scale the individual dimensions while allowing the latent measures to be correlated. You could do the same for the individual constructs with CFA or IRT, and would end up treating the measures as unrelated. If you have Stata 14, you could also use the Bayesian statistics tools to fit a multidimensional IRT model. In each of these cases there are different sets of assumptions to consider, but given the larger problem (e.g., lack of variance) it might be worth considering the multidimensional IRT/CFA models since the estimates of theta will be more sensitive to differences in the item difficulties/discrimination than a sum score.
