I've seen several threads here about modeling a continuous outcome that is positively skewed and often has many zero values (for example, this thread and this blog post), but I haven't seen much about appropriate multivariate regression models for a highly negatively skewed dependent variable. I'd like to hear thoughts on this from others, potentially in various disciplines (I'm a sociologist).
To contextualize, my dependent variable is an interval measure (a scale of attitudes, ranging from 1 to 4) that was created by taking the mean of 10 items with a 4-point response scale (1 = Strongly disagree, 4 = Strongly agree). 40% of respondents (sample size of approx. 2,000) have a 4 on this scale, an additional 10% fall into the second-highest category, and the remainder fall between 1 and 3.8 (variable skewness = -1.96, kurtosis = 7.65). Based on the theoretical background of this measure, the fact that 40% fall into the highest category is actually interesting and may be meaningful.
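For concreteness, here is a minimal sketch of how a scale like this and its summary statistics could be computed. It is only an illustration: the simulated responses, the 40% all-4s share, the item weights, and every function name are made up for the example and are not my actual data or code. I'm also assuming the moment-based definitions of skewness and kurtosis, with kurtosis on the non-excess scale (a normal distribution gives 3); other software may report excess kurtosis instead.

```python
import random
from statistics import fmean


def scale_score(items):
    """Scale score = mean of the item responses (each coded 1-4)."""
    return fmean(items)


def central_moment(xs, k):
    """k-th central moment, dividing by n (population form)."""
    m = fmean(xs)
    return fmean([(x - m) ** k for x in xs])


def skewness(xs):
    """Moment-based skewness; negative values indicate a long left tail."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def kurtosis(xs):
    """Moment-based, non-excess kurtosis (normal distribution = 3)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2


def describe(scores, ceiling=4.0):
    """Share of respondents at the scale maximum, plus shape statistics."""
    return {
        "share_at_ceiling": sum(s == ceiling for s in scores) / len(scores),
        "skewness": skewness(scores),
        "kurtosis": kurtosis(scores),
    }


def simulate_respondent(rng):
    """Hypothetical respondent: 40% answer 4 on all ten items (assumption)."""
    if rng.random() < 0.4:
        return [4] * 10
    # Arbitrary illustrative weights favouring agreement.
    return rng.choices([1, 2, 3, 4], weights=[1, 2, 4, 5], k=10)


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [scale_score(simulate_respondent(rng)) for _ in range(2000)]
    for name, value in describe(scores).items():
        print(f"{name}: {value:.3f}")
```

Because each score is a mean of ten integer items, it can only take values in steps of 0.1 between 1 and 4, so the "second-highest category" above corresponds to a score of 3.9.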
Ideally, we would want to assess whether hypothesized independent variables (a mix of categorical and continuous) are associated with stronger attitudes. The traditional way of assessing this would be OLS, but of course there are non-linear relationships between the predictors and the outcome, and the residuals are not normally distributed.
So, what would be the next step in finding an appropriate model? My first thought is a GLM, but I don't know enough about which distributions are appropriate for negatively skewed data like these, which are bounded (1 to 4, so there are no zeros) and pile up at the upper limit.
Thanks for sharing your thoughts.