  • Dependent variable in absolute values

    Dear all,
    I have a panel dataset of firms in which my dependent variable is a ratio that can take positive and negative values. I am interested, however, not in the sign of the ratio but in its magnitude, so I decided to use the absolute value of the ratio as the dependent variable.
    This leads the dependent variable to be, of course, positively skewed. I checked the normality of the residuals after running the xtreg regression (in which I cluster standard errors by firm); graphically the residuals look fairly normally distributed, and the skewness and kurtosis do not seem too bad (skewness 0.784; kurtosis 4.21).
    I know that, for the OLS assumptions not to be violated, what matters is that the residuals, not the dependent variable, are normally distributed. However, some of my instructors (I am a PhD student) raised concerns about the DV being in absolute terms and suggested using a log transformation of the DV.
    When I take the log transformation, however, the values become negative wherever the absolute ratio is < 1. Moreover, when I rerun the regression with the log DV, the residuals become skewed, so I guess I should not log-transform the DV and should stick to the absolute values.
    Could you tell me if my intuition is right or if, instead, having an absolute DV brings other problems that I am not considering?
    Thanks a lot in advance.

  • #2
    Residuals being normally distributed is arguably the least important assumption of plain regression!

    Logarithmic transformation is a possibility so long as all your ratios are positive. It's in no sense a problem if any logarithms are negative, although if you have very small positive ratios there is a possibility that your transformation might create outliers.

    For any response that can't be negative I would recommend consideration also of Poisson regression as discussed carefully at https://blog.stata.com/2011/08/22/us...tell-a-friend/ (It's a myth that you need a counted response.)
    Some zeros in the response are tolerable with this method.

    In your case that sounds like xtgee.
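
    As a rough sketch of what that could look like, assuming hypothetical variable names (absratio for the absolute ratio, x1 and x2 for predictors, firm and year as panel identifiers; none of these come from the original post):

    ```stata
    * Hypothetical names throughout: absratio, x1, x2, firm, year
    xtset firm year
    * Population-averaged model with Poisson family and log link;
    * vce(robust) gives standard errors that allow for clustering on the panel variable
    xtgee absratio x1 x2, family(poisson) link(log) corr(exchangeable) vce(robust)
    ```

    The exchangeable working correlation is only one possible choice here; other structures may suit the data better.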



    • #3
      Thanks for your reply, Nick. However, I still have some doubts.
      I know that Poisson is used when you have a count variable. However, my dependent variable is not a count but a ratio that can be negative or positive. The reason is that this ratio is a measure of deviation from the normal "accruals" of the firm, estimated through another model that is not important to discuss here. Hence, this deviation can be positive or negative. However, what I am interested in is how much a firm deviates, not in which direction it deviates. That's why I consider the absolute values of the ratios. The absolute ratio lies between 0 and 1 (min 0.001 and max 0.385), so basically all the values turn negative when I transform them into logs.
      The problem is not so much that they become negative; I can change the way I interpret the results accordingly. My question was more general.
      Is it a problem to use the absolute values for the dependent variable in the first place? Because if it is not, I would not transform the DV and would leave it in absolute values.
      Thanks in advance again.



      • #4
        I already commented -- and the blog post referenced explains at greater length -- that not having counts is not a problem. Nor is logarithms being negative a problem as Poisson regressions -- and more widely, generalized linear models with any particular link function -- return predictions on the original scale of the response. There is nothing worrying or pathological about negative logarithms!

        But if the ratio is naturally within (0, 1) then logit link is arguably called for. See e.g. https://www.stata-journal.com/sjpdf....iclenum=st0147 Meanwhile, note that the common presumption that economists thought of this first is not quite true as a search for Wedderburn 1974 will confirm. https://academic.oup.com/biomet/article/61/3/439/249095 is a precise reference.

        However, you can see for yourself with

        Code:
        twoway function logit = logit(x),  ra(0.001 0.385) || function log=log(x),  ra(0.001 0.385)
        -- which you can copy directly into Stata -- that logit and logarithm are almost identical over your range considering that logit p = log p - log(1 - p) is close to log p over that range, because log (1 - p) is close to 0 if p is close to 0.
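
        To put numbers on that, here is a small sketch (not part of the original post) that tabulates both functions at a few values in the stated range. The gap between them is exactly -log(1 - p), which is about 0.001 at p = 0.001 but about 0.49 at p = 0.385, so the agreement is closest near the lower end of the range.

        ```stata
        * Columns: p, log(p), logit(p), and the gap logit(p) - log(p) = -log(1 - p)
        foreach p in 0.001 0.01 0.1 0.385 {
            display %6.3f `p' %9.3f log(`p') %9.3f logit(`p') %9.3f logit(`p') - log(`p')
        }
        ```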

        So while logit is a natural link for a response bounded between 0 and 1, Poisson regression results would be very close.
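
        One way to check that on your own data is to fit both and compare. This is a minimal sketch using glm, with hypothetical variable names (absratio, x1, x2, firm) that are not from the original post. It treats the data as pooled and only clusters the standard errors by firm, so it is a quick comparison rather than a full panel model.

        ```stata
        * Hypothetical names: absratio, x1, x2, firm
        * Logit link for a response in (0, 1); clustered (robust) standard errors
        * because the response is a proportion, not a binary outcome
        glm absratio x1 x2, family(binomial) link(logit) vce(cluster firm)
        estimates store frac_logit

        * Poisson family with log link, for comparison
        glm absratio x1 x2, family(poisson) link(log) vce(cluster firm)
        estimates store pois_log

        * Coefficients and standard errors side by side
        estimates table frac_logit pois_log, b(%9.4f) se
        ```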



        • #5
          Now it is clearer, thanks a lot, Nick! I will try both the logistic and the Poisson regression.
