
  • neglog transformation and interpretation

    Hello,

    I am using Stata 14.
    I have an unbalanced panel data set. I am using -mixed- to estimate fixed and random effects.
    I have to estimate elasticities and I am using a log-log approach for my model:

    log(DV)_it = (B0 + B0i) + (B1 + B1i)*ln(V1)_it + (B2 + B2i)*ln(V2)_it + (B3 + B3i)*ln(V3)_it + (B4 + B4i)*ln(V4)_it + U_i + E_it

    The independent variables mostly take on positive values, with occasional zeros. The DV is a financial variable (return on sales) that can take positive, negative, or zero values (only a handful of zeros). I read about the neglog transformation on this forum and tried to use it for my DV, i.e. return on sales. Given the histogram of my DV, I gather that neglog is a good option.

    My question is about interpreting the results when the DV is a neglog-transformed variable. In a standard log-log model, I would interpret the coefficient on, say, independent variable V1 as: a 1% increase in V1 is associated with a B1 percent change in the DV. How do I interpret the coefficients when the DV is neglog-transformed?

    Source used for neglog:
    http://fmwww.bc.edu/repec/bocode/t/transint.html

    Thank you!
    Last edited by cherry singhal; 27 Dec 2016, 16:42.

  • #2
    The transformation is like logarithm of x when x >> 0 and like -logarithm of -x when x << 0. For x ~ 0 it is more like x. So, there won't and can't be a simple wording that fits all cases. Plot the function and its derivative for more insight into how it works.
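
    For example, a minimal sketch (the range here is arbitrary; the derivative of neglog works out to 1/(1 + abs(x)), which equals 1 at x = 0):

    Code:
    * plot neglog(x) and its derivative over an arbitrary range
    twoway (function neglog = cond(x < 0, -ln(1 + abs(x)), ln(1 + x)), range(-10 10))   ///
           (function dneglog = 1/(1 + abs(x)), range(-10 10)),                          ///
           legend(order(1 "neglog(x)" 2 "derivative 1/(1 + |x|)")) xline(0) yline(0)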

    Comment


    • #3
      Originally posted by Nick Cox View Post
      So, there won't and can't be a simple wording that fits all cases.
      Thank you, Nick, for responding. I am not sure I understood you fully.

      1) What did you mean by "fits all cases"?

      As suggested, I did plot the function and its slope to understand the behaviour, so I understand the first statement in your reply.
      You also elaborate in your written piece: "This function passes through the origin, behaves like x for small x, positive and negative, and like sign(x) ln(abs(x)) for large |x|. The gradient is steepest at 1 at x = 0". My reason for using neglog instead of a log transformation is to preserve the negative values of the DV, as I need to be able to determine output elasticities.

      2) I still cannot extend the interpretation to a log-log model for estimating elasticities.
      For a simple equation like ln(Y) = b0 + b1 ln(X), if I use neglog, I am basically writing neglog(Y) = b0 + b1 ln(X), where neglog(Y) = cond(Y < 0, -ln(1 + abs(Y)), ln(1 + abs(Y))) [in the neglog transformation, two changes happen: the sign is preserved, and 1 is added before taking the log to handle zeros].

      And so, if I attempt to interpret the result: since both sides are log-transformed, I can get a percent change. A 1% increase in X would lead to a b1% change in (abs(Y) + 1). But the sign of b1 would flip depending on whether Y < 0 or Y > 0, because the neglog multiplies by sign(Y). Also, does the intercept change at all, since a constant value of 1 was added to Y before taking the log?

      Please advise! Thanks in advance!


      3) Btw, my DV has a range of -29.5 to 34.7, a mean of -0.003, and a standard deviation of 0.383. I also wondered whether the range of the DV influences the interpretation in any way.


      There are hardly any papers that have used neglog, sadly for me (Whittaker et al. 2005 use it with quantile regression, and only a handful of others). How to log-transform financial variables (positive, negative, and zero values) while preserving the signs and the information they carry is not an uncommon problem in my field.

      PS: If this question is better suited to the Stack Exchange sites, I will take it there. Since I found out about neglog on this forum, and you have written a piece on transformations, I chose Statalist to ask my question.

      Comment


      • #4
        With all due respect to Nick's -neglog-, I think that it is simply inappropriate to attempt to calculate elasticities for variables that straddle zero. What, indeed, does it mean to say that a 1% increase in X is associated with a b% increase in Y if the base value of Y can be zero? When the base value of Y is zero, any b% change still leaves it at zero. And any change to a non-zero value would be an infinite percent change. It just doesn't make any sense. And if the base value of Y is very close to zero, any measurable deviation from it will be a very large percentage change. That is probably absurd in most contexts.

        This is not a nail; don't use a hammer on it.

        Comment


        • #5
          I also thought I should attach the graphs for ln(y) and neglog(y). The command I used (where ros is my DV, i.e. Y):

          Code:
          graph twoway function neglogy = cond(ros < 0, -ln(1 + abs(ros)), ln(1 + abs(ros))) || function lny = ln(ros)



          [Attached images: twoway graph.JPG and density of Y.JPG]


          By comparing the two graphs, I gather that
          1) neglog(y) is pulled in closer to zero compared with ln(y);
          2) they move up and down together, or in other words, neglog behaves very much like log (for the positive values of Y). Even so, is the interpretation of neglog(y) not the same as that of ln(y), at least for the positive values of Y?

          Thanks.

          Comment


          • #6
            Thank you Clyde. I will think more on your comment.

            Comment


            • #7
              Originally posted by Clyde Schechter View Post
              And if the base value of Y is very close to zero, any measurable deviation from it will be a very large percentage change.
              Clyde, could you please elaborate on this a little?

              Comment


              • #8
                Clyde is right in respect of changes away from zero. But the question appears to be unnecessary. If the logarithm is defined everywhere for your data, you have zero need to resort to neglog at all.

                Comment


                • #9
                  Nick, no: the logarithm is not defined (at least for statistical analysis) for negative values of the variable, which is why I resorted to neglog.

                  I didn't fully get Clyde's comment.

                  And if I set aside my elasticity model for a minute, I still need clarification on interpreting coefficient estimates when the DV is neglog-transformed. I don't know if I am just going down a rabbit hole and making things more confusing, but I'd rather be a fool on this forum than one forever.
                  Last edited by cherry singhal; 31 Dec 2016, 18:55.

                  Comment


                  • #10
                    So suppose that for a given value of X, Y is 0.01, and with a 1% increase in X, Y increases to 0.1. That change in Y, expressed as a percent, is 100*(0.1 - 0.01)/0.01 = 900%. So when we start off near zero, minuscule changes in absolute terms correspond to very large percentage changes.

                    Beyond that, let's remember that expressing a relationship by a single elasticity number, b, by definition means that whatever X is, a 1% increase in X is associated with a b% increase in Y--regardless of what the base values of X and Y are. Now, it is not hard to show mathematically that the only relationship that can have this property is a power law:

                    Y = K*X^b'

                    (Actually, the b' in this equation is not exactly the same as the elasticity number b referred to just before, but they are nearly equal provided b and b' are not very large, and, in fact, when we calculate elasticities by regressing ln Y on ln X, the coefficient we calculate in that regression is actually an estimator of b', not b.) K is some constant factor (the value of Y when X = 1), and its logarithm appears as the constant term in a log-log regression.
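
                    A one-line sketch of why a constant (point) elasticity forces a power law:

                    $$\frac{dY/Y}{dX/X} = b' \;\Longrightarrow\; \frac{d\ln Y}{d\ln X} = b' \;\Longrightarrow\; \ln Y = b'\ln X + c \;\Longrightarrow\; Y = e^{c}X^{b'} = K X^{b'}.$$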

                    The problem is that the expression X^b' is not definable for negative values of X except when b' is an integer. So the whole concept of elasticity just breaks down when negative numbers are involved.

                    You have correctly observed that the neglog function doesn't truly solve this problem for you. While neglog is very close to ln when X >> 1, it behaves more like X than ln X when X is near zero, so interpreting a coefficient in a neglog-neglog regression as an elasticity would be quite incorrect.
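
                    A quick numerical illustration of that point (the values are arbitrary):

                    Code:
                    * near zero, neglog(x) = ln(1 + x) is close to x itself, not to ln(x)
                    display "neglog(0.02) = " ln(1 + 0.02) "    ln(0.02) = " ln(0.02) "    x = " 0.02
                    * far from zero, neglog(x) is close to ln(x)
                    display "neglog(50) = " ln(1 + 50) "    ln(50) = " ln(50)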

                    At the end of the day, there is no getting around the fact that elasticities cannot be defined for variables that straddle zero.

                    Comment


                    • #11
                      I have not been giving this thread the attention it deserves given travelling and other seasonal distractions and afflictions.

                      I take Clyde's points, but underline that elasticity isn't defined at zero for power functions either. I don't see that as entirely fatal. If it were, power functions could hardly be the reference case.

                      And the skew symmetry whereby neglog(-x) = -neglog(x) would seem to allow some algebraic generalisations for negative values as well as positive.

                      But whether a neglog transformation of the response (Cherry's term DV is not one I use) can usefully be linked with an elasticity interpretation for regression coefficients is very much in doubt.

                      I didn't read Cherry's examples carefully enough in previous visits. I glanced too quickly at the first graph in #5 and thought: So logarithms can be taken of all values, so why the fuss? But Cherry did flag negative values clearly in #1.

                      But there seems to be some confusion about appropriate use of twoway function. twoway function really expects functions in terms of a generic x; it is not illegal to use it otherwise, but the results are often bizarre or worse. In this case I think you are getting traces of the first so many values of each variable -- but Stata is just jumping over the gaps where log(x) is missing -- and the x axis is labelled 0 to 1. Confusing, I think.

                      For a range of say (-35, 35) neglog looks like this:

                      Code:
                      twoway function neglog = cond(x < 0, -ln(1 + abs(x)), ln(1 + x)), ra(-35 35) xla(-35(5)35)


                      [Attached image: neglog.png]


                      but how much effect the transformation has depends on the distribution of the data within that range. It could just be pulling in a few outliers.

                      The entire range is also crucial in practice. Experiment with graphs with larger and smaller ranges to see that.
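
                      For instance, over a much narrower range the curve is close to a straight line (the range below is again arbitrary):

                      Code:
                      twoway function neglog = cond(x < 0, -ln(1 + abs(x)), ln(1 + x)), ra(-2 2) xla(-2(0.5)2)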

                      Comparison with the logarithmic transformation seems to me of limited relevance when it can't be applied to so much of your response. But for positive values it is manifest that log(x) can be arbitrarily large and negative as x decreases towards zero, while log(1 + x) can only approach zero.

                      I'd fit models to your response, untransformed and neglog transformed, and see how different the results are. You can back transform predictions from the second case to compare predictions.
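
                      A sketch of that workflow; every variable name here (ros, lnv1-lnv4, id) is a placeholder for the actual response, logged predictors, and panel identifier, and fit_raw and fit_back end up on the same scale for direct comparison:

                      Code:
                      * model with the untransformed response (placeholder names throughout)
                      mixed ros lnv1 lnv2 lnv3 lnv4 || id: lnv1 lnv2 lnv3 lnv4
                      predict fit_raw, fitted

                      * model with the neglog-transformed response
                      generate neglog_ros = cond(ros < 0, -ln(1 + abs(ros)), ln(1 + ros))
                      mixed neglog_ros lnv1 lnv2 lnv3 lnv4 || id: lnv1 lnv2 lnv3 lnv4
                      predict fit_neglog, fitted

                      * back-transform: the inverse of z = neglog(y) is sign(z)*(exp(abs(z)) - 1)
                      generate fit_back = cond(fit_neglog < 0, -(exp(abs(fit_neglog)) - 1), exp(fit_neglog) - 1)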

                      The coefficients are what they are: coefficients, gradients coupling response (transformed or not) and predictor (transformed or not).

                      We haven't talked about the zeros in your predictors and what you did with them. I don't understand what the second graph in #5 is showing us.
                      The final disclaimer for now is that I don't work in economics or finance, so I can't comment on what is standard in those fields.

                      Comment


                      • #12
                        I am very interested in this post, as I am also using the neglog transformation. My independent variable of interest is a positive one (real GDP per capita), but I have several other independent variables, some of which have negative values, such as the public deficit or foreign direct investment. If the log-modulus transformation (also called the neglog transformation) is used instead of the usual natural logarithm, can the estimators no longer be interpreted as elasticities?
                        Thank you!

                        Comment
