
  • Rescaling variables before taking logs

    Hi,

    Just a very quick question that hopefully someone will be able to answer. I am estimating a log-linear equation: ln Y = b0 + b1 ln X1 + b2 ln X2 + ... + e

    One of my X variables (a very important one) takes negative values in many observations; its range is around -20 to +20. Obviously, I can't take logs of the zero or negative values. One solution I've used before, when dealing with variables with occasional zero values, is the inverse hyperbolic sine transformation ln(x + sqrt(x^2 + 1)), which is approximately equal to ln(2x) for large positive x. However, given that I'm working with a lot of negative values, I don't think this is suitable here.

    What I've done instead is to rescale the variable before taking its log: ln(20+x). Are there any problems with doing this? In terms of the interpretation, this variable is being used as a proxy.

    Thanks,

    Alex Stead
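A quick numerical sketch of the transformations under discussion (Python; the values are illustrative, not from the thread): the inverse hyperbolic sine is defined for all real x, including negatives, but its ln(2x) reading only holds for large positive x, while the shifted log ln(20 + x) needs x > -20.

```python
import math

def ihs(x):
    # inverse hyperbolic sine: ln(x + sqrt(x^2 + 1)), defined for every real x
    return math.log(x + math.sqrt(x * x + 1))

for x in [10.0, 20.0]:
    # for large positive x, ihs(x) is close to ln(2x)
    print(x, ihs(x), math.log(2 * x))

# ihs is an odd function, so it handles negatives, but the ln(2x)
# approximation (and the elasticity interpretation) breaks down there
print(ihs(-5.0), -ihs(5.0))

# shifted log: defined only for x > -20
print(math.log(20 + (-19.7)))  # ln(0.3), a large negative value
```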

  • #2
    First, a minor problem: since -20 + 20 = 0 and the log of 0 is undefined, you can't (quite) use what you suggest in your last paragraph.

    More important: why do you want logs at all?



    • #3
      You don't make it clear that you need to transform it at all.



      • #4
        Hi Rich,

        I should probably have been a little clearer on that first point. The minimum in the sample is -19.7, so x + 20 is still a small positive number.

        The reason I need to take logs is because in the underlying theory the relationship is multiplicative (i.e. Cobb-Douglas), so I've log-linearised it. Thanks,

        Alex



        • #5
          The problem is that the choice of 20 is seemingly arbitrary. And the choice matters. You have X ranging between -20 and 20. Consider y = log(A + X) for different values of A. The minimum possible choice is 20, and it gives a relationship that looks like, well, a logarithmic curve. But you could also use a larger value of A. Even at A = 25, the relationship is a lot flatter, and if you go to A = 30 or 35, much of the curve is lost. By A = 40 you're almost looking at a straight line.

          So unless there is some substantive science to guide your choice of A, you are doing something more than just a computational convenience: you are choosing the form of the relationship itself (from a one-parameter family of possibilities). And this in turn can have consequences for your estimates of the coefficients of the other variables as well.

          So, let me throw a question back at you. Why are you using a log-linear equation when one of the variables takes on negative values (indeed, is more or less centered around zero)? That seems like a poor specification to start with. Is there some science behind these variables that suggests that kind of relationship between Y and that X? If there is, what does that science have to say about negative values of X? If there is no science to go on, I would suggest graphically exploring the relationship between Y and that X at various combinations of values for the other X's to see if you can get a sense of what a good specification would be.
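The sensitivity to A described above can be quantified: since d/dx log(A + x) = 1/(A + x), the ratio of the slope at the bottom of the range to the slope at the top measures how curved the fitted shape is. A small sketch (Python; the range -19.7 to 20 is taken from the thread):

```python
xmin, xmax = -19.7, 20.0  # approximate range of X from the thread

for A in [20, 25, 30, 40, 100]:
    # slope of log(A + x) is 1/(A + x); compare the two end-point slopes
    slope_ratio = (A + xmax) / (A + xmin)
    print(f"A = {A:>3}: slope at xmin is {slope_ratio:6.1f} times the slope at xmax")
```

At A = 20 the left end of the curve is over a hundred times steeper than the right; by A = 100 the shape is nearly a straight line, which is the post's point about A choosing the functional form.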



          • #6
            It's kind of tough for the underlying theory that it won't fit a predictor that can be positive, zero, or negative. More positively (pun intended) there seems no harm in generalising the theory mildly by adding an extra term treated as is. Your model could be based on

            ln Y = b_1 ln X_1 + (similar terms) + b_k ln X_k + b_special X_special

            and the treatment of X_special should surely be based on its contribution to the relationship. Cobb and Douglas have been dead long since, so who's complaining? This is consistent with the idea that at the end of the process you get to take exp(predicted ln Y). (If this were my problem, I would be using glm anyway.)

            If you really need to transform X_special, cube roots are wonderful. See e.g.

            SJ-11-1 st0223 . . . . . . . . . . . . . . . . . . . Stata tip 96: Cube roots
            . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
            Q1/11 SJ 11(1):149--154 (no commands)
            tip showing the use of the cube function and cube roots
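A minimal sketch of the signed cube root recommended in the tip (Python; `cuberoot` is my own name for it, not a Stata command): unlike a log, it is defined for negative and zero values and preserves sign.

```python
import math

def cuberoot(x):
    # copysign handles negatives: x ** (1/3) with a negative float base
    # would return a complex number in Python
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

for x in [-20.0, -1.0, 0.0, 1.0, 20.0]:
    print(x, cuberoot(x))
```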



            • #7
              A Cobb-Douglas production function assumes that inputs are non-negative, hence the functional form makes sense. If you have a negative "input" X1, then it's not an input at all; probably you need to revise the definition or units of X1 so that it can be expressed as a positive input. Which is not the same as randomly picking a number.



              • #8
                As I recall, inputs mean things like labour and capital and the appeal to theory here is really three-fold: it seems to fit data reasonably well (that's not a theoretical argument, but it's folded back as support for the theory); the algebra is moderately elegant but easy and fits with notions of elasticity; curved relationships match ideas of diminishing returns. Real economists can take potshots at that, which is dredged up from memories of high-school reading 1967-1969.

                But, but, but: is anyone objecting to the idea that extra predictors might help, regardless of whether they are economic inputs in any strict sense?



                • #9
                  Nick is right. Cobb-Douglas models have for some time now been extended to incorporate variables such as R&D, spillovers, etc.
