
  • Rescaling variables before taking logs

    Hi,

    Just a very quick question that hopefully someone will be able to answer. I am estimating a log-linear equation: ln Y = b0 + b1 ln X1 + b2 ln X2 + ... + e

    One of my X variables (a very important one) takes negative values in many observations; its range is around -20 to +20. Obviously, I can't take logs of the zero or negative values. One solution I've used before, when dealing with variables with occasional zero values, is the inverse hyperbolic sine transformation ln(x + sqrt(x^2 + 1)), which is approximately equal to ln(2x) for large positive x. However, given that I'm working with a lot of negative values, I don't think this is suitable here.

    What I've done instead is to rescale the variable before taking its log: ln(20+x). Are there any problems with doing this? In terms of the interpretation, this variable is being used as a proxy.

    Thanks,

    Alex Stead
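A quick numerical sketch of the transformations under discussion (Python; the values are illustrative, not from the thread): the inverse hyperbolic sine is defined for all real x, including negatives, but its ln(2x) reading only holds for large positive x, while the shifted log ln(20 + x) needs x > -20.

```python
import math

def ihs(x):
    # inverse hyperbolic sine: ln(x + sqrt(x^2 + 1)), defined for every real x
    return math.log(x + math.sqrt(x * x + 1))

for x in [10.0, 20.0]:
    # for large positive x, ihs(x) is close to ln(2x)
    print(x, ihs(x), math.log(2 * x))

# ihs is an odd function, so it handles negatives, but the ln(2x)
# approximation (and the elasticity interpretation) breaks down there
print(ihs(-5.0), -ihs(5.0))

# shifted log: defined only for x > -20
print(math.log(20 + (-19.7)))  # ln(0.3), a large negative value
```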

  • #2
    First, a minor problem: since -20 + 20 = 0 and the log of 0 is undefined, you can't (quite) use what you suggest in your last paragraph.

    More important: why do you want logs at all?



    • #3
      You don't make it clear that you need to transform it at all.



      • #4
        Hi Rich,

        I should probably have been a little clearer on that first point. The minimum in the sample is -19.7, so x + 20 is still a small positive number.

        The reason I need to take logs is because in the underlying theory the relationship is multiplicative (i.e. Cobb-Douglas), so I've log-linearised it. Thanks,

        Alex



        • #5
          The problem is that the choice of 20 is seemingly arbitrary. And the choice matters. You have X ranging between -20 and 20. Consider y = log(A + X) for different values of A. The minimum possible choice is 20, and it gives a relationship that looks like, well, a logarithmic curve. But you could also use a larger value of A. Even at A = 25, the relationship is a lot flatter, and if you go to A = 30 or 35, much of the curve is lost. By A = 40 you're almost looking at a straight line.

          So unless there is some substantive science to guide your choice of A, you are doing something more than just a computational convenience: you are choosing the form of the relationship itself (from a one-parameter family of possibilities). And this in turn can have consequences for your estimates of the coefficients of the other variables as well.

          So, let me throw a question back at you. Why are you using a log-linear equation when one of the variables takes on negative values (indeed, is more or less centered around zero)? That seems like a poor specification to start with. Is there some science behind these variables that suggests that kind of relationship between Y and that X? If there is, what does that science have to say about negative values of X? If there is no science to go on, I would suggest graphically exploring the relationship between Y and that X at various combinations of values for the other X's to see if you can get a sense of what a good specification would be.
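The sensitivity to A described above can be quantified: since d/dx log(A + x) = 1/(A + x), the ratio of the slope at the bottom of the range to the slope at the top measures how curved the fitted shape is. A small sketch (Python; the range -19.7 to 20 is taken from the thread):

```python
xmin, xmax = -19.7, 20.0  # approximate range of X from the thread

for A in [20, 25, 30, 40, 100]:
    # slope of log(A + x) is 1/(A + x); compare the two end-point slopes
    slope_ratio = (A + xmax) / (A + xmin)
    print(f"A = {A:>3}: slope at xmin is {slope_ratio:6.1f} times the slope at xmax")
```

At A = 20 the left end of the curve is over a hundred times steeper than the right; by A = 100 the shape is nearly a straight line, which is the post's point about A choosing the functional form.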



          • #6
            It's kind of tough for the underlying theory that it won't fit a predictor that can be positive, zero, or negative. More positively (pun intended) there seems no harm in generalising the theory mildly by adding an extra term treated as is. Your model could be based on

            ln Y = b_1 ln X_1 + (similar terms) + b_k ln X_k + b_special X_special

            and the treatment of X_special should surely be based on its contribution to the relationship. Cobb and Douglas have been dead long since, so who's complaining? This is consistent with the idea that at the end of the process you get to take exp(predicted ln Y). (If this were my problem, I would be using glm anyway.)

            If you really need to transform X_special, cube roots are wonderful. See e.g.

            SJ-11-1 st0223 . . . . . . . . . . . . . . . . . . . Stata tip 96: Cube roots
            . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
            Q1/11 SJ 11(1):149--154 (no commands)
            tip showing the use of the cube function and cube roots
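A minimal sketch of the signed cube root recommended in the tip (Python; `cuberoot` is my own name for it, not a Stata command): unlike a log, it is defined for negative and zero values and preserves sign.

```python
import math

def cuberoot(x):
    # copysign handles negatives: x ** (1/3) with a negative float base
    # would return a complex number in Python
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

for x in [-20.0, -1.0, 0.0, 1.0, 20.0]:
    print(x, cuberoot(x))
```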



            • #7
              A Cobb-Douglas production function assumes that inputs are non-negative, hence the functional form makes sense. If you have a negative "input" X1, then it's not an input at all; probably you need to revise the definition or units of X1 so that it can be expressed as a positive input. Which is not the same as randomly picking a number.



              • #8
                As I recall, inputs mean things like labour and capital and the appeal to theory here is really three-fold: it seems to fit data reasonably well (that's not a theoretical argument, but it's folded back as support for the theory); the algebra is moderately elegant but easy and fits with notions of elasticity; curved relationships match ideas of diminishing returns. Real economists can take potshots at that, which is dredged up from memories of high-school reading 1967-1969.

                But, but, but: is anyone objecting to the idea that extra predictors might help, regardless of whether they are economic inputs in any strict sense?



                • #9
                  Nick is right. Cobb-Douglas models have for some time now been extended to incorporate variables such as R&D, spillovers, etc.
