
  • Transforming negative values in order to solve heteroskedasticity

    Hi all,

    Just a question regarding heteroskedasticity. I am using the level of interest rates, which is negative for some countries during my sample period. The problem is that my dependent variable and this specific variable show high levels of heteroskedasticity. I was wondering how I can solve this issue by transforming the variables. Transforming the dependent variable alone does not help, so I also need to transform my independent variable.

    Further, I am using a fixed effects model and am aware that vce(robust) will also deal with heteroskedasticity. However, the robust and the conventional standard errors differ substantially, which is why I want to control the heteroskedasticity in order to prevent model misspecification.

    I cannot use another variable that only has positive values. I was wondering if it is possible to rescale my variable by adding a constant of at least the minimum value + 0.00001 to all observations (x + constant).

    Thanks a lot,

    Daniel

  • #2
    Heteroscedasticity (heteroskedasticity if you like) is a matter of conditional distribution of the response: it is unaffected by the labelling or values of the predictors.

    I don't understand the rationale for your proposed transformation minimum value + 0.00001 (x + constant).

    It looks like a textbook linear transformation to me, so it will change nothing fundamental.

    Cube roots have been suggested as a transformation that can work well on variables that take on both negative and positive values. In a Stata context see http://www.stata-journal.com/sjpdf.h...iclenum=st0223

    For another discussion see http://stats.stackexchange.com/quest...ptokurtic-data
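    The cube-root idea can be sketched language-neutrally (the Stata Journal article above gives the Stata implementation; the function name below is made up for illustration). The point is that the transformation must preserve sign, since a naive `x ** (1/3)` fails for negative `x` in most languages:

```python
import math

def signed_cube_root(x):
    """Cube root that keeps the sign of x, so negative and positive
    values are treated symmetrically around zero.
    (Naive x ** (1/3) returns a complex number for negative x.)"""
    return math.copysign(abs(x) ** (1 / 3), x)

# Negative interest rates stay negative, positive ones stay positive:
for v in (-5, -0.5, 0, 0.5, 5, 20):
    print(v, round(signed_cube_root(v), 4))
```

    Note that the transformation is symmetric: signed_cube_root(-5) is exactly -signed_cube_root(5), unlike any log(x + constant) shift.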



    • #3
      If the matter is (as it seems to be) mainly transformation, Daniel may want to take a look at this thread: http://www.statalist.org/forums/foru...interpretation. By the way, if the dependent variable is a percentage, for instance, there is also some information on the same thread.
      Best regards,

      Marcos



      • #4
        Thank you for your answer.

        By adding a constant (x + constant) I rescale the values so that no negative values are left. I can then transform the data using the natural logarithm in order to deal with heteroskedasticity.



        • #5
          The dependent variable is indeed a percentage (a ratio).



          • #6
            You can always do that, but you

            0. will be doing something totally ad hoc

            1. lose most of the virtues of logarithms

            2. stretch the bottom half of the scale much, much more than will make economic or statistical sense

            3. run the risk of creating massive negative outliers.

            Let's imagine transforming the range -5 to 20, say.

            Apart from adding 5, what should the small fudge constant be to move values away from -5? You seem to be thinking of 1e-5. Let's look at that:

            Code:
            . mata
            ------------------------------------------------- mata (type end to exit) ----------
            : x = (-5, 0, 5, 10, 15, 20)
            
            : x', (ln(x :+5 :+ 1e-5))'
                              1              2
                +-------------------------------+
              1 |            -5   -11.51292546  |
              2 |             0    1.609439912  |
              3 |             5    2.302586093  |
              4 |            10    2.708050868  |
              5 |            15    2.995732774  |
              6 |            20    3.218876225  |
                +-------------------------------+
            
            : x', (ln(x :+5 :+ 1e-3))'
                              1              2
                +-------------------------------+
              1 |            -5   -6.907755279  |
              2 |             0    1.609637892  |
              3 |             5    2.302685088  |
              4 |            10    2.708116866  |
              5 |            15    2.995782272  |
              6 |            20    3.218915824  |
                +-------------------------------+
            
            : x', (ln(x :+5 :+ 1e-1))'
                              1              2
                +-------------------------------+
              1 |            -5   -2.302585093  |
              2 |             0     1.62924054  |
              3 |             5    2.312535424  |
              4 |            10    2.714694744  |
              5 |            15    3.000719815  |
              6 |            20    3.222867846  |
                +-------------------------------+
            I varied the fudge constant from 1e-5 to 1e-1 = 0.1, but the main problem remains.

            You will want to experiment with your own values, but qualitatively I think there is a clear conclusion: the transformation exaggerates the range of the scale for negative rates much, much more than makes sense.

            Also, try drawing graphs of the transformation you are implying and you will get a shock.

            Cube roots are ad hoc too, but at least they preserve sign and treat values around zero symmetrically.

            NB: Let me repeat a gentle hint to help with your papers and presentations. Heteroscedasticity is a difficult word for everybody, but there are only two accepted spellings.

            Last edited by Nick Cox; 30 Nov 2015, 13:21.



            • #7
              Hi Nick,

              Thanks a lot for your input; I really appreciate it. What would the transformation look like using cube roots? The thing about cube roots is that they might not be the right transformation for the heteroscedasticity in this model. Besides, only a small share of the observations are negative.

              I was just wondering: if transforming negative values is so difficult, is it a serious option simply to leave them out of the dataset? Or to rely on vce(robust) standard errors instead of trying to "correct" the data before running the regression?



              • #8
                Leaving out values because they are awkward to handle statistically is usually a very bad idea. Negative interest rates are evidently part of your data. They may be exceptional but unless you can make a case that they are incorrect, or irrelevant to your research problem, then they should be included.

                We don't have your data to show what cube roots would look like.

                I referred in #2 to a paper that explains how to calculate them.

                No one can be confident of a right or correct transformation to deal with heteroscedasticity. But far more important is getting a handle on a suitable functional form.

                Otherwise put, "robust" standard errors at best just make your standard errors more honest. They don't correct or adjust a badly chosen functional form. It may be that transforming your data is a better thing to do. Or perhaps the heteroscedasticity is not that important. We can't tell.
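                To see the point numerically, here is a minimal NumPy sketch on made-up heteroskedastic data. The HC1 "sandwich" estimator below is the same correction behind Stata's vce(robust): it changes only the standard errors, never the coefficients or the functional form.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1 + 2 * x + rng.normal(0, 0.5 + 0.5 * x)   # error variance grows with x

# OLS: identical coefficients whichever standard errors you report
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
XtX_inv = np.linalg.inv(X.T @ X)

# Classical standard errors: s^2 (X'X)^-1
s2 = e @ e / (n - 2)
se_classical = np.sqrt(np.diag(s2 * XtX_inv))

# HC1 sandwich: (X'X)^-1 X' diag(e_i^2) X (X'X)^-1, scaled by n/(n - k)
meat = (X * e[:, None] ** 2).T @ X
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv * n / (n - 2)))

print(b)                        # same point estimates either way
print(se_classical, se_robust)  # only the standard errors differ
```

                The coefficient vector b never changes; only the uncertainty attached to it does, which is exactly why vce(robust) cannot repair a misspecified functional form.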






                • #9
                  Well, the large differences between my robust and conventional t statistics indicate model misspecification, which I should deal with. However, I was told that in the case of such large differences I should reconsider the model and transform the variables.



                  • #10
                    A positive option (pun intended) given that there are only a few negative values may be to use a generalised linear model with log link. That can accommodate some zero or negative values of the response insofar as the assumption is only that the conditional mean response remains positive.
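                    For intuition, here is a toy sketch in plain NumPy (hypothetical data; damped Gauss-Newton rather than Stata's glm command) of a Gaussian GLM with log link. The fitted conditional mean exp(Xb) is positive by construction, even though one observed response is negative:

```python
import numpy as np

def fit_loglink_gaussian(X, y, iters=200, damp=0.5):
    """Least-squares fit of E[y|X] = exp(X @ b): a Gaussian GLM with
    log link, via damped Gauss-Newton. A toy sketch, not Stata's -glm-."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ b)          # conditional mean: positive by construction
        J = mu[:, None] * X         # Jacobian of mu with respect to b
        step = np.linalg.solve(J.T @ J, J.T @ (y - mu))
        b = b + damp * step         # damped update for numerical stability
    return b

# Made-up data: one response value is negative (as with an occasional
# negative interest rate), yet the conditional mean can stay positive.
x = np.array([-5.0, -2.0, 0.0, 2.0, 5.0, 8.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([-0.3, 0.4, 1.2, 2.0, 4.5, 9.0])

b = fit_loglink_gaussian(X, y)
mu_hat = np.exp(X @ b)
print(b, mu_hat.min())   # every fitted mean is strictly positive
```

                    No transformation of y is needed, so the negative observations stay in the sample as they are.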
