
  • Transforming negative values in order to solve heteroskedasticity

    Hi all,

    Just a question regarding heteroskedasticity. I am using the level of interest rates, which is negative for some countries during my sample period. The problem is that my dependent variable and this specific variable show high levels of heteroskedasticity. I was wondering how I can solve this issue by transforming the variables. Transforming the dependent variable alone does not help, so I also need to transform my independent variable.

    Further, I am using a fixed effects model and am aware that vce(robust) will also deal with heteroskedasticity. However, the robust and the conventional standard errors differ substantially, which is why I want to control the heteroskedasticity in order to prevent model misspecification.

    I cannot use another variable that only has positive values. I was wondering if it is possible to rescale my variable by adding a constant of at least the minimum value + 0.00001 to all observations (x + constant).

    Thanks a lot,

    Daniel

  • #2
    Heteroscedasticity (heteroskedasticity if you like) is a matter of conditional distribution of the response: it is unaffected by the labelling or values of the predictors.

    I don't understand the rationale for your proposed transformation minimum value + 0.00001 (x + constant).

    It looks like a textbook linear transformation to me, so it will change nothing fundamental.

    Cube roots have been suggested as a transformation that can work well on variables that take on both negative and positive values. In a Stata context see http://www.stata-journal.com/sjpdf.h...iclenum=st0223

    For another discussion see http://stats.stackexchange.com/quest...ptokurtic-data
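    The cube-root idea can be sketched language-neutrally (the Stata Journal article above gives the Stata implementation; the function name below is made up for illustration). The point is that the transformation must preserve sign, since a naive `x ** (1/3)` fails for negative `x` in most languages:

```python
import math

def signed_cube_root(x):
    """Cube root that keeps the sign of x, so negative and positive
    values are treated symmetrically around zero.
    (Naive x ** (1/3) returns a complex number for negative x.)"""
    return math.copysign(abs(x) ** (1 / 3), x)

# Negative interest rates stay negative, positive ones stay positive:
for v in (-5, -0.5, 0, 0.5, 5, 20):
    print(v, round(signed_cube_root(v), 4))
```

    Note that the transformation is symmetric: signed_cube_root(-5) is exactly -signed_cube_root(5), unlike any log(x + constant) shift.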



    • #3
      If the matter is (as it seems to be) mainly transformation, Daniel may want to take a look at this thread: http://www.statalist.org/forums/foru...interpretation. By the way, if the dependent variable is a percentage, for instance, there is also some information on the same thread.
      Best regards,

      Marcos



      • #4
        Thank you for your answer.

        By adding a constant (x + constant) I rescale the values so that no negative values are left. I can then transform the data using the natural logarithm in order to deal with heteroskedasticity.



        • #5
          The dependent variable is indeed a percentage (a ratio).



          • #6
            You can always do that, but you

            0. will be doing something totally ad hoc

            1. lose most of the virtues of logarithms

            2. stretch the bottom half of the scale much, much more than will make economic or statistical sense

            3. run the risk of creating massive negative outliers.

            Let's imagine transforming the range -5 to 20, say.

            Apart from adding 5, what should the small fudge constant be to move values away from -5? You seem to be thinking of 1e-5. Let's look at that:

            Code:
            . mata
            ------------------------------------------------- mata (type end to exit) ----------
            : x = (-5, 0, 5, 10, 15, 20)
            
            : x', (ln(x :+5 :+ 1e-5))'
                              1              2
                +-------------------------------+
              1 |            -5   -11.51292546  |
              2 |             0    1.609439912  |
              3 |             5    2.302586093  |
              4 |            10    2.708050868  |
              5 |            15    2.995732774  |
              6 |            20    3.218876225  |
                +-------------------------------+
            
            : x', (ln(x :+5 :+ 1e-3))'
                              1              2
                +-------------------------------+
              1 |            -5   -6.907755279  |
              2 |             0    1.609637892  |
              3 |             5    2.302685088  |
              4 |            10    2.708116866  |
              5 |            15    2.995782272  |
              6 |            20    3.218915824  |
                +-------------------------------+
            
            : x', (ln(x :+5 :+ 1e-1))'
                              1              2
                +-------------------------------+
              1 |            -5   -2.302585093  |
              2 |             0     1.62924054  |
              3 |             5    2.312535424  |
              4 |            10    2.714694744  |
              5 |            15    3.000719815  |
              6 |            20    3.222867846  |
                +-------------------------------+
            I varied the fudge constant from 1e-5 to 1e-1 = 0.1, but the main problem remains.

            You will want to experiment with your own values, but qualitatively I think there is a clear conclusion: the transformation exaggerates the range of the scale for negative rates much, much more than makes sense.

            Also, try drawing graphs of the transformation you are implying and you will get a shock.

            Cube roots are ad hoc too, but at least they preserve sign and treat values around zero symmetrically.

            NB: Let me repeat a gentle hint to help with your papers and presentations. Heteroscedasticity is a difficult word for everybody, but there are only two accepted spellings.

            Last edited by Nick Cox; 30 Nov 2015, 13:21.



            • #7
              Hi Nick,

              Thanks a lot for your input; I really appreciate it. What would the transformation look like using cube roots? The thing about cube roots is that they might not be the right transformation for the heteroscedasticity in this model. Besides, only a small share of the observations are negative.

              I was just wondering: if transforming negative values is so difficult, is it a serious option simply to leave them out of the dataset? Or to rely on vce(robust) standard errors instead of trying to "correct" the data before running the regression?



              • #8
                Leaving out values because they are awkward to handle statistically is usually a very bad idea. Negative interest rates are evidently part of your data. They may be exceptional but unless you can make a case that they are incorrect, or irrelevant to your research problem, then they should be included.

                We don't have your data to show what cube roots would look like.

                I referred in #2 to a paper that explains how to calculate them.

                No one can be confident of a right or correct transformation to deal with heteroscedasticity. But far more important is getting a handle on a suitable functional form.

                Otherwise put, "robust" standard errors at best just make your standard errors more honest. They don't correct or adjust a badly chosen functional form. It may be that transforming your data is a better thing to do. Or perhaps the heteroscedasticity is not that important. We can't tell.
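                To see the point numerically, here is a minimal NumPy sketch on made-up heteroskedastic data. The HC1 "sandwich" estimator below is the same correction behind Stata's vce(robust): it changes only the standard errors, never the coefficients or the functional form.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1 + 2 * x + rng.normal(0, 0.5 + 0.5 * x)   # error variance grows with x

# OLS: identical coefficients whichever standard errors you report
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
XtX_inv = np.linalg.inv(X.T @ X)

# Classical standard errors: s^2 (X'X)^-1
s2 = e @ e / (n - 2)
se_classical = np.sqrt(np.diag(s2 * XtX_inv))

# HC1 sandwich: (X'X)^-1 X' diag(e_i^2) X (X'X)^-1, scaled by n/(n - k)
meat = (X * e[:, None] ** 2).T @ X
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv * n / (n - 2)))

print(b)                        # same point estimates either way
print(se_classical, se_robust)  # only the standard errors differ
```

                The coefficient vector b never changes; only the uncertainty attached to it does, which is exactly why vce(robust) cannot repair a misspecified functional form.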






                • #9
                  Well, the large differences between my robust and conventional t statistics indicate model misspecification, which I should deal with. However, I was told that in the case of such large differences I should reconsider the model and transform the variables.



                  • #10
                    A positive option (pun intended) given that there are only a few negative values may be to use a generalised linear model with log link. That can accommodate some zero or negative values of the response insofar as the assumption is only that the conditional mean response remains positive.
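                    For intuition, here is a toy sketch in plain NumPy (hypothetical data; damped Gauss-Newton rather than Stata's glm command) of a Gaussian GLM with log link. The fitted conditional mean exp(Xb) is positive by construction, even though one observed response is negative:

```python
import numpy as np

def fit_loglink_gaussian(X, y, iters=200, damp=0.5):
    """Least-squares fit of E[y|X] = exp(X @ b): a Gaussian GLM with
    log link, via damped Gauss-Newton. A toy sketch, not Stata's -glm-."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ b)          # conditional mean: positive by construction
        J = mu[:, None] * X         # Jacobian of mu with respect to b
        step = np.linalg.solve(J.T @ J, J.T @ (y - mu))
        b = b + damp * step         # damped update for numerical stability
    return b

# Made-up data: one response value is negative (as with an occasional
# negative interest rate), yet the conditional mean can stay positive.
x = np.array([-5.0, -2.0, 0.0, 2.0, 5.0, 8.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([-0.3, 0.4, 1.2, 2.0, 4.5, 9.0])

b = fit_loglink_gaussian(X, y)
mu_hat = np.exp(X @ b)
print(b, mu_hat.min())   # every fitted mean is strictly positive
```

                    No transformation of y is needed, so the negative observations stay in the sample as they are.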
