  • Log

    Dear all,

    If I have an FDI variable with large negative values, many zeros and large positive values, how should I log-transform it? I know this can be handled by adding a constant, but then I would have to add a very large constant.

    I want to use OLS regression, which is why I need the log transformation to deal with heteroskedasticity.

  • #2
    Kate: welcome to the Forum. Please re-read the Forum FAQ regarding the ways to maximize the chances of getting helpful replies.

    You haven't explained whether you wish to use this "FDI" variable as an explanatory variable or an outcome variable. Let's suppose that it is the latter. You should search the internet and Stata resources first: there is a lot of material on this sort of problem. ("Net worth" (= total assets - debt) is a variable with a similar range, for instance.) Taking logs of a constant-transformed variable is usually the wrong way to go. One approach is to use a transformation other than the log, but with some similar properties, such as the inverse hyperbolic sine. See e.g. Burbidge, John B., Lonnie Magee, and A. Leslie Robb, "Alternative Transformations to Handle Extreme Values of the Dependent Variable," Journal of the American Statistical Association, 83(401), March 1988, 123-7. A loosely related discussion concerns the use of "Poisson pseudo-maximum likelihood" (PPML), I think, which has been discussed in this Forum (search!) and on the Stata Blog page.
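
To make the inverse hyperbolic sine suggestion concrete, here is a minimal sketch in Python (not Stata), using only the standard library. The sample values are made up for illustration and `ihs` is a hypothetical helper name. It checks three properties that matter for a variable like this: the transformation is defined for negative, zero and positive values, it is symmetric around zero, and for large values it is close to log(2y).

```python
import math

def ihs(y):
    """Inverse hyperbolic sine: log(y + sqrt(y**2 + 1)), defined for all real y."""
    return math.asinh(y)

# Hypothetical FDI-like values: large negatives, zeros, large positives.
sample = [-1e9, -100.0, 0.0, 0.0, 100.0, 1e9]
transformed = [ihs(y) for y in sample]

for y, t in zip(sample, transformed):
    print(f"{y:>15.1f} -> {t:9.4f}")

# For large positive y, asinh(y) is approximately log(2 * y).
large_gap = abs(ihs(1e9) - math.log(2e9))
print(f"gap to log(2y) at y = 1e9: {large_gap:.2e}")
```

Stata provides the same function as `asinh()`, so the equivalent step there would presumably be a single `generate` statement.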

    • #3
      Stephen clearly gives excellent advice. Outcome or predictor is the first question to answer.

      I'd underline that you need to think about the economic implications here too. That is easy for me to say; I don't have to know what they are.

      Will -100 or -1 billion (say) have the same effect or meaning as 100 or 1 billion, just with a change of sign? If so, log(value + constant) is the wrong way to go. There are many objections to the latter, including: where does the constant come from?
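
The objection to log(value + constant) can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the values -100 and 100, the two constants, and the helper names `shifted_log` and `gaps`. It shows that the shifted log does not treat equal magnitudes of opposite sign alike, and that the result depends heavily on which constant is picked.

```python
import math

def shifted_log(y, c):
    """log(y + c); only defined when y + c > 0."""
    return math.log(y + c)

# Hypothetical values with the same magnitude and opposite sign.
neg, pos = -100.0, 100.0

def gaps(c):
    """Distance of neg and pos from zero on the transformed scale."""
    zero = shifted_log(0.0, c)
    return zero - shifted_log(neg, c), shifted_log(pos, c) - zero

# Two arbitrary constants, each large enough to make every value positive.
for c in (101.0, 1e6):
    below, above = gaps(c)
    print(f"c = {c:>9.1f}: below zero {below:.6f}, above zero {above:.6f}")

# asinh, by contrast, needs no constant and is symmetric around zero.
symmetric_gap = abs(math.asinh(neg) + math.asinh(pos))
```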

      By an engaging coincidence, I have just been posting on asinh over at http://stats.stackexchange.com/quest...rmed-variables The graph there was drawn in Stata, naturally.

      I'll add a toot for cube roots as the simplest function that treats negative, zero and positive values in the way that is often wanted.
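
A minimal sketch of a sign-preserving cube root, in Python with the standard library only. The helper name `signed_cube_root` and the sample values are hypothetical. The sign is handled explicitly because raising a negative float to the power 1/3 in Python returns a complex number rather than the real root.

```python
import math

def signed_cube_root(y):
    """Real cube root of y that keeps the sign of y."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)

# Hypothetical values: negative, zero and positive.
sample = [-1e9, -27.0, 0.0, 27.0, 1e9]
roots = [signed_cube_root(y) for y in sample]

for y, r in zip(sample, roots):
    print(f"{y:>15.1f} -> {r:12.4f}")
```

The same idea carries over to any package with sign and absolute-value functions: take the root of the absolute value, then restore the sign.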

      • #4
        Thank you both for your answers.

        FDI is my dependent variable. I've searched the internet, and some papers use a constant that is big enough to allow taking the log of negative values. But I do wonder whether that has other unwanted implications.

        I have indeed thought about PPML, but I was advised to use OLS, so I am trying to figure out how to transform my data in a reliable and correct way.

        • #5
          Dear Kate,

          As usual, Stephen and Nick provide excellent advice; I fully agree that adding a constant and taking logs is totally inappropriate (although very popular!).

          PPML (Poisson pseudo maximum likelihood) would also not be appropriate here because the conditional mean may be negative, whereas PPML specifies a conditional mean of the form exp(xb), which is always positive.

          Besides using one of the more suitable transformations mentioned above, you may also want to consider a multi-part model. For instance, you can first model the choice between a negative, positive, or zero flow, and then model the absolute value of the corresponding flow. For this second part you may use PPML without any problem. Note that I am not an expert on FDI and therefore I do not know if such a model would be in line with the current theory on the subject.
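
The data preparation behind such a multi-part model can be sketched as follows, in Python with made-up values. The helper name `split_flow` and the regime labels are hypothetical, and the sketch stops short of fitting anything: it only builds the two outcomes, the regime (negative, zero, positive) for the first part and the non-negative magnitude for the second part.

```python
def split_flow(y):
    """Return (regime, magnitude) for one flow value."""
    if y > 0:
        return "positive", y
    if y < 0:
        return "negative", -y
    return "zero", 0.0

# Hypothetical FDI-like flows.
sample = [-250.0, 0.0, 0.0, 40.0, 1200.0]
parts = [split_flow(y) for y in sample]

# Part 1 outcome: which regime each observation falls in.
regimes = [regime for regime, _ in parts]

# Part 2 outcome: non-negative magnitudes, kept separately by regime.
magnitudes = {
    "negative": [m for r, m in parts if r == "negative"],
    "positive": [m for r, m in parts if r == "positive"],
}

print(regimes)
print(magnitudes)
```

Because the magnitudes are non-negative by construction, they are the kind of outcome for which an exponential conditional mean makes sense.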

          Joao

          • #6
            Dear Prof Santos Silva,

            Thank you for your explanation.
