  • Why missing observations when log variable?

    Can someone kindly explain what went wrong? There are 28,885 observations for net worth. I wanted to log it, so I ran gen networth=log(networth). The message says 2,645 missing values were generated. Do you lose observations when you log a variable? I had to log it to avoid negative values; it was skewed as is. How do I correct the missing values that result from the log? Thank you.

  • #2
    The logarithm of zero is undefined and so reported as missing. The logarithm of any negative value is a complex number but returned by Stata as missing.

    If you have zero or negative values, taking logarithms is not a solution. What is a better solution depends on the details of your situation, so tell us more.
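
    To make the point concrete, here is a minimal sketch using a tiny made-up dataset. The four values are invented for illustration and are not the poster's data. It shows that the observations lost to the logarithm are exactly those that are zero or negative.

    ```stata
    clear
    * hypothetical toy values: one negative, one zero, two positive
    input double networth
    -500
    0
    250
    100000
    end

    * ln() returns missing for zero and for negative arguments
    generate double ln_networth = ln(networth)

    * how many values were lost, and why
    count if missing(ln_networth)
    count if networth <= 0
    ```

    Both counts agree here, so the missing values come from the nonpositive observations rather than from any fault in the data.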



    • #3
      More precisely, do
      Code:
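      clonevar worth2 = networth
      * zero and negative values of networth become missing here
      replace worth2 = ln(worth2)
      * show the observations that are missing after taking logs
      dataex networth worth2 if worth2==.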



      • #4
        Just to amplify #2. What follows mixes facts and opinions.

        Decisions can and should be different depending on whether a skewed variable is (1) a response or outcome (there are yet other terms, including dependent variable, which refuses to die) or (2) a predictor or explanatory variable (ditto, independent variable).

        Response or outcome

        An outcome or response variable may invite a model of the form y = exp(Xb), which can be consistent with some zero or negative values for the outcome. The point is that the functional form refers to the mean function, not (all) the data. Poisson regression or some generalized linear model with a logarithmic link may here be preferable, and the question of transformation does not then arise.
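
        As a rough sketch of what that could look like, the example below fits a generalized linear model with a logarithmic link to the auto dataset shipped with Stata. The variables price, mpg and weight are stand-ins chosen only for illustration; they are not from this thread. The outcome itself is never logged.

        ```stata
        sysuse auto, clear

        * mean function E(price | x) = exp(xb); price is left untransformed
        glm price mpg weight, family(poisson) link(log) vce(robust)

        * fitted values are on the original scale of the outcome
        predict double price_hat, mu
        ```

        Robust standard errors are requested here because the Poisson family is being used for its mean function rather than as a literal model for the data.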

        On the other hand, many skewed variables can be positive and negative, or even zero, depending. One simple example is when many firms show profits but some show losses. More generally, many responses are some kind of change or difference and skewness could then mean that most values are positive but some are not (or indeed that most values are negative, but some are not). This situation seems to me to be more common than many texts imply, and there are transformations that may help such as cube root, neglog = sign(y) * ln(1 + abs(y)) or asinh(y). There is a chicken and egg situation here in that many people's reading or experience does not extend to such transformations and so they are (a) reluctant to use them and (b) likely to face puzzlement at best from reviewers or examiners if they do. I note that these transformations could be link functions for generalized linear models. Note that even though for example -2 is the cube root of -8 -- as is secondary school mathematics -- you have to arrange for yourself that negative values are cube rooted correctly.
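
        The three transformations just mentioned can be sketched as follows, on a handful of made-up values. The variable names cuberoot, neglog, ihs and naive are hypothetical. The last line shows why the cube root needs arranging by hand: raising a negative number to the power 1/3 yields missing in Stata.

        ```stata
        clear
        * hypothetical toy values spanning negative, zero and positive
        input double y
        -8
        -1
        0
        1
        8
        end

        * cube root that respects sign: root of the absolute value, sign restored
        generate double cuberoot = sign(y) * abs(y)^(1/3)

        * neglog, as defined in the post
        generate double neglog = sign(y) * ln(1 + abs(y))

        * inverse hyperbolic sine
        generate double ihs = asinh(y)

        * naive cube root: missing whenever y is negative
        generate double naive = y^(1/3)

        list
        ```

        All three transformations are defined for every value, map zero to zero, and preserve sign, which is what makes them usable where the logarithm is not.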

        Predictor or explanatory variable

        Skewness in a predictor as such is neither here nor there. For example an indicator variable that is mostly 1s with few 0s, or vice versa, is skew but it is not even possible to transform it to reduce skewness. I would recommend considering transforming a predictor in some circumstances: if an outlier would otherwise distort a fit and if the relationship looked closer to the model form with a transformation than not. There can be other grounds too. There are set-ups in which all variables are positive and the model consists of power functions multiplied together in which case taking logarithms of everything can be convenient as well as conventional.
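
        For the last case, where every variable is positive and the model is a product of power functions, a minimal sketch might look like this. It again uses Stata's shipped auto dataset purely as a stand-in, and the ln_ variable names are hypothetical.

        ```stata
        sysuse auto, clear

        * price = a * weight^b1 * length^b2 becomes linear after taking logs
        generate double ln_price  = ln(price)
        generate double ln_weight = ln(weight)
        generate double ln_length = ln(length)

        regress ln_price ln_weight ln_length
        ```

        No observations are lost here because all three variables are strictly positive, which is precisely the condition under which logging everything is convenient.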



        • #5
          Thank you for the explanations. I appreciate it. Net worth is a control/predictor variable. As per the image, it is skewed to the left and there are a few outliers. Since I am including income as a variable, and log(income) did not produce any strange output, I decided to omit net worth and just use income.


          • #6
            This variable is right-skewed, not left-skewed.

            See e.g. https://www.itl.nist.gov/div898/hand...%20left%20side.
