  • Logged variables w/ value of 0: drop observations or weaken the model?

    [Note: PhD student new to Stata and still somewhat of a beginner with stats analysis]

    In my dataset, the variable 'Inputs' reflects monetary values, and some observations are 0. I have logged all values of 'Inputs' for running regressions, but of course Stata drops the roughly 25 observations for which 'Inputs' = 0. I would prefer not to lose those observations because my sample is only n = 147.

    On the advice of my supervisor, I replaced 'Inputs' = 0 with 'Inputs' = 1 for those observations so as not to drop them from the sample, then logged the values again. Now, instead of being dropped, those observations remain in the sample with 'Log_Inputs' = 0. However, this lowers the R-squared and therefore weakens the model.
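
    In Stata terms, what I did was roughly this (a sketch, using the variable names above; I have not posted the exact commands):

    * replace the zeros with 1, then log everything, so the former zeros become 0
    replace Inputs = 1 if Inputs == 0
    generate Log_Inputs = log(Inputs)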

    Which is the better choice: Drop the observations that cannot be logged, or weaken the model but maintain the sample size?

  • #2
    Amy:
    welcome to this forum.
    I would avoid adding an (arbitrary) additive constant and keep the sample as it originally was (n = 147).
    The issue is whether you're forced to go log-linear by tribal tradition or for some other reason.
    That said, in your future posts please share what you typed and what Stata gave you back (as per the FAQ). Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Taking logarithms of (variable + constant) is one work-around.

      Let's assume the minimum value other than the zeros is 1 unit. Then use as predictors

      cond(x == 0, log(x + 1), log(x))

      and the indicator that is 1 if x == 0 and 0 otherwise.

      The first is a fudge, but the second allows some quantification of the effect of the predictor being 0, not 1.

      Stata drops nothing here; better to say that it omits missing values from model fits.
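
      A minimal sketch of this in Stata, assuming an outcome variable y (the names here are illustrative):

      * transformed predictor: log(1) = 0 where x is 0, log(x) otherwise
      generate logx = cond(x == 0, log(x + 1), log(x))
      * indicator for the zeros, so their effect can be quantified separately
      generate zerox = (x == 0)
      regress y logx zerox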



      • #4
        The choice of what constant to add can have substantial effects on your parameter estimates. I'd try Nick's suggestion, but see what happens when you use some different values of the constant (0.01, 0.1, 1.0, etc., as relevant in your situation). Perhaps you will find that the choice of constant doesn't matter much, and that the results are similar to those when the observations are omitted due to missing values, which would support that approach. However, in confronting a similar situation myself, I once found that the value of the constant *did* matter. I proceeded to try something in the direction Carlo implied, i.e., trying another transformation instead of log(). I used sqrt(), and obtained a better fit than I did with log(), so I think that's also worth a try in your situation.
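
        One way to run that comparison in Stata (a sketch; y and the candidate constants are placeholders):

        local i = 0
        foreach c of numlist 0.01 0.1 1 {
            local ++i
            * candidate constant `c' added before logging, so the zeros are kept
            generate logc`i' = log(Inputs + `c')
            quietly regress y logc`i'
            display "constant = `c'   R-squared = " %6.4f e(r2)
        }
        * the alternative transformation mentioned above; sqrt(0) is defined
        generate sqrtx = sqrt(Inputs)
        regress y sqrtx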



        • #5
          Note that log(x + smidgen) for 0 < smidgen << 1 is likely to create massive outliers. My suggestion in #3 pivots on 1 being the smallest feasible positive value but may be generalised accordingly.
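
          A quick numerical illustration (Stata's log() is the natural log):

          display log(0 + 0.001)   // about -6.91: the former zeros land far below the rest
          display log(0 + 1)       // 0, as in #3
          display log(5)           // about 1.61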



          • #6
            There’s a recent paper on this topic (dealing with logs and zeros in regression models), which you might find useful. The authors discuss some of the common practices and have also made the Stata implementation of their approach to address the issue publicly available on GitHub.


            https://github.com/ldpape/iOLS


            Bellego, Christophe, and Louis-Daniel Pape. "Dealing with logs and zeros in regression models." Série des Documents de Travail 2019-13 (2019).



            • #7
              Thanks so much for all of your input. I will give it a go with Nick Cox's suggestion and play around with Mike Lacy's advice. And I will absolutely review that paper, Justin Niakamal!

