[Note: PhD student new to Stata and still somewhat of a beginner with stats analysis]
In my dataset, the variable 'Inputs' reflects monetary values for which some observations are 0. I have logged all values of 'Inputs' for running regressions, but of course Stata drops +/- 25 observations for which 'Inputs' =0. I would prefer not to lose those observations because my sample is only n=147.
On the advice of my supervisor, I have replaced 'Inputs'=0 with 'Inputs'=1 for the latter observations so as not to drop them from the sample, then I logged the values again. Now instead of dropping those observations, they remain in the sample with 'Log_Inputs'=0. However, this weakens the R-squared value and therefore the model.
Which is the better choice: Drop the observations that cannot be logged, or weaken the model but maintain the sample size?
In my dataset, the variable 'Inputs' reflects monetary values for which some observations are 0. I have logged all values of 'Inputs' for running regressions, but of course Stata drops +/- 25 observations for which 'Inputs' =0. I would prefer not to lose those observations because my sample is only n=147.
On the advice of my supervisor, I have replaced 'Inputs'=0 with 'Inputs'=1 for the latter observations so as not to drop them from the sample, then I logged the values again. Now instead of dropping those observations, they remain in the sample with 'Log_Inputs'=0. However, this weakens the R-squared value and therefore the model.
Which is the better choice: Drop the observations that cannot be logged, or weaken the model but maintain the sample size?
Comment