Survey data coefficients

Robert Kolesar

Join Date: May 2016

Posts: 45
#1

19 Aug 2018, 22:06

Hello,

I am working with survey data in Stata 15. My dependent variable is log-transformed health expenditure (I set all $0 values (i.e. no payment) to $0.01). My primary independent variable is a dummy variable. I am getting a significant coefficient of -1.08 on that variable, which I interpret as: a value of 1 on the independent variable is associated with a 108% decrease in health care expenditure. I am confused as to how expenditure can decrease by more than 100%. My intercept is -8.0, so I am guessing the result is relative to that intercept. Should I be standardizing my coefficients? Any insight is greatly appreciated. Thanks much for any guidance you can provide. Robert
Clyde Schechter

Join Date: Apr 2014

Posts: 29968
#2

19 Aug 2018, 22:19

Let y denote expenditures. Let x be your 0/1 "dummy" variable. You have a regression log y = b0 - 1.08*x + residual.

So if x = 0, we get log y = b0 + residual; if x = 1 for the same entity, we get log y = b0 - 1.08 + residual. So, log y(x=1) - log y(x=0) = -1.08. The difference of logarithms is the logarithm of the ratio, so log[y(x=1)/y(x=0)] = -1.08. Exponentiating both sides, we get y(x=1)/y(x=0) = exp(-1.08) = 0.34 (to two decimal places). So you have a 66% decrease, not a 108% decrease.
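
As a quick check of that arithmetic, here is a minimal sketch in Python (not Stata; the variable names are just for illustration). It only exponentiates the reported coefficient and converts the ratio to a percent change.

```python
import math

b = -1.08  # coefficient on the dummy from the log-expenditure regression

ratio = math.exp(b)             # y(x=1) / y(x=0)
pct_change = 100 * (ratio - 1)  # exact percent change implied by b

print(f"ratio = {ratio:.2f}")
print(f"percent change = {pct_change:.0f}%")
```

This prints a ratio of 0.34 and a percent change of -66%. Because exp(b) is always positive, the implied decrease can never reach 100%, however negative the coefficient is.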

You went wrong in applying a rule of thumb about coefficients giving percent changes, which is based on an approximation that applies only when the coefficient is sufficiently close to zero, and really only works well when the absolute value of the coefficient is less than 0.1. By the time we get out to a coefficient with magnitude > 1, the approximation is quite bad and the rule of thumb is very misleading.
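
To see how quickly the rule of thumb degrades, the following sketch compares it with the exact conversion for a few coefficient values. The values other than -1.08 are arbitrary illustrations, not taken from the data in this thread, and the function names are hypothetical.

```python
import math

def exact_pct(b):
    """Exact percent change in y implied by coefficient b on a log outcome."""
    return 100 * (math.exp(b) - 1)

def rule_of_thumb_pct(b):
    """Approximation that reads the coefficient directly as a percent change."""
    return 100 * b

# Illustrative coefficients; only -1.08 comes from the question above.
for b in (0.05, -0.1, -0.5, -1.08):
    gap = rule_of_thumb_pct(b) - exact_pct(b)
    print(f"b = {b:6.2f}  rule of thumb = {rule_of_thumb_pct(b):8.1f}%  "
          f"exact = {exact_pct(b):8.1f}%  gap = {gap:6.1f} points")
```

The gap is a fraction of a percentage point near zero and grows to more than 40 points at b = -1.08.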

Added: log-transforming a variable that contains 0 values is hazardous. It is especially hazardous if there are a substantial number of observations where the value is 0, which is often the case in health expenditure data. You need to verify that your results are not very sensitive to the choice of $0.01 as your replacement for 0. You might find that had you chosen some other small number to replace 0 you would reach different conclusions. And if that is the case, your findings simply do not stand. So do a good robustness check on that.
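
A toy version of such a robustness check might look like the sketch below. It rests on several assumptions: the expenditure values are made up, there are no other covariates and no survey weights, and the regression has only an intercept and the dummy. In that special case the OLS coefficient equals the difference in group means of log y, so no regression routine is needed. The function name `dummy_coefficient` is hypothetical.

```python
import math

def dummy_coefficient(y0, y1, eps):
    """OLS coefficient on a single dummy for log y, with zeros replaced by eps.

    With only an intercept and one dummy, the coefficient equals the
    difference in group means of log y.
    """
    def mean_log(values):
        return sum(math.log(v if v > 0 else eps) for v in values) / len(values)
    return mean_log(y1) - mean_log(y0)

# Made-up expenditures; the x = 1 group has more zeros than the x = 0 group.
y_x0 = [0, 50, 100, 200]
y_x1 = [0, 0, 40, 120]

for eps in (0.01, 0.1, 1.0):
    b = dummy_coefficient(y_x0, y_x1, eps)
    print(f"replacement = {eps:5.2f}  coefficient = {b:6.2f}  "
          f"implied ratio = {math.exp(b):.2f}")
```

In this toy data the coefficient moves substantially as the replacement value changes, because the two groups have different shares of zeros. The real check would rerun the actual survey regression with each replacement value.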

A safer approach than log-transforming the outcome variable with an arbitrary small positive number replacing 0 is to leave the variable alone and use a GLM with a log link. (See -help glm- for details of implementation.)
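
To illustrate why the log link sidesteps the zero problem, here is a minimal sketch under strong simplifying assumptions: made-up data, no survey weights, and a model with only an intercept and the dummy. In that special case the fitted means of a log-link GLM equal the group means of y, so the coefficient is the log of the ratio of raw means and zeros need no replacement. This is not a substitute for fitting the model in Stata; the function name is hypothetical.

```python
import math

def log_link_dummy_coefficient(y0, y1):
    """Coefficient on a single dummy in a log-link GLM (intercept + dummy only).

    The fitted group means equal the observed group means, so the
    coefficient is log(mean(y1) / mean(y0)). Zeros enter the means as-is.
    """
    mean0 = sum(y0) / len(y0)
    mean1 = sum(y1) / len(y1)
    return math.log(mean1 / mean0)

# Made-up expenditures, zeros left untouched.
y_x0 = [0, 50, 100, 200]
y_x1 = [0, 0, 40, 120]

b = log_link_dummy_coefficient(y_x0, y_x1)
print(f"coefficient = {b:.2f}, ratio of means = {math.exp(b):.2f}")
```

Here exp(b) is a ratio of mean expenditures on the original dollar scale, and it does not depend on any arbitrary small constant.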

Last edited by Clyde Schechter; 19 Aug 2018, 22:27.
Robert Kolesar

Join Date: May 2016

Posts: 45
#3

19 Aug 2018, 23:15

Thanks so much!