Dealing with wage = 0

Solomon Lin

Join Date: May 2016

Posts: 36
#1


04 May 2016, 14:33

Hi,

I am trying to find out the effect of drug use on wages. I will be using Log Wages as my dependent variable. However, some individuals in the sample reported 0 wages. What is the typical way of dealing with this?

Should I:
1) Drop all the individuals who reported 0 wages?
2) Set Log Wage = 0 when wages are 0?

Also, what is the typical way of dealing with wages that are very large? Would you drop these outliers or keep them in?
Dick Campbell

Join Date: Apr 2014

Posts: 279
#2

04 May 2016, 14:37

See the following blog entry by Bill Gould: http://blog.stata.com/2011/08/22/use...tell-a-friend/

Richard T. Campbell
Emeritus Professor of Biostatistics and Sociology
University of Illinois at Chicago
Solomon Lin

Join Date: May 2016

Posts: 36
#3

04 May 2016, 17:23

Originally posted by Dick Campbell

See the following blog entry by Bill Gould: http://blog.stata.com/2011/08/22/use...tell-a-friend/

Hi Dick,

I appreciate the reply and I don't mean to ignore your advice but I think that method is a bit too advanced for my purposes. From your experience, is it more appropriate to drop all wages that are 0 or simply set log wage = 0 when wage = 0?
Nick Cox

Join Date: Mar 2014

Posts: 35725
#4

04 May 2016, 17:35

Dick can speak for himself, but I don't find either of your solutions attractive. Something people sometimes do, though, is:

1. Find the smallest positive wage.

2. In a clone of wage, replace zeros by that value.

3. Now set up an indicator variable: 1 if wage is zero and 0 otherwise. The coefficient on that indicator measures the offset associated with being zero rather than the smallest positive value.

I'd still go for Poisson regression.
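
As a rough illustration of the three steps above (not code from this thread; Python is used only to keep the sketch self-contained, and the function name and sample wages are made up), assuming wages are non-negative with no missing values:

```python
import math

def clone_with_floor(wages):
    """Return (log_wage, is_zero) built from a list of non-negative wages."""
    positive = [w for w in wages if w > 0]
    if not positive:
        raise ValueError("need at least one positive wage")
    floor = min(positive)                              # step 1: smallest positive wage
    clone = [floor if w == 0 else w for w in wages]    # step 2: zeros replaced in a clone
    is_zero = [1 if w == 0 else 0 for w in wages]      # step 3: indicator for original zeros
    log_wage = [math.log(w) for w in clone]            # log is now defined for every row
    return log_wage, is_zero

# Hypothetical sample data, for illustration only
wages = [0.0, 12.5, 8.0, 0.0, 40.0]
log_wage, is_zero = clone_with_floor(wages)
```

The regression would then use log_wage as the response with is_zero among the predictors, so the coefficient on is_zero picks up the offset for the zero-wage cases.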
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#5

05 May 2016, 02:17

Solomon: you haven't told us what a "zero" for wages represents -- is it because of item non-response or because the respondent is not doing paid work in the period the survey asks about? And the prevalence of zeros is likely to make a difference. You have also asked a closely-related question in another thread (http://www.statalist.org/forums/foru...step-estimator) and it is unfortunate to mix things up. (The other thread mentions that you have panel data, and you ask about the Heckman estimator -- see heckman).

If you want to get good estimates of the relationship between drug use and wages in the population, and the zeros represent no paid work, then the Heckman selection model is the standard way to proceed. You may need the generalisation of that to a panel data context -- this is something that J. Wooldridge has written about -- check his graduate textbook, and also Google appropriately.
You have already been told about the Poisson (PPML) approach, which is another way of handling the zeros (but has a different conceptualisation of the generation of the zeros).
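
To make the Poisson (PPML) idea concrete, here is a minimal sketch, assuming a single regressor and the model E[wage | x] = exp(b0 + b1*x). It is a hand-rolled Newton iteration written for illustration, not the estimator anyone in this thread ran; the function name and the data are hypothetical. The point is that wages stay in levels, so zeros need no special treatment.

```python
import math

def ppml_one_regressor(x, y, iters=100, tol=1e-12):
    """Poisson pseudo-ML point estimates for E[y|x] = exp(b0 + b1*x)."""
    # Start from the intercept-only fit; requires sum(y) > 0
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score (gradient of the Poisson log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), 2x2
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system in closed form
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1

# Hypothetical data: x = 1 for drug users, wages in levels with zeros kept
drug_use = [0, 0, 0, 1, 1, 1]
wage = [0.0, 10.0, 20.0, 0.0, 5.0, 10.0]
b0, b1 = ppml_one_regressor(drug_use, wage)
```

With a single binary regressor the fitted means reproduce the two group means, so exp(b1) is simply the ratio of mean wages between the groups. The sketch gives point estimates only; in Stata the analogue would be something like poisson wage druguse, vce(robust), with variable names assumed here.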

My reaction to your statement:

I think that method is a bit too advanced for my purposes

is that you should address your data in a serious manner using well-developed methods that are out there. "Non-advanced" methods are probably "wrong" methods in this context.
daniel klein

Join Date: Mar 2014

Posts: 3860
#6

05 May 2016, 02:30

My reaction to your statement [...] is that you should address your data in a serious manner using well-developed methods that are out there. "Non-advanced" methods are probably "wrong" methods in this context.

I agree. Also, it is hard to see how a Heckman selection model in a panel context qualifies as less "advanced" than a Poisson model. But this probably depends on experience with those estimators.

Nick's suggestion, which he is obviously not a fan of himself, strikes me as questionable as well. It reminds me of the dummy-variable adjustment method where you plug in the mean of a variable for any missing value, then create an indicator of whether the original variable was missing. This method is known to produce biased estimates. At best, I guess, you get the standard errors wrong because substituting zeros with the smallest value (or any constant) will artificially decrease the variance of wages.
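
The variance point can be checked on a toy example. This is only a sketch with made-up wages, using Python's standard library, and it looks at wages in levels:

```python
import statistics

# Hypothetical wages, two of them zero
wages = [0.0, 0.0, 8.0, 12.5, 40.0]

# Replace zeros by the smallest positive wage
floor = min(w for w in wages if w > 0)
cloned = [floor if w == 0 else w for w in wages]

var_original = statistics.pvariance(wages)
var_cloned = statistics.pvariance(cloned)

# Pulling the zeros up towards the rest of the data shrinks the spread
print(var_original, var_cloned)
```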

Best
Daniel

Last edited by daniel klein; 05 May 2016, 02:33.
Dick Campbell

Join Date: Apr 2014

Posts: 279
#7

05 May 2016, 10:07

If you read Gould's piece carefully you will see that he does not advocate using Poisson in the case where one is modeling wages, because people with zero earnings are not in the labor force and thus you should model LF participation as part of the analysis. In your original posting you indicated that labor force participation is the reason for the zeros and I should have picked up on that. I don't know your field, but in sociology, political science and related fields any of the "obvious" solutions for zero wages, such as dropping cases, or logging the non-zero values while leaving zeros at zero, which you mention, or adding a small constant (see Gould's blog post for what's wrong with that), will likely lead to negative reactions from reviewers. As you apparently indicated in a posting elsewhere, which I have not read, you are aware of Heckman-type models, which are probably the best solution to this type of problem. Applying that method to panel data is a bit beyond my ken. In response to a previous question of this kind (see http://www.stata.com/statalist/archi.../msg00457.html) there were a few suggestions about how to proceed. I grant that this is a rather difficult problem. Unfortunately, there is no simple solution of which I am aware.

Richard T. Campbell
Emeritus Professor of Biostatistics and Sociology
University of Illinois at Chicago
