  • Are wage data with 7% of observations earning '0' censored?

    Hi all

    I am trying to understand the nature of my wage data and whether Tobit regression (or some other method suited to censored data) is more appropriate than OLS regression. The variable 'wage' shows the income of the individuals in my data set. 7% of the individuals have a wage of '0' (because they are unemployed). In this case, is the data censored? (The mean is approx. 420,000, the median approx. 388,000, skewness = -0.53, kurtosis = 5.27.)

    This link provides an example of censored data as: "There are a number of customers in a mall (buyers and non-buyers). In censored data, non-buyers' value will be counted as zero while buyers' consumption will be observed. In truncated data only buyers' data will be in the sample."

    In my understanding, data are censored when the information we have on a variable is inexact above/below some threshold. So an example of a censored wage variable could be a variable containing the exact wage of all individuals except those who earn less than 25,000 a year, whose values would just be recorded as '<25,000'. But in my example and in the example provided in the link, the information is not inexact: the individuals in my data are indeed earning '0' because they are unemployed, and the customers in the mall are indeed buying '0'. This makes me conclude that my data and the data in the example are not censored, but maybe I am getting something wrong here.

    Could anyone explain whether my data and the data in the example are censored?
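
To make the mall example concrete, here is a minimal sketch of the difference between censoring and truncation. Everything in it is an assumption for illustration: the helper name `censor_and_truncate`, the simulated "latent" values, and the normal distribution with mean 1.5 and standard deviation 1 (chosen only so that a small share of units falls below zero). It is not the poster's data.

```python
import random

def censor_and_truncate(latent, threshold=0.0):
    """Return (censored, truncated) versions of a list of latent values.

    Hypothetical helper for illustration only.
    """
    # Censored sample: every unit stays in the data, but values below the
    # threshold are recorded as the threshold itself (e.g. a purchase of 0).
    censored = [max(y, threshold) for y in latent]
    # Truncated sample: units at or below the threshold never enter the data.
    truncated = [y for y in latent if y > threshold]
    return censored, truncated

random.seed(1)
# Assumed latent outcome: most units are positive, a few fall below zero.
latent = [random.gauss(1.5, 1.0) for _ in range(1000)]
censored, truncated = censor_and_truncate(latent)

n_zero = sum(1 for y in censored if y == 0.0)
print("units in censored sample: ", len(censored))
print("units in truncated sample:", len(truncated))
print("share recorded as zero:   ", n_zero / len(censored))
```

In the censored sample the zeros are still rows in the data set, which is the situation described in the question; in the truncated sample those rows would be absent altogether.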

  • #2
    It would be perfectly reasonable to use a Tobit regression. Consider that some respondents are on the verge of buying, and some would only buy if circumstances changed a great deal. But all of those non-buyers are lumped together at zero. A Tobit regression doesn't minimize the difference from zero for those respondents but instead maximizes the likelihood based on the limited knowledge provided by the upper bound. It would allow the predicted value of the dependent variable to spread into negative values for some respondents.
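
As a rough illustration of that point, the sketch below writes out the log-likelihood of a standard Tobit model that is left-censored at zero with normal errors. The function names (`tobit_loglik`, `normal_cdf`, `normal_logpdf`) and the toy numbers are assumptions made for this example; in practice one would use a packaged estimator rather than this hand-rolled function.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def tobit_loglik(y, xb, sigma, lower=0.0):
    """Log-likelihood of a Tobit model left-censored at `lower`.

    y     : observed outcomes (censored ones are recorded as `lower`)
    xb    : linear predictions on the latent scale, one per observation
    sigma : standard deviation of the latent error term
    """
    total = 0.0
    for yi, mu in zip(y, xb):
        if yi <= lower:
            # Censored observation: all we know is that the latent value
            # is at or below the bound, so it contributes log P(y* <= lower).
            total += math.log(normal_cdf((lower - mu) / sigma))
        else:
            # Uncensored observation: ordinary normal density contribution.
            total += normal_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total

# For a single zero, a prediction far below zero fits better than one just
# below zero, whereas a squared-error criterion would prefer the latter.
near = tobit_loglik([0.0], [-1.0], 1.0)
far = tobit_loglik([0.0], [-3.0], 1.0)
print(near, far)
```

The censored branch is what lets the fitted latent values spread into negative territory: a zero is treated as "somewhere at or below zero" rather than as an exact value to be matched.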

    • #3
      So could you make the same case for the wage data, saying that some individuals are on the verge of getting a job whereas others are far away? The effect I am trying to estimate is that of a teaching reform on the job market outcomes of students. Then could you say that the data are censored because, among the individuals with '0' wage, there is a great deal of variation in their skills and in the job market's valuation of those skills, even though they all have the same wage income (of 0)? So even though the data on their wage are exact, the data on what is theoretically interesting (the valuation of their skills) are inexact?

      • #4
        With a Tobit model you would argue that the unemployed people got a job offer with a negative wage and refused it. That is what caused the observed zeros. The Tobit model reports the effects on the latent dependent variable (which includes the unobserved negative wages). This is obviously not how the labor market works, but models are supposed to be a simplification of reality, not completely right. However, a limitation of this model that you should consider is that unemployment and wage are governed by one and the same process. If your 0s are only unemployed persons, i.e. those who want to work but haven't found a job, then that might work in some labor markets. I would need more convincing if you want to apply the Tobit model in more regulated/distorted (depending on your political leaning) labor markets. If the 0s also include people outside the labor market, e.g. stay-at-home moms, then getting a job and the wage are related processes, but separate enough that your model needs to represent that. Typically, a Heckman model is used for that. However, then you need an instrumental variable, and that is a whole other can of worms you don't want to open unless you really, really, really have to.
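
A small simulation may help show what "effects on the latent dependent variable" means under the one-process story. The sketch below is built entirely on assumptions: a single regressor `x`, a latent wage equal to 0.5 + 1.0 * x plus standard normal noise, and zeros created by recording negative latent values as 0. The share of zeros is deliberately much larger than 7% so the pattern is easy to see. It compares a plain OLS slope on the latent outcome with the OLS slope on the observed outcome; it does not fit a Tobit or Heckman model.

```python
import random

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

random.seed(42)
n = 5000
x = [random.gauss(0.0, 1.0) for _ in range(n)]

# One process generates both the zeros and the positive values:
# a latent "wage offer" that can be negative (assumed, not real data).
latent = [0.5 + 1.0 * xi + random.gauss(0.0, 1.0) for xi in x]
# Observed wage: negative offers are refused and show up as 0.
observed = [max(v, 0.0) for v in latent]

slope_latent = ols_slope(x, latent)      # effect on the latent scale
slope_observed = ols_slope(x, observed)  # what OLS on the observed wage gives
share_zero = sum(1 for v in observed if v == 0.0) / n

print("share of zeros:        ", share_zero)
print("slope, latent outcome: ", slope_latent)
print("slope, observed outcome:", slope_observed)
```

In this setup the slope on the observed outcome is pulled toward zero relative to the latent slope, which is the quantity a Tobit model targets. If the zeros instead came from a separate participation process, this data-generating story would no longer apply, which is the concern raised about people outside the labor market.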

        A long time ago I wrote a small paper together with Pamala Wiepking summarizing various options for this type of data, but applied to charitable donations rather than wages. It may give you a brief overview and help you make an informed decision. http://www.maartenbuis.nl/wp/sel_don.pdf
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------

        • #5
          OK, thanks for your clarification.
