
  • How to calculate the Inverse Mill Ratio (IMR) the way Stata does?

    Hello,
    How does Stata calculate the Inverse Mill Ratio (IMR)?

    I get different values from the heckman option mills(lambda) than from calculating the ratio directly.
    Please check the following:
    webuse womenwk, clear
    reg wage education age married
    heckman wage education age married, select(married children age education) twostep mills(lambda)
    This stores the IMR values in lambda.

    Now let's calculate it directly:

    probit married children age education

    predict xx, xb

    gen pdf=normalden(xx)
    gen cdf=normal(xx)
    gen millrat=pdf/cdf
    replace millrat=pdf/(cdf-1) if married==0
    Here, we get the IMR in millrat.

    Now if we look at the estimated millrat and lambda side by side using the following command, they look different.
    edit millrat lambda

    Does anyone know what is wrong?

    Best regards,
    Hossein









  • #2
    Don't know except for this comment on the name

    https://www.stata.com/statalist/arch.../msg00433.html

    https://www.statalist.org/forums/for...atio-a-comment #2



    • #3
      First, using [QUOTE] tags rather than [CODE] tags (the # button) makes this hard to read.

      Also, your selection equation is wrong. The selection equation does not regress married on children, age, and education; its dependent variable is whether wage is observed or not.

      Try this:
      Code:
      . webuse womenwk,clear
      
      . qui heckman wage education age married, select(married children age education) twostep mills(lambda)
      
      . gen byte selectiondep =  wage < .
      
      . qui probit selectiondep married children age education
      
      . predict xb, xb
      
      . gen mylambda = normalden(xb) / normal(xb)
      
      . l mylambda lambda in 1/5
      
           +----------------------+
           | mylambda      lambda |
           |----------------------|
        1. | 1.282124   1.2821236 |
        2. | .9313837   .93138372 |
        3. | 1.126968   1.1269684 |
        4. | .9079438   .90794379 |
        5. | .5900134   .59001337 |
           +----------------------+
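
      The listing above compares only the first five rows. The same comparison can be run over the whole sample. This is a minimal, self-contained sketch: the names observed, zg, and gap are hypothetical, and reldif() is used only as a convenient relative-difference measure.

      ```stata
      * Self-contained check: manual IMR vs. the lambda saved by heckman
      webuse womenwk, clear
      quietly heckman wage education age married, ///
          select(married children age education) twostep mills(lambda)
      generate byte observed = wage < .        // selection indicator (hypothetical name)
      quietly probit observed married children age education
      predict double zg, xb                    // first-stage linear index (hypothetical name)
      generate double mylambda = normalden(zg) / normal(zg)
      generate double gap = reldif(mylambda, lambda)
      summarize gap                            // a maximum near zero means the two agree everywhere
      ```

      If the maximum of gap is not close to zero, the first-stage probit being compared is not the one heckman used.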



      • #4
        Thank you Scott. I really appreciate the clarification.



        • #5
          I have a question. When I deal with this problem, I compute

          gen mylambda = normalden(xb) / normal(xb) if z==1
          replace mylambda = -normalden(xb) / normal(-xb) if z==0

          But when I use the heckman command and save lambda, it seems that lambda = normalden(xb) / normal(xb) for everyone.
          This also affects how wages are predicted for the unemployed.

          I look forward to your help.
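
          For reference, the two branches in this post are the means of a truncated standard normal. A sketch under the usual assumptions, with notation that is not from the thread: v is the standard normal selection error, zγ is the probit index, and selection occurs when zγ + v > 0.

          ```latex
          \[
          E(v \mid v > -z\gamma) = \frac{\phi(z\gamma)}{\Phi(z\gamma)},
          \qquad
          E(v \mid v \le -z\gamma) = -\frac{\phi(z\gamma)}{1 - \Phi(z\gamma)}
                                   = -\frac{\phi(z\gamma)}{\Phi(-z\gamma)} .
          \]
          ```

          The first expression is the one that appears in lambda for every observation, as noted in this post; the second is the branch used here for z==0.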



          • #6
            Hi Yang,

            I have the same question regarding the IMR. May I ask whether you managed to solve it?

            I look forward to your kind reply.



            • #7
              The heckman command is intended for when the model y = xb + u is estimated using a selected subset of the sample, where s = 1 (or z = 1) indicates being in the selected sample. For s = 0, y is not observed, and so these are not used at all in the second stage estimation. It is a bit puzzling and confusing that Stata defines lambda to be the same for the s = 0 units. I think it's inconsequential, but it's still confusing. The typical example is when s = 1 indicates being in the labor force and then hourly wage is observed but hourly wage is not observed when s = 0.

              Where it will really become confusing is if you do observe y when s = 0 and you're using heckman to estimate an endogenous switching (treatment effects) model. For example, s = 1 if a person takes a financial fitness class and zero otherwise. y is the amount of annual retirement savings. Then y is observed for s = 1 and s = 0. However, the potential outcomes are not always observed: y = y(0) if s = 0 and y = y(1) if s = 1.

              Which situation are you in?



              • #8
                Hi Jeff,

                Thank you. From what you describe above, I think my data fall into the second situation. For example, s=1 if a firm is in financial distress, and zero otherwise. y is the firm's performance, which is observable for both s=1 and s=0. I'm using the Heckman test to rule out self-selection bias, e.g. some firms being more likely to be in financial distress than others.



                • #9
                  Hi Jeff, thank you for your comments.
                  I am asking because I want to predict the wages for non-workers after a Heckman estimation.

                  First, I try the following code:
                  use https://www.stata-press.com/data/r16/womenwk,clear
                  heckman wage educ age, select(married children educ age) twostep mills(lambda)
                  The heckman command does provide a way to predict wages for workers:
                  predict hcndwage if wage~=., ycond
                  Then I do it step by step
                  local coee=_b[education]
                  local coea=_b[age]
                  local coec=_b[_cons]
                  local coel=e(lambda)
                  gen predictedwage_observed=`coee'*education+`coea'*age +`coec'*1+`coel'*lambda if wage~=.
                  sum hcndwage predictedwage_observed
                  The results are the same:

                  Variable     |   Obs       Mean   Std. Dev.        Min        Max
                  -------------+---------------------------------------------------
                  hcndwage     | 1,343   23.69217   3.332615   16.22861   33.78897
                  observedwage | 1,343   23.69217   3.332615   16.22861   33.78897

                  However, the heckman command does not provide a correct way to predict wages for nonworkers.
                  I tend to believe that predicted wages should differ between workers and nonworkers who have the same observable characteristics.
                  To achieve this, the value of lambda should differ between workers and nonworkers.
                  So I try this:
                  gen z=wage~=.
                  probit z married children educ age
                  predict xb, xb
                  gen mylambda = normalden(xb) / normal(xb) if z==1
                  replace mylambda = -normalden(xb) / normal(-xb) if z==0
                  gen predictedwage_all=`coee'*education+`coea'*age+`coec'*1+`coel'*mylambda
                  gen averagewage=`coee'*education+`coea'*age+`coec'*1
                  sum averagewage predictedwage_all
                  The results look reasonable:

                  Variable       |   Obs       Mean   Std. Dev.        Min        Max
                  ---------------+---------------------------------------------------
                  averagewage    | 2,000   21.26078   3.812659   14.79669   32.88486
                  predictedw_all | 2,000   21.26078   4.733351   8.990657   33.78897

                  However, I am not sure whether this is the right way to get the predicted wages for non-workers and I have not found any details in the literature.
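
                  One property of the table above can be checked directly: the two means coincide. The two-branch mylambda is the probit generalized residual, and the probit first-order condition for the constant sets its sample sum to zero, so adding coel*mylambda leaves the overall mean unchanged. A minimal sketch of that check follows, reusing the variable names from this post. This is a property of the probit fit and does not by itself show that the nonworker prediction is correct.

                  ```stata
                  * The two-branch IMR averages to zero over the full sample
                  webuse womenwk, clear
                  generate byte z = wage < .
                  quietly probit z married children education age
                  predict double xb, xb
                  generate double mylambda = normalden(xb) / normal(xb) if z == 1
                  replace mylambda = -normalden(xb) / normal(-xb) if z == 0
                  summarize mylambda      // mean should be zero up to convergence error
                  ```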

                  Thank you again. I look forward to your reply.
                  Zhengxiong Yang





                  • #10
                    My view on wage prediction differs from #9. Let's start with the ultimate purpose of a Heckman model. We are interested in estimating a wage equation for everyone, as below. If everyone worked, we would be able to identify b, the coefficients of ultimate interest.

                    Code:
                    E(y|x) = xb
                    As we only observe wages of workers, regressing observed wages on x would bias b if the selection into labor markets is endogenous. Therefore, Heckman found a way to correctly uncover b with such a selected sample, as below.

                    Code:
                    E(y|x, z, y is observed) = xb + cλ(zr)
                    E(y|x, z, y is observed) is what we can actually estimate, b is what we want, and λ(zr), the inverse Mills ratio (IMR), is what we take advantage of to uncover b. I'd like to underscore here that the role of the IMR is to help us obtain the correct b. Once b is estimated, we can predict wages for everyone, workers or non-workers, based on the original model of interest, and the IMR has nothing to do with this step. We complete the task simply by using -predict, xb-, as below.
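
                    The two equations can be written out with the IMR made explicit. This is a sketch under the standard Heckman assumptions, which the thread does not spell out: the wage error u and the selection error v are jointly normal with Var(v) = 1, Var(u) = σ², and correlation ρ. The symbols σ and ρ are added for illustration; b, c, and zr follow the notation above.

                    ```latex
                    \[
                    E(y \mid x) = xb,
                    \qquad
                    E(y \mid x, z, y \text{ observed}) = xb + c\,\lambda(zr),
                    \qquad
                    \lambda(a) = \frac{\phi(a)}{\Phi(a)},
                    \quad
                    c = \rho\sigma .
                    \]
                    ```

                    The term cλ(zr) is the conditional mean of u in the selected sample. It is absent from the first equation, which is why the prediction from the model of interest uses xb alone.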

                    Code:
                    webuse womenwk, clear
                    heckman wage educ age, select(married children educ age) twostep
                    predict wagehat, xb
                    If non-workers have identical observed characteristics to workers, they should have identical predicted wages had they worked. Why do we feel that non-workers are supposed to have different wages from workers? Because non-workers, by construction, have different observed characteristics! The characteristics of non-workers tend to make their potential wages low, and that's why they choose not to work. Using -predict, xb-, we are able to find that the predicted wages of non-workers are lower than workers, where the IMR should not play any role.

                    Code:
                         worker |      mean
                    ------------+----------
                    Non-workers |  19.82792
                        Workers |  21.96174
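
                    The post does not show how the table above was produced. One way it might be reproduced is sketched below; the worker indicator and its value label are hypothetical names.

                    ```stata
                    * Mean of the xb prediction by observed work status
                    webuse womenwk, clear
                    quietly heckman wage educ age, select(married children educ age) twostep
                    predict wagehat, xb
                    generate byte worker = wage < .            // 1 if wage is observed (hypothetical name)
                    label define workerlbl 0 "Non-workers" 1 "Workers"
                    label values worker workerlbl
                    tabstat wagehat, by(worker) statistics(mean) nototal
                    ```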
                    Stata offers options to predict an "expected" wage, where the IMR does play a role in accounting for the probability of labor market participation. But the idea of setting wages to zero for non-workers fundamentally contradicts the nature of a Heckman model. What are the wage rates of non-workers? Missing, not zero! If wages were set to 0 for non-workers, the sample would at most be a censored one with corner solutions, and a Heckman model (for a truncated sample) should not have been used in the first place. That's what Jeff implies in #7.
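
                    To see how these predictions differ, the relevant predict options after heckman can be put side by side. A minimal sketch; the new variable names are hypothetical, and the comments paraphrase my reading of the heckman postestimation documentation.

                    ```stata
                    * Compare the prediction types available after heckman
                    webuse womenwk, clear
                    quietly heckman wage educ age, select(married children educ age) twostep
                    predict double w_xb, xb            // xb: the wage equation of interest
                    predict double w_cond, ycond       // E(wage | wage observed), includes the IMR term
                    predict double w_exp, yexpected    // Pr(observed) * E(wage | observed), unobserved wage taken as 0
                    predict double p_obs, psel         // Pr(wage observed) from the selection equation
                    summarize w_xb w_cond w_exp p_obs
                    ```

                    Only w_xb corresponds to E(y|x) = xb. The yexpected prediction is the one that treats unobserved wages as zero.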

