How to calculate the Inverse Mill Ratio (IMR) as Stata does?

Hossein Kavand

Join Date: Jan 2015

Posts: 26
#1


13 Jul 2018, 15:44

Hello,
How does Stata calculate the Inverse Mill Ratio (IMR)?

Here I got different values from the Stata post-estimation option mills(lambda) and the direct calculation of the ratio.
Please check the following:

webuse womenwk,clear
reg wage education age married
heckman wage education age married, select(married children age education) twostep mills(lambda)

Here we can get the values of the IMR in lambda.

Now let's calculate it directly:

probit married children age education

predict xx, xb

gen pdf=normalden(xx)
gen cdf=normal(xx)
gen millrat=pdf/cdf
replace millrat=pdf/(cdf-1) if married==0

Here, we get the IMR in millrat.

Now if we look at the estimated millrat and lambda together using the following command, they look different.

edit millrat lambda

Does anyone know what is wrong?

Best regards,
Hossein
Nick Cox

Join Date: Mar 2014

Posts: 35709
#2

13 Jul 2018, 19:18

Don't know except for this comment on the name

https://www.stata.com/statalist/arch.../msg00433.html

https://www.statalist.org/forums/for...atio-a-comment #2

Scott Merryman

Join Date: Mar 2014
Posts: 895

13 Jul 2018, 19:35

First, using [QUOTE] tags rather than [CODE] tags (the # button) makes this hard to read.

Also, your selection equation is wrong. The selection equation is not a regression of married on children, age, and education; its dependent variable is whether wages are observed or not.

Try this:

Code:

. webuse womenwk,clear

. qui heckman wage education age married, select(married children age education) twostep mills(lambda)

. gen byte selectiondep =  wage < .

. qui probit selectiondep married children age education

. predict xb, xb

. gen mylambda = normalden(xb) / normal(xb)

. l mylambda lambda in 1/5

     +----------------------+
     | mylambda      lambda |
     |----------------------|
  1. | 1.282124   1.2821236 |
  2. | .9313837   .93138372 |
  3. | 1.126968   1.1269684 |
  4. | .9079438   .90794379 |
  5. | .5900134   .59001337 |
     +----------------------+
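
In formula form, the variable generated above is the inverse Mills ratio evaluated at the probit index, where the probit's dependent variable is the indicator that wage is observed. The symbols w_i (selection covariates, including the constant), \gamma (probit coefficients), and s_i (selection indicator) are notation assumed here for illustration; they do not appear in the post.

```latex
\lambda_i = \frac{\phi(w_i\hat\gamma)}{\Phi(w_i\hat\gamma)},
\qquad
s_i = \mathbf{1}[\text{wage}_i \text{ is observed}],
\qquad
\Pr(s_i = 1 \mid w_i) = \Phi(w_i\gamma).
```

Here normalden() and normal() are Stata's standard normal density \phi and distribution function \Phi, and w_i\hat\gamma is what predict ..., xb returns after probit. Using married as the probit's dependent variable, as in #1, gives a different index and therefore a different ratio.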


Hossein Kavand

Join Date: Jan 2015

Posts: 26
#4

13 Jul 2018, 21:21

Thank you Scott. I really appreciate the clarification.
yang zx

Join Date: Apr 2018

Posts: 5
#5

23 Jul 2018, 23:24

I have a question. When I deal with this problem, I do the following:

gen mylambda = normalden(xb) / normal(xb) if z==1
replace mylambda = -normalden(xb) / normal(-xb) if z==0

But when I use the heckman command and save lambda, it seems that lambda = normalden(xb) / normal(xb) for everyone.
This is related to the problem of predicting wages for the unemployed.

I look forward to your help.
Mengqian Chen

Join Date: Feb 2020

Posts: 20
#6

11 Jan 2021, 08:02

Hi Yang,

I have the same question regarding the IMR. May I ask whether you managed to solve it?

I look forward to your kind reply.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2169
#7

11 Jan 2021, 10:46

The heckman command is intended for when the model y = xb + u is estimated using a selected subset of the sample, where s = 1 (or z = 1) indicates being in the selected sample. For s = 0, y is not observed, and so these are not used at all in the second stage estimation. It is a bit puzzling and confusing that Stata defines lambda to be the same for the s = 0 units. I think it's inconsequential, but it's still confusing. The typical example is when s = 1 indicates being in the labor force and then hourly wage is observed but hourly wage is not observed when s = 0.

Where it will really become confusing is if you do observe y when s = 0 and you're using heckman to estimate an endogenous switching (treatment effects) model. For example, s = 1 if a person takes a financial fitness class and zero otherwise. y is the amount of annual retirement savings. Then y is observed for s = 1 and s = 0. However, the potential outcomes are not always observed: y = y(0) if s = 0 and y = y(1) if s = 1.

Which situation are you in?
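
For reference, a sketch of the two conditional means behind this distinction. It assumes the textbook setup in which selection follows s = 1[w\gamma + v > 0] and (u, v) are bivariate normal with Var(v) = 1, Var(u) = \sigma^2, and Corr(u, v) = \rho. The symbols w, \gamma, v, \rho, and \sigma are notation assumed here, not taken from the post.

```latex
E(y \mid x, w, s = 1) = xb + \rho\sigma\,\frac{\phi(w\gamma)}{\Phi(w\gamma)},
\qquad
E(y \mid x, w, s = 0) = xb - \rho\sigma\,\frac{\phi(w\gamma)}{1 - \Phi(w\gamma)}.
```

In the pure selection case only the first expression is ever fitted, because y is missing when s = 0. In the switching case each regime has its own outcome equation and parameters, and the s = 0 regime carries the second form of the correction term.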
Mengqian Chen

Join Date: Feb 2020

Posts: 20
#8

20 Jan 2021, 04:53

Hi Jeff,

Thank you. From what you describe above, I think my data fall into the second situation. For example, s=1 if a firm is in financial distress, and zero otherwise. y is the firm's performance, which is observable for both s=1 and s=0. I'm using the Heckman test to rule out self-selection bias, e.g. some firms are more likely to be in financial distress than others.
yang zx

Join Date: Apr 2018

Posts: 5
#9

25 Nov 2021, 07:11

Hi Jeff, thank you for your comments.
I am asking because I want to predict wages for non-workers after a Heckman estimation.

First, I try the following code:

use https://www.stata-press.com/data/r16/womenwk,clear
heckman wage educ age, select(married children educ age) twostep mills(lambda)

The heckman command does provide a way to predict wages for workers:

predict hcndwage if wage~=., ycond

Then I do it step by step:

local coee=_b[education]
local coea=_b[age]
local coec=_b[_cons]
local coel=e(lambda)
gen predictedwage_observed=`coee'*education+`coea'*age +`coec'*1+`coel'*lambda if wage~=.
sum hcndwage predictedwage_observed

The results are the same:

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
    hcndwage |      1,343    23.69217    3.332615   16.22861   33.78897
observedwage |      1,343    23.69217    3.332615   16.22861   33.78897

However, the heckman command does not provide a RIGHT way to predict wages for non-workers.
I tend to believe that predicted wages should differ between workers and non-workers who have the same observable characteristics.
To achieve this, the value of lambda should be different for workers and non-workers.
So I try this:

gen z=wage~=.
probit z married children educ age
predict xb, xb
gen mylambda = normalden(xb) / normal(xb) if z==1
replace mylambda = -normalden(xb) / normal(-xb) if z==0
gen predictedwage_all=`coee'*education+`coea'*age+`coec'*1+`coel'*mylambda
gen averagewage=`coee'*education+`coea'*age+`coec'*1
sum averagewage predictedwage_all

The results do not look bad:

      Variable |        Obs        Mean    Std. Dev.       Min        Max
---------------+---------------------------------------------------------
   averagewage |      2,000    21.26078    3.812659   14.79669   32.88486
predictedw_all |      2,000    21.26078    4.733351   8.990657   33.78897

However, I am not sure whether this is the right way to get the predicted wages for non-workers and I have not found any details in the literature.
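
A side note on the second summary table, offered as a sketch rather than an answer to the question: the two means agree to every printed digit, and that is expected. Using the names from the code in this post, and writing w_i for the probit regressors including the constant (w_i is notation introduced here, not in the post), mylambda is the probit generalized residual, so the probit first-order conditions force it to sum to zero over the estimation sample:

```latex
\text{mylambda}_i
  = z_i\,\frac{\phi(\text{xb}_i)}{\Phi(\text{xb}_i)}
  - (1 - z_i)\,\frac{\phi(\text{xb}_i)}{\Phi(-\text{xb}_i)},
\qquad
\sum_{i=1}^{n} w_i\,\text{mylambda}_i = 0
\;\Longrightarrow\;
\sum_{i=1}^{n} \text{mylambda}_i = 0 .
```

Because predictedwage_all equals averagewage plus `coel' times mylambda, the sample means coincide and only the spread differs. This assumes the probit converged and was fitted on the same 2,000 observations used in the summary.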

Thank you again and looking forward to your reply.
Zhengxiong Yang
Fei Wang

Join Date: Oct 2021

Posts: 726
#10

25 Nov 2021, 09:52

I have a different view from #9 on wage prediction. Let's start with the ultimate purpose of a Heckman model. We are interested in estimating a wage equation for everyone, as below. If everyone worked, we would be able to identify b, the coefficients of ultimate interest.

Code:

E(y|x) = xb

As we only observe wages of workers, regressing observed wages on x would bias b if the selection into labor markets is endogenous. Therefore, Heckman found a way to correctly uncover b with such a selected sample, as below.

Code:

E(y|x, z, y is observed) = xb + cλ(zr)

E(y|x, z, y is observed) is what we can actually estimate, b is what we want, and λ(zr), the inverse Mills ratio (IMR), is what we use to uncover b. I'd like to underscore that the role of the IMR is to help us obtain the correct b. Once b is estimated, we can predict wages for everyone, workers or non-workers, from the original model of interest, and the IMR has nothing to do with this step. We complete the task simply by using -predict, xb-, as below.

Code:

webuse womenwk, clear
heckman wage educ age, select(married children educ age) twostep
predict wagehat, xb

If non-workers had observed characteristics identical to those of workers, they would have identical predicted wages had they worked. Why do we feel that non-workers are supposed to have different wages from workers? Because non-workers, by construction, have different observed characteristics! The characteristics of non-workers tend to make their potential wages low, and that's why they choose not to work. Using -predict, xb-, we find that the predicted wages of non-workers are lower than those of workers, without the IMR playing any role.

Code:

     worker |      mean
------------+----------
Non-workers |  19.82792
    Workers |  21.96174

Stata offers options to predict an "expected" wage, where the IMR does play a role in accounting for the probability of labor market participation. But the idea of setting wages to zero for non-workers fundamentally contradicts the nature of a Heckman model. What are the wage rates of non-workers? Missing, not zero! If wages were set to 0 for non-workers, the data would at most be a censored sample with corner solutions, and a Heckman model (for a truncated sample) should not have been used in the first place. That's what Jeff implies in #7.
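
To see how the kinds of prediction discussed in this post differ in practice, a minimal sketch follows. It assumes the xb, ycond, and yexpected options of predict after heckman behave as described in the comments, and that all three are available after twostep. The variable names worker, wagehat, wagecond, and wageexp are made up for illustration, and this is not necessarily how the table above was produced.

```stata
webuse womenwk, clear
heckman wage educ age, select(married children educ age) twostep

* Indicator for whether a wage is observed (hypothetical variable name)
generate byte worker = !missing(wage)

* xb: linear prediction from the wage equation, with no selection term
predict wagehat, xb

* ycond: expected wage conditional on the wage being observed,
* which adds the selection (IMR) term to the linear prediction
predict wagecond, ycond

* yexpected: expected wage where the wage is taken to be 0 when unobserved
predict wageexp, yexpected

* Compare the mean of each prediction for non-workers and workers
tabstat wagehat wagecond wageexp, by(worker) statistics(mean)
```

Only the first prediction corresponds to the original model E(y|x) = xb; the other two bring the selection equation back in, which is the distinction drawn in the paragraph above.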
