Two-stage Heckman correction using two samples - Data Frames

Konstantina Boutsioukou

Join Date: Mar 2022

Posts: 25
#1

12 Dec 2022, 09:02

Dear all,

I want to carry out a two-stage Heckman correction using two samples. The first dataset gives me selection into higher education. In the second, I observe the outcome for individual i (admission to an elite university department) only if y*_i > 0 (i.e., if the individual is admitted to higher education). I observe the same covariates in both datasets. What I do not observe is selection (admission to higher education) in the second dataset and the outcome of interest (admission to an elite university department) in the first. I want to estimate the selection equation with the first dataset and use the parameter estimates to construct the inverse Mills ratio in the second dataset. I am using data frames to hold both datasets in memory, and I am running the following code:

**Step 1: linear predictions from selection equation:

probit HIGHEREDU i.yeard i.edu_f#i.yeard i.occup_f i.sex_stud i.nationality unemp_1, baselevels
frame change admissions
predict p1_hat, xb // Calculate predicted value from regression

** Compute Inverse Mills ratio
generate phi_1=normalden(p1_hat)
generate PHI_1=normal(p1_hat)
generate lambda_1=phi_1/PHI_1
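
For reference, the quantity the three generate lines are meant to compute can be written as follows. The symbols x_i (covariate vector) and \hat{\gamma} (probit coefficients from the selection equation) are notation introduced here for illustration; in the code, p1_hat plays the role of the linear prediction, phi_1 the numerator, PHI_1 the denominator and lambda_1 the ratio.

```latex
\hat{\lambda}_i
  = \frac{\phi\!\left(x_i'\hat{\gamma}\right)}{\Phi\!\left(x_i'\hat{\gamma}\right)},
\qquad
\phi = \text{standard normal density},\quad
\Phi = \text{standard normal CDF}.
```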

The "predict p1_hat, xb" command generates only missing values, I think because the estimated parameters are lost when I change frames. Is there any way to store the estimated parameters from the first frame and then use them to construct the inverse Mills ratio in the second dataset/frame?

Any help would be very valuable.

Best regards,
Konstantina
Andrew Musau

Join Date: Oct 2014

Posts: 10190
#2

12 Dec 2022, 18:59

What stops you from merging the data and using the maximum likelihood estimator? With the two-step approach, you have to go through the trouble of correcting the standard errors yourself, because the second stage uses an estimated regressor. In any case, the estimates are available and can be used across frames, so you need to provide a data example that reproduces what you observe. Also note that you need one or more exclusion restrictions when implementing the Heckman procedure, i.e., variables that predict selection but do not predict the outcome. For example, in the empirical model of a woman's labor supply, such variables include the number of kids the woman has and whether she is married. These predict participation in the labor force but do not predict the outcome, wage. Therefore, you should not have exactly the same variables in both equations. If you do, the model is identified only through functional form (the nonlinearity of the inverse Mills ratio), which is usually too weak to rely on in practice.
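
A minimal sketch of using probit estimates across frames, on simulated data. The frame names (selection, outcome) and the variables x, z and s are hypothetical and exist only for this illustration; they are not the variables from post #1.

```stata
clear all
set seed 12345

* Frame 1 (hypothetical name): selection is observed here
frame create selection
frame change selection
set obs 500
generate x = rnormal()
generate z = rnormal()
generate byte s = (0.5*x + z + rnormal() > 0)
probit s x z

* Frame 2 (hypothetical name): same covariates, selection not observed
frame create outcome
frame change outcome
set obs 300
generate x = rnormal()
generate z = rnormal()

* predict uses the probit results still held in e() together with
* the variables of the current frame
predict double xb_hat, xb
generate double lambda = normalden(xb_hat) / normal(xb_hat)
summarize xb_hat lambda
```

If predict returns missing values in a setup like this, one thing worth checking is whether every regressor of the probit is present and non-missing in the second frame, since the linear prediction is missing for any observation with a missing covariate.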

Last edited by Andrew Musau; 12 Dec 2022, 19:29.
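
The labor-supply example can be sketched with the womenwk dataset that the Stata manual uses for heckman (loading it with webuse needs an internet connection). Here married and children appear only in the selection equation, so they act as the exclusion restrictions. This is an illustration of the syntax, not a recommendation for any particular specification.

```stata
* Example dataset from the Stata manual entry for heckman
webuse womenwk, clear

* Maximum likelihood: outcome and selection equations estimated jointly
heckman wage educ age, select(married children educ age)

* Two-step estimator, for comparison
heckman wage educ age, select(married children educ age) twostep
```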
Konstantina Boutsioukou

Join Date: Mar 2022

Posts: 25
#3

13 Dec 2022, 08:13

Dear Andrew,

Thank you very much for all your suggestions! I will try the merging now.
Best,
Konstantina
Chiara Tasselli

Join Date: Feb 2021

Posts: 111
#4

27 Jan 2023, 01:03

Dear Andrew Musau and Konstantina Boutsioukou,
I am facing a similar problem and don't know how to solve it.
I have two datasets:
i) one with administrative data (from firms), in which log(wages) is the dependent variable (the goal is to estimate the gender pay gap)
ii) another (a labour force survey), in which I would like to estimate the probability that women work

My main issue is that the datasets are not homogeneous (for example, I am particularly interested in the number of children and marital status, but those are not in the administrative data in which I want to estimate the gender gap).
So far, I have harmonized the common variables, such as age, education and seniority, but I don't know what I could use as selection variables. Is there a way to exploit family circumstances even if they are not in my main dataset?

Many thanks in advance for your time.