Heckman procedure (control function) when the treatment dummy is endogenous

Ridwan Sheikh

Join Date: Apr 2021

Posts: 168
#1


18 Feb 2023, 11:45

Dear Nick Cox, Jeff Wooldridge, and Clyde Schechter,
I have a query regarding the dummy endogenous variable model (Heckman, 1978), which is explained beautifully in Jeff Wooldridge's Econometric Analysis of Cross Section and Panel Data, 2010 (p. 622, ch. 18).

Suppose we are trying to estimate the ATE (or ATT) of being a union member (treatment) on the wages (outcome) of women. Further, suppose that treatment assignment is endogenous.
Our baseline outcome equation is therefore:

wages = a + b*X + c*union + e_i ----------------------------(outcome equation)

The decision to obtain union membership is endogenous, and we are interested in obtaining consistent/unbiased estimates of the parameter (c) in such a case. The Heckman procedure provides consistent estimates of the treatment effect in this scenario. I briefly explain the procedure below.

The binary treatment variable union is assumed to stem from the latent variable:

union* = W_i*gamma + u_i

The decision to obtain a treatment is made according to the rule:

Code:

union = { 1, if union* > 0
        { 0, otherwise

We will use the variables wage (wage), grade (years of schooling completed), smsa (an indicator for living in an SMSA, a standard metropolitan statistical area), black (an indicator for being African-American), tenure (tenure at current job), and south (an indicator for living in the South). The variable union is a dummy equal to 1 for union members and 0 otherwise.

Stage 1: The Heckman procedure fits a probit regression of the treatment on the vector W_i:

P(union=1|W) = Phi(W_i *gamma) --------------------------- (treatment equation)

where W_i is a vector of variables that determine treatment assignment, Phi is the standard normal cumulative distribution function, and phi (used below) is the standard normal density.

We then calculate the hazard (h_i) for each observation from the stage-1 estimates:

Code:

h_i = {  phi(.)/Phi(.)         if union = 1
      { -phi(.)/(1 - Phi(.))   if union = 0

Stage 2: We augment the outcome regression with the hazard (h_i). The additional parameters d1 and d2 are the coefficients on the interactions [union*h_i] and [(1-union)*h_i], respectively. Therefore, we estimate

E(wages | X, union, W) = a + b*X + c*union + d1*[union*(phi/Phi)] + d2*[(1-union)*(-phi/(1-Phi))] --------------(augmented model)

I run the following code to obtain the parameter of interest (c). Here the treatment effect is not allowed to vary with the covariates:
(In this case ATE = ATET)

Code:

use https://www.stata-press.com/data/r17/union3, clear
etregress wage age grade smsa black tenure, treat(union = south black tenure) hazard(h) first

hazard(h) will create a new variable (h) containing the hazard from the treatment equation. The option first specifies that the first-step probit estimates of the treatment equation are also displayed.

If we want the treatment effect to vary with the covariates, we run the following:
(In this case ATE != ATET)

Code:

use https://www.stata-press.com/data/r17/union3, clear
etregress wage age grade smsa i.union#c.(black tenure), treat(union = south black tenure) hazard(h) first vce(robust)

We use the margins command to calculate the ATE and ATET, respectively:

Code:

margins r.union, vce(unconditional) contrast(nowald) // ATE
margins r.union, vce(unconditional) contrast(nowald) subpop(union) // ATET

With this code, I get the coefficient (c) on the treatment dummy (union) along with those on the other covariates. The code also generates the hazard (h_i) for each observation. However, I do not obtain the coefficients d1 and d2 from the augmented outcome model. How should I tweak the code to obtain the ATE and ATET along with the coefficients d1 and d2? Please get back to me; I shall be very thankful.

Regards,
(Ridwan)

Last edited by Ridwan Sheikh; 18 Feb 2023, 11:56.
Tags: Endogenous Dummy, hazard rate, heckman, treatment effects model
Ridwan Sheikh

#2

19 Feb 2023, 01:51

Dear Charles Lindsey (StataCorp), I have seen your response to a similar query on the forum (https://www.statalist.org/forums/for...regress-output).
Could you please help me with the question I posted above in #1? With etregress I obtain the coefficient (c) on the treatment dummy and the hazard (h_i), but not the coefficients d1 and d2 from the augmented outcome model. How should I tweak the code to obtain the ATE and ATET along with d1 and d2? Please get back to me; I shall be very thankful.

regards,
(Ridwan)
FernandoRios

Join Date: Apr 2014

Posts: 2467
#3

19 Feb 2023, 04:39

Hi Ridwan,
That was quite detailed, but I am not sure what the question is.
For instance, couldn't you estimate the model by hand and get exactly what you want?
Standard errors can be obtained either by bootstrap or by following the directions in the manual (which are usually quite detailed).
F
Ridwan Sheikh

#4

19 Feb 2023, 08:43

Thanks, FernandoRios, for getting back to me. I agree that I was overly elaborate.
Let me put it more briefly, using the example above.
We are trying to estimate the model:

wages = a + b*X + c*union + e_i -------------------------------- (outcome equation)

where selection into union membership is endogenous. The Heckman (two-step) procedure therefore augments the outcome equation with the hazard (h_i), i.e.,

wages = a + b*X + c*union + d1*[union*(phi(.)/Phi(.))] + d2*[(1-union)*(-phi(.)/(1-Phi(.)))] --------------(augmented model)

The hazard (h_i) is obtained from the first-stage probit for the treatment equation (please see the detailed description above), such that:

Code:

h_i = {  phi(.)/Phi(.)         if union = 1
      { -phi(.)/(1 - Phi(.))   if union = 0

Apart from the treatment effect (c), I am interested in getting the estimates of d1 and d2.

When I run the following code

Code:

use https://www.stata-press.com/data/r17/union3, clear
etregress wage age grade smsa i.union#c.(black tenure), treat(union = south black tenure) hazard(h) first vce(robust)

I can obtain the treatment effects (ATE and ATT) using the margins command, but I cannot obtain the additional parameters on the hazard terms (i.e., d1 and d2 in the augmented model).

However, if I run the following code

Code:

use https://www.stata-press.com/data/r17/union3, clear
etregress wage age grade smsa i.union#c.(black tenure), treat(union = south black tenure) first vce(robust) poutcomes

By specifying poutcomes in the code, I can now obtain lambda1 and lambda0, such that:

lambda1 = rho1*sigma1 and lambda0 = rho0*sigma0

I am not sure whether lambda1 can be interpreted as d1 and lambda0 as d2 in the augmented model.

I hope I am making it clear now.

The Stata command I am using is etregress. This approach is explained in the Stata manual (pp. 89-121): https://www.stata.com/manuals/te.pdf

Thanks,
(Ridwan)

Last edited by Ridwan Sheikh; 19 Feb 2023, 08:45.
FernandoRios

#5

19 Feb 2023, 09:13

Right.
I think you need to either search for the twostep option or compare it with heckman ML versus heckman two-step.
They use, it seems, the same nomenclature.
Ridwan Sheikh

#6

19 Feb 2023, 11:56

Thanks, FernandoRios. I have read a little more about the estimators that etregress can accommodate under different settings. I explain them briefly here to make my question clearer.

The Heckman procedure (estimated using etregress) can fit two different models to calculate the treatment-effect parameter.

1) Constrained Model

This model assumes that the variance and correlation parameters are identical across the treatment and control groups. Estimation can use either the two-step estimator or the default ML estimator.
[We assume that the effect of treatment varies with the covariates black and tenure.]
Therefore, if we run

Code:

etregress wage age grade smsa i.union#c.(black tenure), treat(union = south black tenure) hazard(h) first vce(robust)

This gives the default ML estimates under the constrained model.

Alternatively, if we specify twostep in the code, we obtain results based on the two-step estimator (Maddala, 1983), that is

Code:

etregress wage age grade smsa i.union#c.(black tenure), treat(union = south black tenure) hazard(h) first twostep

Note: As far as I can tell from the manual, vce(robust) is not among the VCE types offered with twostep (conventional, bootstrap, or jackknife), so we do not specify it.

2) Potential-outcome model (or unconstrained model)

This model allows separate variance and correlation parameters for the treatment and control groups. To estimate them, we specify poutcomes in the code. We also specify the cfunction option to use the control-function estimator.
[We assume that the effect of treatment varies with the covariates black and tenure, as in the constrained case.]

Therefore, in this case we run:

Code:

etregress wage age grade smsa i.union#c.(black tenure), treat(union = south black tenure) hazard(h) first poutcomes cfunction

Note: Because we are using the CF estimator, the variance-covariance matrix of the estimator (VCE) is already robust, so we do not specify vce(robust).

My question remains the same:

By specifying poutcomes and cfunction under the potential-outcome (unconstrained) model, I can now obtain lambda1 and lambda0, such that:

lambda1 = rho1*sigma1 and lambda0 = rho0*sigma0

I am not sure whether lambda1 can be interpreted as d1 and lambda0 as d2 in the augmented model.

Sorry for being a little long in my question, and thanks for reading it.

Thanks and regards,
(Ridwan)
FernandoRios

#7

19 Feb 2023, 14:32

Short answer: yes.
Just derive it for the simpler Heckman selection case, and bring together the estimation via two stages and MLE.
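
A sketch of that derivation in the notation of #1, assuming (as is standard for this model, though not stated explicitly in the thread) that the outcome error e and the treatment-equation error u are jointly normal with Var(u) = 1, Var(e) = sigma^2, and Corr(e, u) = rho, so that E(e | u) = rho*sigma*u:

```latex
\begin{aligned}
E(e \mid \text{union}=1, W) &= \rho\sigma\, E(u \mid u > -W\gamma)
  = \rho\sigma\,\frac{\phi(W\gamma)}{\Phi(W\gamma)}, \\
E(e \mid \text{union}=0, W) &= \rho\sigma\, E(u \mid u \le -W\gamma)
  = -\rho\sigma\,\frac{\phi(W\gamma)}{1-\Phi(W\gamma)} .
\end{aligned}
```

Substituting these into the outcome equation gives the augmented model with d1 = d2 = rho*sigma in the constrained case. If each regime is allowed its own parameters, the same steps give d1 = rho1*sigma1 = lambda1 for union = 1 and d2 = rho0*sigma0 = lambda0 for union = 0, because the minus sign for the untreated group is already part of h_i.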
Ridwan Sheikh

#8

20 Feb 2023, 03:09

Dear FernandoRios, I think the heckman command is used when there is selection in the dependent variable (the dependent variable is not always observed). It uses a similar latent-variable structure, such that
P(y observed | W) = Phi(W*gamma).

However, etregress calculates treatment effects when the treatment itself is endogenous; the latent-variable structure in my case is
P(treatment=1 | W) = Phi(W*gamma).

I don't think heckman is valid in my case. Why did you ask me to estimate heckman by ML and by two-step? You wrote:

"just derive it for the simpler heckman selection case, and bring together the estimation via two stages and mle"

Regards,
(Ridwan)
FernandoRios

#9

20 Feb 2023, 03:16

Sorry for the misunderstanding.
I didn't mean for you to estimate your model using heckman,
but rather to compare how heckman ML and heckman two-step are estimated and the output they produce, because they are almost identical to how etregress works:
s1: probit
s2: get the Mills ratio / generalized residual (score) from the probit
s3: add it to the main regression (CF)
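
As a rough by-hand sketch of these three steps with the union3 example from #1: the variable names xb and h_cf are made up for illustration, and the second-stage standard errors ignore that h_cf is estimated, so they would need a bootstrap or the correction described in the manual.

```stata
* Point-estimate sketch only; second-stage SEs are not corrected for step 1
use https://www.stata-press.com/data/r17/union3, clear

* s1: probit for the treatment equation
probit union south black tenure

* s2: linear index W*gamma and the hazard, signed by treatment status
predict double xb if e(sample), xb
generate double h_cf = cond(union == 1, normalden(xb)/normal(xb), ///
    -normalden(xb)/(1 - normal(xb))) if e(sample)

* s3: outcome regression with a separate hazard coefficient per group
regress wage age grade smsa i.union i.union#c.(black tenure) i.union#c.h_cf
```

In this sketch the coefficient on 1.union#c.h_cf plays the role of d1 and the coefficient on 0.union#c.h_cf plays the role of d2. They should be comparable to the lambda1 and lambda0 that etregress reports with poutcomes cfunction, though I have not checked that the numbers match.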
The difference between etregress and heckman is, as you point out, that with etregress you observe data in both states, whereas with heckman you observe only one state (selected).

I would also say you may want to look into the endogenous switching model, which is just the preamble (in my opinion) to etregress.
HTH
Ridwan Sheikh

#10

20 Feb 2023, 12:09

Thanks, FernandoRios. It is true that etregress uses the same three-step procedure as heckman; however, the two are designed for different settings. The endogenous switching model is definitely something to look at. Thanks for recommending it.

In the present setting, I think etregress with the options poutcomes and cfunction is valid for obtaining the treatment (union) estimate and the coefficients on [union*(phi(.)/Phi(.))] and [(1-union)*(-phi(.)/(1-Phi(.)))], with lambda1 ~ d1 and lambda0 ~ d2, respectively. These two terms serve as the control functions at s3. I hope my understanding is correct.

Thanks and regards,
(Ridwan)

Last edited by Ridwan Sheikh; 20 Feb 2023, 12:14.