I'm working from the classic Roy model with a dummy endogenous variable T and a continuous outcome y. The framework is well explained in econometrics textbooks and in the many papers by Heckman and his colleagues (see, for example, chapter 71, volume 6, part B of the Handbook of Econometrics by Heckman and Vytlacil). I am exploring the average treatment effect on the treated (ATT), the average treatment effect (ATE), the average treatment effect on the untreated (ATU) and, of course, the marginal treatment effect (MTE). First I use a two-step control function approach (CFA) based on the normality assumption, and second a semiparametric polynomial model for distributional flexibility. The units of observation are adolescents in school, and I have a pooled sample over two waves. I have a set of observed covariates X and can include school fixed effects. Finally, I have an instrument, and the first-stage vector of covariates is Z = [z_iv X].
The user-written package margte will estimate and plot the MTE over the unobservable cost of treatment along with the ATE. However, I wanted to explore the model a bit myself with the two-step approach, and this has led to two problems: (1) a discrepancy between my estimate on T*gr2 and that of margte when school fixed effects are included, and (2) plotting the MTE appropriately from my own estimates in Stata.
Following Wooldridge's review of control function methods in the JHR (2015), I generate gr2 = T*lambda(Z*delta) - (1-T)*lambda(-Z*delta), where lambda() is the inverse Mills ratio, and then estimate the regression of
y on 1, X, T, and gr2 and bootstrap the standard errors. I also estimate the model with school fixed effects in the first and second stage. Then I take the random coefficient model and regress
y on 1, X, T*(X-Xbar), T*gr2, gr2 without and then with school fixed effects. Finally, I estimate the model for those with T=1 and those with T=0, thus
y on 1, X, gr2 if T==1 and again if T==0, get the predicted values for both, and generate the counterfactual difference (I call it cdiff). Without school fixed effects, my estimate for the ATE (sum cdiff) agrees with that of margte. Also, the estimate I get on T*gr2 is very similar to that reported for "Mills" by margte (which is rho1*sigma1 - rho0*sigma0). In this case, the estimate on T*gr2 is near zero and not significant. When I include school fixed effects, I run into a problem. Stata returns an estimate on T*gr2 that is large and significantly different from zero, which seems to indicate that unobserved heterogeneity in returns may indeed be a problem. margte, however, returns an estimate for rho1-rho0 that is near zero and insignificant. The MTE plot it produces closely tracks the ATE, but my own estimates seem to disagree. Or does my estimate on T*gr2 not actually represent rho1-rho0? I still get a similar ATE, and when I estimate the models separately for T=1 and T=0, controlling for school fixed effects and gr2, I get answers consistent with those of margte. So, any ideas what is going on with
y on 1, X, T*(X-Xbar), T*gr2 including school fixed effects?
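As a sketch of the algebra behind these regressions, assume the selection rule T = 1[Z*delta + v > 0] with v standard normal, potential outcomes y_j = X*beta_j + u_j for j = 0, 1, and E[u_j | v] = rho_j*sigma_j*v. The symbols beta_j, u_j and v are notation introduced here for illustration, not margte's, and X is taken to include the constant. With lambda(a) = phi(a)/Phi(a), the regime-specific means and the pooled regression they imply would be:

```latex
\begin{align*}
E[y \mid X, Z, T=1] &= X\beta_1 + \rho_1\sigma_1\,\lambda(Z\delta), \\
E[y \mid X, Z, T=0] &= X\beta_0 - \rho_0\sigma_0\,\lambda(-Z\delta), \\
E[y \mid X, Z, T]   &= X\beta_0 + T\,X(\beta_1-\beta_0)
                       + \rho_0\sigma_0\,gr2
                       + (\rho_1\sigma_1-\rho_0\sigma_0)\,T\cdot gr2, \\
gr2 &= T\,\lambda(Z\delta) - (1-T)\,\lambda(-Z\delta).
\end{align*}
```

Under these assumptions the coefficient on T*gr2 is rho1*sigma1 - rho0*sigma0, and the two split-sample regressions are equivalent to a pooled regression in which every regressor, school dummies included, is interacted with T. A pooled regression whose school effects are common to both regimes is more restrictive, which may be one reason the coefficient on T*gr2 moves once fixed effects enter. That is a conjecture, not something checked here.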
Now, when I run margte with the parametric and semiparametric polynomial models of order 2, I do find evidence of unobserved heterogeneity in returns when excluding school fixed effects. When including them, the evidence points toward the same conclusion, but I think efficiency is lost because margte interacts each school dummy with the propensity score p. Is there a better way to approach this?
I would like to plot the MTE against the propensity score along with the ATT and ATU. In the CFA, can I construct the MTE manually as
mte = ate + gamhat*gr2 and then plot this against the propensity score? I think I'm missing something there. How can I obtain a vector of MTE values in Stata after the second-step estimation? I understand the MTE will not differ from the ATE if there is no unobserved heterogeneity in returns, but first, I want to learn how to do this, and second, the parametric and semiparametric polynomial models suggest there may be such heterogeneity, and I do not know how to bring all of this together in Stata. Are there any programs out there that already do this, or any advice on how to approach it?
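For comparison, here is a sketch of what the normal model would imply for the MTE. It assumes the selection rule T = 1[Z*delta + v > 0] with v standard normal, outcomes y_j = X*beta_j + u_j with E[u_j | v] = rho_j*sigma_j*v, and unobserved resistance u_D = 1 - Phi(v). This notation is introduced here for illustration. The sign on the last term of the MTE flips if the index is written Z*delta - v instead, so margte's convention should be checked before comparing.

```latex
\begin{align*}
\mathrm{MTE}(x, u_D) &= E[y_1 - y_0 \mid X = x,\, U_D = u_D] \\
  &= x(\beta_1-\beta_0) - (\rho_1\sigma_1-\rho_0\sigma_0)\,\Phi^{-1}(u_D), \\
\mathrm{ATT} &= E[X \mid T=1](\beta_1-\beta_0)
  + (\rho_1\sigma_1-\rho_0\sigma_0)\,E[\lambda(Z\delta) \mid T=1], \\
\mathrm{ATU} &= E[X \mid T=0](\beta_1-\beta_0)
  - (\rho_1\sigma_1-\rho_0\sigma_0)\,E[\lambda(-Z\delta) \mid T=0].
\end{align*}
```

If this is right, the inverse Mills terms in gr2 belong in the ATT and ATU, which condition on treatment status. The MTE curve instead evaluates the inverse normal CDF on a grid of u_D values, so ate + gamhat*gr2 would not trace out the MTE.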
Below is some example code tied to this discussion. It's quick and dirty. I leave out bootstraps and merely present a few examples from Stata. Also, I cluster my work on the identifier but margte doesn't allow this. I have no doubt I'm the problem and not margte.
**** without Sfe
probit pcommtr bo $oX
predict xd2h, index
gen phi2h = normalden(xd2h)
gen PHI2h = normal(xd2h)
gen gr2 = pcommtr*phi2h/PHI2h - (1-pcommtr)*phi2h/(1-PHI2h)
predict ph
* reg y on 1, X, T, gr2
reg schtrcfac $oX pcommtr gr2, cluster(aid)
* reg y on 1, X, T*(X-Xbar), T*gr2, gr2
reg schtrcfac $oX $pcommtr_X i.pcommtr##c.gr2, cluster(aid)
putmata gr2, replace
reg schtrcfac $oX gr2 if pcommtr==1, cluster(aid)
predict y1hm
mata b1 = st_matrix("e(b)")
reg schtrcfac $oX gr2 if pcommtr==0, cluster(aid)
predict y0hm
mata b0 = st_matrix("e(b)")
gen cdiff = y1hm - y0hm
sum cdiff if pcommtr==1 // ATT
sum cdiff if pcommtr==0 // ATU
sum cdiff // ATE
gen ate = r(mean)
mata ate = st_numscalar("r(mean)")
mata
rho1 = b1[1,11]
rho0 = b0[1,11]
psi = rho1-rho0
psi_gr2 = psi:*gr2
mte = ate :+ psi_gr2
end
getmata mte, replace
* yep, this is wrong I'm sure
sum mte
gen zero = 0
* plot the mte as a function of the propensity score?
* ugly graph. fix some other time.
twoway lfit mte ph || function mte = ate || lfit zero ph, range(0 1) ///
    legend(order(1 "MTE" 2 "ATE"))
drop y1h y0h y1hm y0hm xd2h phi2h PHI2h gr2 ph mte grp1 grp0 cdiff ate

*************************************************************************
**** with Sfe
probit pcommtr bo $oX $Sfe
predict xd2h, index
gen phi2h = normalden(xd2h)
gen PHI2h = normal(xd2h)
gen gr2 = pcommtr*phi2h/PHI2h - (1-pcommtr)*phi2h/(1-PHI2h)
predict ph
areg schtrcfac $oX pcommtr gr2, cluster(aid) absorb(newsid)
areg schtrcfac $oX i.pcommtr##c.gr2, cluster(aid) absorb(newsid)
areg schtrcfac $oX $pcommtr_X i.pcommtr##c.gr2, cluster(aid) absorb(newsid)
reg schtrcfac $oX $Sfe $pcommtr_X i.pcommtr##c.gr2, cluster(aid)
putmata gr2, replace
reg schtrcfac $oX $Sfe gr2 if pcommtr==1, cluster(aid)
predict y1hm
mata b1 = st_matrix("e(b)")
reg schtrcfac $oX $Sfe gr2 if pcommtr==0, cluster(aid)
predict y0hm
mata b0 = st_matrix("e(b)")
gen cdiff = y1hm - y0hm
sum cdiff if pcommtr==1
sum cdiff if pcommtr==0
sum cdiff
gen ate = r(mean)
mata ate = st_numscalar("r(mean)")
mata
rho1 = b1[1,11]
rho0 = b0[1,11]
psi = rho1-rho0
psi_gr2 = psi:*gr2
mte = ate :+ psi_gr2
end
getmata mte, replace
sum mte
twoway lfit mte ph, lcolor(navy) || function mte = ate, lpattern(dash) lcolor(red) || lfit zero ph, range(0 1) ///
    legend(order(1 "MTE" 2 "ATE"))
drop y1hm y0hm xd2h phi2h PHI2h gr2 ph mte grp1 grp0 cdiff ate

******************************************************************
* margte
******************************************************************
**** pcomm
margte schtrcfac $oX, treat(pcomm bo $oX) common
margte schtrcfac $oX, treat(pcommtr bo $oX) common link(probit) poly(2)
margte schtrcfac $oX, treat(pcommtr bo $oX) common link(probit) semip poly(2)
**** pcommtr
margte schtrcfac $oX, treat(pcommtr bo $oX) common
margte schtrcfac $oX, treat(pcommtr bo $oX) common link(logit) poly(2)
margte schtrcfac $oX, treat(pcommtr bo $oX) common link(probit) poly(2)
margte schtrcfac $oX, treat(pcommtr bo $oX) common link(probit) poly(3) // no evidence for poly > 2
margte schtrcfac $oX, treat(pcommtr bo $oX) common link(probit) poly(4)
margte schtrcfac $oX, treat(pcommtr bo $oX) common link(probit) semip poly(2)
margte schtrcfac $oX, treat(pcommtr bo $oX) common link(probit) semip poly(3) // no evidence for poly > 2
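One possible way to get an MTE curve out of the two-step estimates is to evaluate the normal-model formula on a grid of u_D values rather than at each observation's gr2. The sketch below is a rough illustration, not a tested routine. It assumes the probit index is Z*delta + v, so that MTE(u_D) at mean X is ATE - psi*invnormal(u_D), and that cdiff and gr2 from the code above are still in memory, so it would have to run before the drop statement. The names ate_hat, psi_hat and u are hypothetical and introduced only for this example.

```stata
* Sketch only: normal-model MTE at mean X on a grid of u_D.
* Assumes index Z*delta + v, so u_D = 1 - normal(v); flip the sign on
* psi_hat if the index is written Z*delta - v instead.
quietly summarize cdiff
scalar ate_hat = r(mean)                 // ATE, computed as in the code above

quietly regress schtrcfac $oX $pcommtr_X i.pcommtr##c.gr2, cluster(aid)
scalar psi_hat = _b[1.pcommtr#c.gr2]     // estimate of rho1*sigma1 - rho0*sigma0

preserve
clear
set obs 99
generate double u   = _n/100             // hypothetical grid: 0.01, ..., 0.99
generate double mte = scalar(ate_hat) - scalar(psi_hat)*invnormal(u)
twoway line mte u, yline(`=scalar(ate_hat)') ///
    ytitle("MTE at mean X") xtitle("u_D")
restore
```

Because the grid is built in a cleared dataset inside preserve/restore, the estimation sample is left untouched. The scalars survive the clear, which only drops the data and value labels. Standard errors for the curve would still need a bootstrap over both steps, which this sketch leaves out.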