  • Heckman procedure with endogenous binary treatment

    Hi everyone,

    I am trying to run a simple Heckman procedure with one endogenous treatment variable and one other covariate. My goal is to compare the LATE from a 2SLS IV procedure with the estimate from the Heckman procedure. However, since the treatment is endogenous, should I use the fitted values from the first stage in the Heckman procedure, or just run it normally without the fitted values?

    Code:
    * Simple:
    heckman outcome treatment, select(selection = treatment x2) twostep
    * Using fitted values from 2SLS:
    heckman outcome treatment, select(selection = fitted_treatment x2) twostep
    Last edited by Afsanul Hasib; 15 Apr 2024, 03:43.

  • #2
    Plugging in the fitted values is generally inconsistent. You can instead follow Procedure 19.6.2 in my 2010 MIT Press book. You obtain the inverse Mills ratio from the first stage probit, and then add those in a second stage, instrumenting for the EEV at the same time.
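
    In outline, the two steps described above amount to the following. The notation is illustrative and not taken from the book: s is the selection indicator, z the full set of exogenous variables, x the exogenous regressors in the outcome equation (assumed to be a subset of z), and w the endogenous explanatory variable.

    ```latex
    \begin{align*}
    \text{Step 1 (probit, all observations):}\quad
      & P(s_i = 1 \mid z_i) = \Phi(z_i\gamma),
      \qquad \hat\lambda_i = \frac{\phi(z_i\hat\gamma)}{\Phi(z_i\hat\gamma)} \\
    \text{Step 2 (2SLS, } s_i = 1 \text{ only):}\quad
      & y_i = x_i\beta + \alpha w_i + \rho\,\hat\lambda_i + \text{error}_i,
      \qquad \text{instruments: } (z_i, \hat\lambda_i)
    \end{align*}
    ```

    Here \phi and \Phi are the standard normal density and CDF, and \hat\lambda_i is the inverse Mills ratio evaluated at the estimated probit index.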

    • #3
      To make this convincing, you need an instrument for treatment and another exogenous variable that determines selection. I'll take z1 to be the variable that primarily acts as the IV for treatment. If you have other controls, include them in both the selection and ivregress stages.

      Code:
      probit selection z1 z2
      * For selected observations the probit score equals the inverse Mills ratio
      predict imr, score
      ivregress 2sls outcome (treatment = z1) imr if selection == 1, vce(robust)
      The t statistic on imr is valid for testing the null of no selection bias. It allows treatment to be endogenous. You should bootstrap all steps to get proper standard errors if you use the estimates that include the imr.
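
      One way to bootstrap both steps together is to wrap them in a small program and pass it to bootstrap. This is only a sketch: the program name heckiv, the helper variable imr_bs, the seed, and the number of replications are placeholders chosen for illustration, and the variable names are the ones used in the code above.

      ```stata
      * Hypothetical wrapper (the name heckiv is made up for this sketch)
      capture program drop heckiv
      program define heckiv, eclass
          * Re-create the IMR on every resample so the probit step is bootstrapped too
          capture drop imr_bs
          probit selection z1 z2
          predict double imr_bs, score
          ivregress 2sls outcome (treatment = z1) imr_bs if selection == 1
      end

      * nodrop keeps the unselected observations in each resample;
      * without it, bootstrap would resample only the ivregress estimation sample
      bootstrap _b, reps(500) seed(12345) nodrop: heckiv
      ```

      The standard errors reported by bootstrap then reflect the sampling variation in the first-stage probit as well as in the 2SLS step.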

      • #4
        Hi Jeff - I am trying to implement your approach (not with a binary treatment, but in general with an endogenous regressor), following also this older post https://www.statalist.org/forums/for...quations-model

        What is not clear to me is whether the exogenous regressors in the structural equation should be *excluded* instruments for the 2SLS. For instance, take the example in the text (17.7 in my 2002 edition, perhaps 19.7 in 2010), which has endogenous education and sample selection. You say: "in the labor force participation equation we include exper, exper2, nwifeinc, kidslt6, kidsge6, motheduc, fatheduc, and huseduc (not educ). In estimating equation (17.29), the same set of variables, along with \hat{\lambda}_3, are used as IVs". But it is not clear whether exper and exper2 should be included or excluded instruments (given that they were controls in the previous example). In other words, we first run

        Code:
        probit part nwifeinc huseduc motheduc fatheduc kidslt6 kidsge6  exper expersq
        predict imr if part==1, score
        But at the second stage, do we run

        Code:
        ivregress 2sls lwage  exper expersq imr (educ = huseduc motheduc fatheduc nwifeinc huseduc motheduc fatheduc kidslt6 kidsge6 imr)  if part==1
        or

        Code:
        ivregress 2sls lwage imr (educ = huseduc motheduc fatheduc nwifeinc huseduc motheduc fatheduc kidslt6 kidsge6 imr exper expersq) if part==1
        ?


        This is my output, by the way. Neither specification reproduces the IMR coefficient of 0.40 reported in the text, though the educ estimate from the first specification is pretty close.

        Code:
        . ivregress 2sls lwage  exper expersq imr (educ =  huseduc motheduc fatheduc nwifeinc huseduc motheduc fatheduc kidslt6 kidsge6 imr)  if part==1
        note: huseduc omitted because of collinearity.
        note: motheduc omitted because of collinearity.
        note: fatheduc omitted because of collinearity.
        
        Instrumental-variables 2SLS regression            Number of obs   =        428
                                                          Wald chi2(4)    =      38.74
                                                          Prob > chi2     =     0.0000
                                                          R-squared       =     0.1558
                                                          Root MSE        =     .66372
        
        ------------------------------------------------------------------------------
               lwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                educ |   .0896119   .0214973     4.17   0.000      .047478    .1317459
               exper |   .0567609   .0190215     2.98   0.003     .0194794    .0940424
             expersq |  -.0011364   .0004819    -2.36   0.018    -.0020809   -.0001919
                 imr |   .1840926   .1812356     1.02   0.310    -.1711226    .5393078
               _cons |   -.522748   .3852598    -1.36   0.175    -1.277843    .2323474
        ------------------------------------------------------------------------------
        Endogenous: educ
        Exogenous:  exper expersq imr huseduc motheduc fatheduc nwifeinc kidslt6
                    kidsge6
        
        . ivregress 2sls lwage    imr (educ =  huseduc motheduc fatheduc nwifeinc huseduc motheduc fatheduc kidslt6 kidsge6 imr exper expersq)  if part==1
        note: huseduc omitted because of collinearity.
        note: motheduc omitted because of collinearity.
        note: fatheduc omitted because of collinearity.
        
        Instrumental-variables 2SLS regression            Number of obs   =        428
                                                          Wald chi2(2)    =      26.18
                                                          Prob > chi2     =     0.0000
                                                          R-squared       =     0.1283
                                                          Root MSE        =     .67443
        
        ------------------------------------------------------------------------------
               lwage | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                educ |   .0797849   .0215504     3.70   0.000     .0375468    .1220229
                 imr |   -.320866   .1067777    -3.00   0.003    -.5301464   -.1115855
               _cons |    .363657   .2892566     1.26   0.209    -.2032754    .9305895
        ------------------------------------------------------------------------------
        Endogenous: educ
        Exogenous:  imr huseduc motheduc fatheduc nwifeinc kidslt6 kidsge6 exper
                    expersq
        Please let me know, I am at a loss!
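
        One thing that can be checked independently of the second-stage specification is whether the score from predict coincides with the inverse Mills ratio on the participating sample. A minimal sketch, assuming the variable imr created above still exists; xb_sel and imr_manual are names made up here:

        ```stata
        * Rebuild the IMR by hand from the probit index: phi(xb) / Phi(xb)
        probit part nwifeinc huseduc motheduc fatheduc kidslt6 kidsge6 exper expersq
        predict double xb_sel, xb
        generate double imr_manual = normalden(xb_sel) / normal(xb_sel) if part == 1

        * Compare with the score-based variable created earlier
        summarize imr imr_manual if part == 1
        ```

        If the two variables agree up to rounding, any discrepancy with the text would have to come from the second-stage specification rather than from the IMR itself.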

        PS: I'm sure you are aware that Stata has introduced this procedure in eregress, though it appears that it uses some sort of maximum likelihood and not a two-step. Care to comment on the differences?
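
        For reference, a rough sketch of what the eregress counterpart might look like with the variables above. Which covariates enter each equation is an assumption made for illustration, not something taken from the text, and the /// line continuations require running this from a do-file:

        ```stata
        * Outcome equation: lwage on educ (endogenous), exper and expersq
        * endogenous(): reduced form for educ; select(): participation probit
        eregress lwage exper expersq, ///
            endogenous(educ = exper expersq nwifeinc kidslt6 kidsge6 motheduc fatheduc huseduc) ///
            select(part = exper expersq nwifeinc kidslt6 kidsge6 motheduc fatheduc huseduc) ///
            vce(robust)
        ```

        Because eregress fits all equations jointly by maximum likelihood under joint normality of the errors, its estimates will generally not coincide exactly with the two-step ones.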
