  • Heckman with two sources of sample selection and Instrumental Variable (IV)

    Dear Statalist,

    I would like to estimate the following equation using cross-section data:

    Code:
    Y = A + B0*X0 + B1*X1 + U
    In this equation the dependent variable is observed only when a selection rule applies (the typical sample selection problem à la Heckman). One of the independent variables, say X0, is a binary endogenous variable, and I'd like to use an instrumental variable to address that issue within the Heckman framework. But there is another problem: X0 also has a selection rule, different from the rule for Y. I've been working on two ways to estimate the main equation.

    1. The first is to estimate a Heckman model for X0 (in Stata it would be: heckprobit X0 X2, sel(X3), with X3 the exclusion variables for X0), obtain the linear prediction of X0 (predict X0_hat, xb), use this variable instead of X0 in the main equation, and then estimate a second Heckman model, this time for Y, correcting the standard errors by bootstrapping.

    2. The second procedure follows Wooldridge (2010), Section 19.6.2 of "Econometric Analysis of Cross Section and Panel Data," second edition, MIT Press. The steps are: (i) estimate a probit model for the selection equation (I=1 when X0 and Y are observed) using all exogenous variables (including the instruments Z1 and the selection variables Z2); (ii) obtain the inverse Mills ratio (IMR); (iii) estimate the structural equation (Y in this case) by 2SLS, correcting the standard errors by bootstrapping.

    Code:
    ivregress 2sls Y X1 IMR (X0 = Z1 Z2)
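
    Steps (i) to (iii) of the second procedure could be sketched as below. This is only a sketch under stated assumptions, not a tested implementation: S is a hypothetical 0/1 indicator equal to 1 when both Y and X0 are observed, xb_sel and imr are hypothetical variable names, and the number of replications and the seed are arbitrary. Wrapping all three steps in one program lets bootstrap re-estimate the probit in every replication, so the standard errors reflect the estimated IMR.

    ```stata
    * Sketch only: S, xb_sel and imr are hypothetical names, not from the original post
    capture program drop sel2sls
    program define sel2sls, eclass
        * (i) selection probit on all exogenous variables, using the full sample
        probit S X1 Z1 Z2
        * (ii) inverse Mills ratio computed from the probit index
        capture drop xb_sel imr
        predict double xb_sel, xb
        generate double imr = normalden(xb_sel) / normal(xb_sel)
        * (iii) 2SLS on the selected sample; X1 and imr act as their own instruments
        ivregress 2sls Y X1 imr (X0 = Z1 Z2) if S == 1
    end

    * Resample firms and repeat all three steps in each replication
    bootstrap _b, reps(500) seed(12345): sel2sls
    ```

    After the run, estat firststage cannot be used on the bootstrap results, so instrument strength would have to be checked on a separate, non-bootstrapped ivregress call.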
    The problem is that I don't know which of the two procedures is more adequate. The estimates I obtain from the two are very different (the first is positive and the second is negative, though the instruments may be weak, which is a separate problem). My questions are:


    1. As far as I know, the first procedure would be fine if the first stage of 2SLS (done by hand in this case) were estimated by OLS (using regress). Is it okay to obtain the predicted value of the endogenous variable from the Heckman estimates and use it in the second stage to estimate the main equation?

    2. I'm aware that the selection rules for X0 and Y are not the same; that is, Y and X0 are not always observed at the same time (there are missing values of X0 when Y is positive). Can I use the procedure anyway?

    Do you have any suggestions about this?

    Thanks in advance,

    Sergio
    Last edited by Sergio Bravo; 17 Feb 2016, 11:40.

  • #2
    More specifically, Y is innovation expenditure (R&D + other) over sales, and X0 is a dummy that captures whether firms cooperate with other firms in their innovation activities. X0 is observed only if firms innovate, and there are a lot of firms with innovation expenditure equal to zero. I use a Heckman model to estimate innovation expenditure following Griffith et al. (2006) and Crespi and Zúñiga (2011), who assume that firms decide to invest in innovation (and report this investment) only if it is above some threshold. I observe other variables for firms that do not innovate (and hence do not answer the cooperation question), so I think I can correct for sample selection in cooperation (more specifically, in the decision to innovate or not) using a Heckman model, and then correct for the endogeneity by using the predicted value of cooperation in the Heckman model for innovation expenditure. The thing is that there are cases where innovation expenditure is missing but I observe whether the firm cooperates or not (indicating that the firm innovates in some other way).
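
    To see how far the two selection rules overlap, a cross-tabulation of the two observation indicators could be a starting point. This is a minimal sketch; obsY, obsX0 and S are hypothetical variable names, and it assumes that unobserved values of Y and X0 are coded as missing or, for Y, as zero.

    ```stata
    * Sketch only: obsY, obsX0 and S are hypothetical names
    * Y counts as observed when it is non-missing and positive
    * (the !missing() check matters because Stata treats missing as larger than any number)
    generate byte obsY  = (Y > 0) & !missing(Y)
    * X0 is observed only for firms that answered the cooperation question
    generate byte obsX0 = !missing(X0)
    * Off-diagonal cells are firms where only one of the two variables is observed
    tabulate obsY obsX0
    * Joint indicator: 1 when both Y and X0 are observed
    generate byte S = obsY & obsX0
    ```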

    Thanks, and I'm sorry if the post is too long.
    Last edited by Sergio Bravo; 17 Feb 2016, 18:47.



    • #3
      Hi Sergio, I am also using a Heckman model with sample selection and an instrumental variable. I wanted to ask whether you managed to get code for this.



      • #4
        Hi there
        if you have Stata 15, then you should look into eregress, which was created to deal with problems like the one described above.
        best
        fernando
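
        For illustration, an eregress call combining a binary endogenous covariate with sample selection might look like the sketch below. It assumes Stata 15 or later and reuses the thread's variable names. S is a hypothetical 0/1 indicator for Y being observed, and which variables belong in each equation is an assumption, not something settled in this thread.

        ```stata
        * Sketch only: S is a hypothetical selection indicator; the equation contents are assumptions
        * endogenous(..., probit) models the binary X0 and adds it to the main equation
        * select() models whether Y is observed
        eregress Y X1, endogenous(X0 = X1 Z1, probit) select(S = X1 Z1 Z2) vce(robust)
        ```

        This handles a single selection equation. How observations with X0 missing are treated should be checked before relying on it, since it does not by itself address the separate selection rule for X0.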
