  • Binary Response Panel Data with Self-Selection into Binary Treatment

    Hi all,
    In a recent paper, Semykina & Wooldridge (2015) suggest that Stata's -biprobit- command can be used to estimate average treatment effects (ATE) for binary responses with self-selection into a binary treatment when the data have a panel structure.

    I had previously seen in Austin Nichols's slides that -biprobit- can be used for endogenous switching (self-selection) with a binary treatment and a binary response, but I did not know that it could be extended to a panel data context. For the cross-sectional case, the command would simply be something like:

    biprobit (Y= control_variables treatment) (treatment= control_variables Instruments)

    Semykina & Wooldridge (2015) suggest that the above command can be modified for panel data, explaining briefly in a footnote:

    in Stata estimating treatment effects can be implemented by pooling the data and estimating the augmented equation (with time averages) using the “biprobit” command. Standard errors robust to serial dependence can be obtained using “cluster” option.
    MAIN QUESTIONS:

    Can anyone here provide more details on how to implement this in Stata? For example, what do they mean by “the augmented equation (with time averages)”? What are these time averages? Should I include year dummies? Does this method handle the panel structure through random effects?

    Also, are there ways to get the ATE and ATET separately? Can we do it with the -margins- command? Finally, is there a way to perform the test for selection bias outlined in Semykina & Wooldridge (2015) after the -biprobit- command in Stata?

    ALTERNATIVES:

    As an alternative to the -biprobit- command, I think there must be a way to do this with David Roodman's -cmp- command, in a manner similar to that discussed in other posts on this forum. Any guidance on exactly how to implement the self-selection case and calculate the ATE and ATET with -cmp- would also be appreciated.

    The -biprobit- command looks more attractive to me at this point because my panel data have a survey structure with probability weights and -biprobit- works with the -svy:- prefix. However, the requirement to use vce(cluster) noted by Semykina & Wooldridge (2015) will probably not let me use the -svy:- prefix anyway.

    Another approach, based on control functions, is outlined in Murtazashvili & Wooldridge (2016), but I can't find Stata code for that either. I know the control function approach is what Stata's -eteffects- command uses, but that command does not handle panel data either.

    References:

    Murtazashvili & Wooldridge (2016) - A control function approach to estimating switching regression models with endogenous explanatory variables and endogenous switching

    Semykina & Wooldridge (2015) - Binary response panel data models with sample selection and self selection
    Last edited by Mohammad Keyhani; 24 Sep 2016, 14:22.

  • #2
    Hi Mohammad. I can help with that. Do you just want to use a usual pooled analysis or use the correlated random effects approach that allows the instruments to be correlated with heterogeneity? If you have a time-varying IV then I would suggest the CRE approach. JW

    • #3
      Thank you for responding, Dr. Wooldridge.
      I was under the impression that the simple -biprobit- command listed above already does the pooled analysis, but that it does not account for the panel structure. I can also use the -eteffects- command for pooled analysis, although the results differ slightly from -biprobit-. From your paper, Semykina & Wooldridge (2015), I got the hint that accounting for the panel structure might be possible with the -biprobit- approach.

      So I want to account for the panel structure using random effects. I have a mix of time-invariant and time-varying exogenous variables and a time-varying treatment variable, and I am trying a couple of different instruments (one of them is time-varying).

      My dependent variable is "closure" (a dummy indicating whether the firm went out of business that year) and my main treatment variable is "int_yes" (a dummy indicating whether the firm has internationalized). My data are currently in long form, with each row representing a firm-year, and they have a stratified survey structure with pweights.

      I have made some inroads with Roodman's -cmp- command and especially like that it lets me do duration analysis with the method suggested by Bartus & Roodman (2014). However, when I try to add random effects, it either fails to run, fails to converge, or takes too long to run. It is also not clear to me how to calculate the average treatment effect (ATE) and the average treatment effect on the treated (ATET) after -cmp-.

      References:
      Bartus & Roodman (2014) - Estimation of multiprocess survival models with cmp

      Footnote: A multiply imputed version of the data I'm working with is also available, but I have given up trying to make all these leading-edge estimation procedures work with -mi estimate-!
      Last edited by Mohammad Keyhani; 26 Sep 2016, 01:23.

      • #4
        Mohammad: Could you please give us full bibliographic references for all the work that you've cited so far? Thanks. [See the Forum FAQ. It would make it much easier for other interested readers to find the work, especially those outside your sub-area.]

        • #5
          Mohammad: In the footnote where we mention "biprobit" we are explicitly discussing pooled methods, not methods that account for a particular error structure, such as random effects. In nonlinear contexts, the RE approach is much less robust because it requires serial independence of the errors. The pooled method does not. One simply clusters the standard errors.

          More important is the nature of your instruments. We recommend using the Mundlak device on the IVs. Then estimation with pooled biprobit is easy and more robust than an RE approach. (The same is true for just the usual probit model, too: pooled probit is more robust than RE probit unless T is large.) If you get very different estimates of the ATE using pooled versus RE methods then, if anything, the latter should not be trusted.
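
          A minimal sketch of how the pooled approach with the Mundlak device might look in Stata. Only closure and int_yes come from this thread; firm_id, year, x1, x2, and z1 are hypothetical names for the panel identifier, the time variable, two time-varying controls, and a time-varying instrument. Putting the time averages in both equations and including year dummies are assumptions of this sketch rather than a confirmed specification.

          ```stata
          * Hypothetical variable names (not from the thread):
          *   firm_id = panel identifier, year = time variable
          *   x1 x2   = time-varying exogenous controls
          *   z1      = time-varying instrument (excluded from the outcome equation)
          * closure and int_yes are the outcome and treatment described in post #3.

          * Mundlak device: firm-level time averages of the time-varying
          * exogenous variables, including the instrument.
          * Time-invariant variables need no averages.
          foreach v of varlist x1 x2 z1 {
              bysort firm_id: egen double `v'_bar = mean(`v')
          }

          * Pooled bivariate probit augmented with the time averages
          * (assumed here to enter both equations) and year dummies.
          * vce(cluster firm_id) makes the standard errors robust to
          * serial dependence within firm.
          biprobit (closure = int_yes x1 x2 x1_bar x2_bar z1_bar i.year) ///
                   (int_yes = x1 x2 z1 x1_bar x2_bar z1_bar i.year),     ///
                   vce(cluster firm_id)
          ```

          In this sketch z1 itself appears only in the treatment equation, which is the exclusion restriction, while its time average z1_bar appears in both. With an unbalanced panel, the averages should be computed over the same firm-years that enter the estimation sample.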

          • #6
            Dear Professor Wooldridge and Mohammad,

            Did you manage to get an answer? I am currently struggling with more or less the same problem. I'm trying to use the endogenous switching regression method to identify the treatment effect of a training program on both continuous variables (e.g. income) and binary variables (e.g. whether the treated observation experienced occupational mobility) in the short run, medium run, and long run. I found that most of the user-written commands are suited only to cross-sectional data, not panel data (e.g. movestay). I also consulted the paper Mohammad mentioned and have the same question about the suggestion of "pooling the data and estimating the augmented equation (with time averages) using the “biprobit” command". So my question is: is there any command or method that allows me to apply endogenous switching regression to panel data in Stata, and how can I do that?

            Thank you in advance for your help!
            Last edited by Patima Chong; 06 Mar 2017, 07:18.
