  • How should I estimate a latent outcome with perfect separation?

    Code:
    *********
    * Setup *
    *********
    
    clear
    
    /* Suppose 160 observations */
    set obs 160
    
    /* Half have some predetermined characteristic */
    gen male = mod(_n,2)
    sort male
    
    /* Half are treated, half are control (stratified) */
    gen treated = mod(_n,2)
    sort male treated
    
    /* Our outcome lies on [0,1]. It's not obvious what distribution it has, but
       let's say it's roughly some kind of beta distribution.
    
       The predetermined characteristic increases the outcome.
       
       Treatment also increases the outcome. */
    gen outcome = .
    replace outcome = rbeta(2,20) if !male & !treated
    replace outcome = rbeta(2,10) if !male & treated
    replace outcome = rbeta(2,5) if male & !treated
    replace outcome = rbeta(2,3) if male & treated
    
    /* But the outcome is also zero-inflated. It's not a censored outcome, but the
       characteristic and treatment similarly affect the extensive margin. */
    replace outcome = 0 if !male & !treated & runiform(0,1) > .2
    replace outcome = 0 if !male & treated & runiform(0,1) > .4
    replace outcome = 0 if male & !treated & runiform(0,1) > .6
    replace outcome = 0 if male & treated & runiform(0,1) > .8
    
    /* And to spice things up, it just so happens that we observe all zeroes for the
       control group without the characteristic. This was not preordained, it just
       happened this way. */
    replace outcome = 0 if !male & !treated
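    
    /* Optional sanity check (a sketch, not part of the original setup): confirm
       that the design is balanced across the four cells and that the control
       group without the characteristic really is all zeroes. */
    tabulate male treated
    tabstat outcome if !male & !treated, statistics(n mean min max)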
    
    ************
    * Analysis *
    ************
    
    /* The problem: What is a reasonable way to estimate the latent outcome? */
    
    /* OLS gives sensible-looking coefficients and errors, but doesn't estimate the
       latent outcome, just the observed outcome. */
    regress outcome i.male##i.treated, vce(robust)
    
    /* Tobit (type 1) gives bizarre coefficient estimates… */
    tobit outcome i.male##i.treated, ll(0) vce(robust)
    
    /* …without robust errors, you can see that something is definitely breaking.
       Maybe this issue: https://www.zeileis.org/news/biasreduction/ */
    tobit outcome i.male##i.treated, ll(0)
    
    /* Trying to model the extensive and intensive margins separately (e.g. with
       probit and a fractional logit GLM respectively) fails in both stages due
       to the perfect separation. */
    gen extensive = outcome != 0
    gen intensive = outcome if outcome != 0
    probit extensive i.male##i.treated
    glm intensive i.male##i.treated, family(binomial) link(logit) vce(robust)
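    
    /* A sketch to make the separation visible: the share of non-zero outcomes
       and the mean of the non-zero outcomes in each of the four cells. The
       variable name `cell' is introduced here for illustration only. The
       (!male & !treated) cell should show a share of 0 and no non-zero
       observations at all. */
    egen cell = group(male treated), label
    tabstat extensive intensive, by(cell) statistics(mean n)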
    
    /* Intuitively, given the assumption (and indeed the limited empirical evidence)
       that the predictors affect the extensive and intensive margins in the same
       direction, it seems we should be able to put an upper bound on the latent
       outcome for this completely separated group (!male & !treated), or at least
       model the process in a more holistic manner, but I'm not sure how best to
       estimate/model/represent this. Any suggestions? */
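    
    /* One way to make the "upper bound" idea concrete, for the extensive margin
       only (a sketch, not a full answer): an exact binomial confidence interval
       for the share of non-zero outcomes in the separated cell. With zero
       successes the lower limit is 0, so only the upper limit is informative.
       This assumes Stata 14.1 or later syntax; in older versions the equivalent
       is, as far as I know, -ci extensive if ..., binomial-. */
    ci proportions extensive if !male & !treated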