How should I estimate a latent outcome with perfect separation?

Nils Enevoldsen
Join Date: Oct 2014
Posts: 294

02 Dec 2022, 10:50
Code:
*********
* Setup *
*********

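clear

/* Fix the seed so the simulated data are reproducible (the value is arbitrary) */
set seed 20221202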

/* Suppose 160 observations */
set obs 160

/* Half have some predetermined characteristic */
gen male = mod(_n,2)
sort male

/* Half are treated, half are control (stratified) */
gen treated = mod(_n,2)
sort male treated

/* Our outcome lies on [0,1]. It's not obvious what distribution it has, but
   let's say it's roughly some kind of beta distribution.

   The predetermined characteristic increases the outcome.
   
   Treatment also increases the outcome. */
gen outcome = .
replace outcome = rbeta(2,20) if !male & !treated
replace outcome = rbeta(2,10) if !male & treated
replace outcome = rbeta(2,5) if male & !treated
replace outcome = rbeta(2,3) if male & treated
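
/* Sanity check (an added sketch, not part of the original setup): a
   Beta(a,b) draw has mean a/(a+b), so before zero-inflation the four cell
   means should be near 2/22 = .09, 2/12 = .17, 2/7 = .29, and 2/5 = .40. */
bysort male treated: summarize outcome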

/* But the outcome is also zero-inflated. It's not a censored outcome, but the
   characteristic and treatment similarly affect the extensive margin. */
replace outcome = 0 if !male & !treated & runiform(0,1) > .2
replace outcome = 0 if !male & treated & runiform(0,1) > .4
replace outcome = 0 if male & !treated & runiform(0,1) > .6
replace outcome = 0 if male & treated & runiform(0,1) > .8

/* And to spice things up, it just so happens that we observe all zeroes for the
   control group without the characteristic. This was not preordained, it just
   happened this way. */
replace outcome = 0 if !male & !treated

************
* Analysis *
************

/* The problem: What is a reasonable way to estimate the latent outcome? */

/* OLS gives sensible-looking coefficients and errors, but doesn't estimate the
   latent outcome, just the observed outcome. */
regress outcome i.male##i.treated, vce(robust)

/* Tobit (type 1) gives bizarre coefficient estimates… */
tobit outcome i.male##i.treated, ll(0) vce(robust)

/* …without robust errors, you can see that something is definitely breaking.
   Maybe this issue: https://www.zeileis.org/news/biasreduction/ */
tobit outcome i.male##i.treated, ll(0)

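/* Trying to model the extensive and intensive margins separately (e.g. with
   probit and a fractional logit GLM respectively) fails in both stages due to
   the perfect separation: the (!male & !treated) cell is all zeros, so it is
   perfectly predicted in the first stage and empty in the second. */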
gen extensive = outcome != 0
gen intensive = outcome if outcome != 0
probit extensive i.male##i.treated
glm intensive i.male##i.treated, family(binomial) link(logit) vce(robust)
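
/* A quick look at why both stages fail (an added sketch, not part of the
   original analysis). The first table reports the share of nonzero outcomes
   in each cell, which is 0 for (!male & !treated). The second counts the
   observations available to the intensive-margin model, which is 0 for that
   same cell. */
tabulate male treated, summarize(extensive)
tabulate male treated if !missing(intensive)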

/* Intuitively, given the assumption (and indeed the limited empirical evidence)
   that the predictors affect the extensive and intensive margins in the same
   direction, it seems like we can at least put an upper bound on the latent
   outcome for this completely separated group (!male & !treated), or at least
   model the process in a more holistic manner, but I'm not sure how to best
   estimate/model/represent this. Any suggestions? */