Code:
*********
* Setup *
*********

clear

/* Suppose 160 observations */
set obs 160

/* Half have some predetermined characteristic */
gen male = mod(_n,2)
sort male

/* Half are treated, half are control (stratified) */
gen treated = mod(_n,2)
sort male treated

/* Our outcome lies on [0,1]. It's not obvious what distribution it has, but
   let's say it's roughly some kind of beta distribution. The predetermined
   characteristic increases the outcome. Treatment also increases the outcome. */
gen outcome = .
replace outcome = rbeta(2,20) if !male & !treated
replace outcome = rbeta(2,10) if !male & treated
replace outcome = rbeta(2,5) if male & !treated
replace outcome = rbeta(2,3) if male & treated

/* But the outcome is also zero-inflated. It's not a censored outcome, but the
   characteristic and treatment similarly affect the extensive margin. */
replace outcome = 0 if !male & !treated & runiform(0,1) > .2
replace outcome = 0 if !male & treated & runiform(0,1) > .4
replace outcome = 0 if male & !treated & runiform(0,1) > .6
replace outcome = 0 if male & treated & runiform(0,1) > .8

/* And to spice things up, it just so happens that we observe all zeroes for
   the control group without the characteristic. This was not preordained, it
   just happened this way. */
replace outcome = 0 if !male & !treated

************
* Analysis *
************

/* The problem: What is a reasonable way to estimate the latent outcome? */

/* OLS gives sensible-looking coefficients and errors, but doesn't estimate
   the latent outcome, just the observed outcome. */
reg outcome i.male##i.treated, vce(robust)

/* Tobit (type 1) gives bizarre coefficient estimates… */
tobit outcome i.male##i.treated, ll(0) vce(robust)

/* …without robust errors, you can see that something is definitely breaking.
   Maybe this issue: https://www.zeileis.org/news/biasreduction/ */
tobit outcome i.male##i.treated, ll(0)

/* Trying to model the extensive and intensive margins separately (e.g. with
   a probit and a fractional-logit GLM, respectively) fails in both stages due
   to the perfect separation. */
gen extensive = outcome != 0
gen intensive = outcome if outcome != 0
probit extensive i.male##i.treated
glm intensive i.male##i.treated, family(binomial) link(logit) vce(robust)

/* Intuitively, given the assumption (and indeed the limited empirical
   evidence) that the predictors affect the extensive and intensive margins in
   the same direction, it seems like we can at least put an upper bound on the
   latent outcome for this completely separated group (!male & !treated), or
   at least model the process in a more holistic manner, but I'm not sure how
   best to estimate/model/represent this. Any suggestions? */
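One possible reading of the upper-bound idea, offered as a rough sketch rather than a settled answer: when a cell of n observations contains no non-zero outcomes, the exact one-sided 95% upper bound on its probability of a non-zero outcome solves (1 - p)^n = 0.05. If the same-direction assumption holds, the cell's mean among non-zero outcomes should be no larger than that of either adjacent cell, so multiplying the two caps gives a crude upper bound on the cell's expected outcome. The 95% level, the use of sample means as the intensive-margin cap, and the scalar names `n_cell`, `p_ub`, `m_ub`, and `y_ub` are all assumptions made for illustration. The code assumes the simulated data above are still in memory.

```stata
* Assumes the simulated data from the post are still in memory.
* Scalar names (n_cell, p_ub, m_ub, y_ub) are made up for this sketch.

* Size of the separated cell
count if !male & !treated
scalar n_cell = r(N)

* The closed form below only holds when the cell has no non-zero outcomes
count if !male & !treated & outcome != 0
assert r(N) == 0

* Exact one-sided 95% upper bound on P(outcome != 0):
* the largest p for which (1 - p)^n is still at least 0.05
scalar p_ub = 1 - 0.05^(1/n_cell)

* Assumed cap on the intensive margin: the smaller of the two adjacent cells'
* sample means among non-zero outcomes (point estimates, not bounds)
summarize outcome if !male & treated & outcome != 0, meanonly
scalar m_ub = r(mean)
summarize outcome if male & !treated & outcome != 0, meanonly
scalar m_ub = min(m_ub, r(mean))

* Implied upper bound on the cell's expected outcome
scalar y_ub = p_ub * m_ub
display "Upper bound on P(outcome != 0): " %6.4f p_ub
display "Upper bound on E[outcome]:      " %6.4f y_ub
```

Because the adjacent-cell means are themselves estimates, the second number is only as reliable as the monotonicity assumption and those sample means. A more careful version would carry their sampling uncertainty through as well.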