Hi everyone,
This is my first post on Statalist. If there are rules, customs, or traditions I am not adhering to, just let me know and I will edit my post accordingly.
I am currently working with a data set where my desired DV is the amount of funding (in USD) obtained by startups. Naturally, the DV contains a lot of zeros (~55%) and has a continuous distribution for y>0. Finding a suitable regression model for this type of data has been quite challenging.
Having read the relevant literature for some time now, I think I have identified two criteria I need to take into account:
- Source of zeros: The zero values of my DV are of two types: "unobserved" zeros for startups that simply do not want to participate in the market for funding (irrespective of the "price"), and "observed" or "true" zeros for startups that are looking for funding but do not receive any. My current understanding is that each type requires a different stochastic model.
- Correlation of participation and consumption decisions: Should the model assume independence or dependence between the decision to participate in the market and the amount of funding each startup receives (if any)? My current hypothesis is that the model should assume dependence.
In addition to estimating a tobit and a tobit type 2 (Heckman) model, neither of which accounts for both types of zeros, I identified three potential models as the "perfect models" but haven't found any Stata implementations (and before coding this myself I thought asking here might help):
- Cragg's double hurdle model (independent), specifically equations (5) and (6) from his original paper (not his two-part single hurdle model, which is often referred to as "double hurdle"). The available craggit and churdle commands unfortunately (at least to my knowledge) "only" fit the single hurdle alternative with 1. probit, 2. truncated normal (Cragg, J.G., 1971, Some statistical models for limited dependent variables with applications to the demand for durable goods, Econometrica 39, 829-844).
- Blundell's double hurdle model, which is essentially Cragg's double hurdle model but assuming dependence between both hurdles (http://sites.psu.edu/scottcolby/wp-c...obit-model.pdf, with application in Blundell, R.W., J. Ham and C. Meghir, 1986, Unemployment, and female labour supply, UCL economics discussion paper, forthcoming in the Economic Journal).
- Bernoulli/lognormal mixture model for censored data, described in Moulton, Lawrence H., and Neal A. Halsey. “A Mixture Model with Detection Limits for Regression Analyses of Antibody Response to Vaccine.” Biometrics, vol. 51, no. 4, 1995, pp. 1570–1578. www.jstor.org/stable/2533289
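To make concrete what I mean by the independent double hurdle model, here is a minimal sketch of the log-likelihood as it is usually written in the later literature: a zero occurs if either the participation hurdle or the amount hurdle fails, and a positive value requires both to be cleared. It is written in Python rather than Stata purely for illustration. The function and argument names (double_hurdle_loglik, xb, zg, sigma) are my own hypothetical choices, and I am assuming, not asserting, that this form corresponds to the equations in Cragg's paper.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_logpdf(x):
    """Log of the standard normal density."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)

def double_hurdle_loglik(y, xb, zg, sigma):
    """Log-likelihood of an independent double hurdle model (sketch).

    y     : observed outcomes (zero or positive)
    xb    : linear index of the amount equation, x'beta
    zg    : linear index of the participation equation, z'gamma
    sigma : std. dev. of the amount-equation error (must be > 0)
    All names are hypothetical; errors are assumed normal and independent.
    """
    total = 0.0
    for yi, xbi, zgi in zip(y, xb, zg):
        p_part = norm_cdf(zgi)  # P(participation hurdle is cleared)
        if yi > 0:
            # Both hurdles cleared: participation probability times
            # the normal density of the observed amount.
            total += (math.log(p_part)
                      + norm_logpdf((yi - xbi) / sigma)
                      - math.log(sigma))
        else:
            # A zero arises if either hurdle fails.
            p_pos = norm_cdf(xbi / sigma)  # P(latent amount > 0)
            total += math.log(1.0 - p_part * p_pos)
    return total
```

If the participation index is very large for every observation, the participation probability goes to one and this expression collapses to the standard tobit log-likelihood, which is a handy sanity check. The dependent version would additionally need a bivariate normal CDF for the zero observations, which this sketch does not cover.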
Greetings from Hamburg,
Jan