Hi,
I'm dealing with a limited dependent variable y, measured in VND, the Vietnamese currency (min 0, max 72 million VND). A large fraction of the observations (roughly 40 percent) take the value zero; the rest take continuous positive values that do not appear to be normally distributed.
The Tobit model, a commonly recommended method for such corner solution responses, does not allow the independent variables x to have different effects on E(y|y>0, x) and P(y>0|x). I therefore decided to run a hurdle model as follows.
First, I take the natural logarithm of (y+1). Observations that were originally zero remain zero after the transformation, since ln(0+1) = 0, while the distribution of the positive values becomes much closer to normal.
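To illustrate the transformation, here is a minimal Python sketch. The values in `y` are made up for illustration and are not my data; only the range (0 to 72 million VND) follows the description above.

```python
import numpy as np

# Hypothetical amounts in VND (illustrative only, not the real sample):
# two zeros plus positive values with a long right tail.
y = np.array([0.0, 0.0, 50_000.0, 1_200_000.0, 72_000_000.0])

# np.log1p(y) computes ln(1 + y), so zeros map exactly to zero
# and large positive values are compressed.
log_y = np.log1p(y)

print(log_y)
```

With amounts in VND, even the smallest positive value in this toy example sits far from zero on the log scale (above 10), so the transformed variable still has a gap between the zeros and the positives.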
Next, I estimated the hurdle model by maximum likelihood in Stata (churdle linear). The outcome model and the selection model use the same specification, and the lower (left-censoring) limit is 0.
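As I understand it, the model being maximized is a Cragg-type hurdle: a probit for whether y exceeds the lower limit, and a normal outcome truncated at that limit for the positive observations. The Python sketch below only evaluates that log-likelihood at given parameter values; it is my reading of the model, not Stata's implementation, and the function names, parameter names, and toy data are all hypothetical.

```python
import math
import numpy as np

# Elementwise error function; otypes lets it accept empty arrays too.
_erf = np.vectorize(math.erf, otypes=[float])

def norm_cdf(v):
    """Standard normal CDF, elementwise."""
    v = np.asarray(v, dtype=float)
    return 0.5 * (1.0 + _erf(v / math.sqrt(2.0)))

def norm_logpdf(v):
    """Log of the standard normal density, elementwise."""
    v = np.asarray(v, dtype=float)
    return -0.5 * v ** 2 - 0.5 * math.log(2.0 * math.pi)

def cragg_loglik(y, X, Z, beta, gamma, sigma, lower=0.0):
    """Log-likelihood of a Cragg-type hurdle model (assumed form).

    Selection: P(y > lower | z) = Phi(z'gamma).
    Outcome:   y | y > lower follows N(x'beta, sigma^2) truncated at `lower`.
    """
    y = np.asarray(y, dtype=float)
    xb = X @ beta
    zg = Z @ gamma
    pos = y > lower

    # Observations at the limit contribute log P(y <= lower).
    ll_zero = np.log(1.0 - norm_cdf(zg[~pos]))

    # Positive observations: log P(y > lower) plus the truncated-normal log density.
    ll_pos = (
        np.log(norm_cdf(zg[pos]))
        + norm_logpdf((y[pos] - xb[pos]) / sigma)
        - math.log(sigma)
        - np.log(norm_cdf((xb[pos] - lower) / sigma))
    )
    return ll_zero.sum() + ll_pos.sum()

# Toy data: y is already on the log(y+1) scale; X and Z share one specification.
y_toy = np.array([0.0, 10.0, 0.0, 12.5, 14.0])
X_toy = np.column_stack([np.ones(5), np.array([0.2, 1.0, -0.5, 1.5, 2.0])])
Z_toy = X_toy

print(cragg_loglik(y_toy, X_toy, Z_toy,
                   beta=np.array([10.0, 1.5]),
                   gamma=np.array([0.1, 0.8]),
                   sigma=1.0))
```

In this form gamma enters only the participation terms, while beta and sigma enter only the terms for the positive observations, which is what my second question below is about.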
My questions are:
1/ Is log(y+1) an appropriate transformation in my situation? I tried to run the hurdle model with the original y, but it failed to report any results.
2/ The hurdle model uses only the subset of observations with y>0 to estimate the effects of x on y. Could this lead to a sample selection problem? If so, could I use the heckit method to test for it? In particular, the p-value of the inverse Mills ratio might serve as evidence of a sample selection problem.
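For concreteness, the inverse Mills ratio I have in mind is λ(z) = φ(z)/Φ(z), evaluated at the fitted probit index from the selection equation and then added as a regressor in the outcome equation. A minimal Python sketch of that term alone follows; the function name is my own, and this is not the full heckit procedure.

```python
import math

def inverse_mills_ratio(z):
    """Return phi(z) / Phi(z) for the standard normal distribution."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return pdf / cdf

# The ratio falls as the selection index rises, i.e. as selection becomes more likely.
for z in (-1.0, 0.0, 1.0):
    print(z, inverse_mills_ratio(z))
```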