Hi,
I am replicating a study in which the dependent variable is a count with a lot of zeroes. Since it is panel data, the author has used OLS-PCSE, without making any changes to the dependent variable. The variable is a count, measuring "number of peacekeepers". But most often the selected countries do not deploy peacekeepers, so there are many zeroes.
In fact, 86.2 percent of the values on the dependent variable are zeroes. This makes it heavy-tailed, and it creates problems when choosing the right estimator. I do not believe an ordinary OLS-PCSE is the correct choice of model when the dependent variable is so heavy-tailed.
I have considered different options:
- Zero-inflated negative binomial regression (ZINBR). But the dependent variable cannot take negative values, only zero and positive values, and therefore it has a floor effect. If I have understood it correctly, ZINBR is not a good estimator when the variable has a floor effect.
- Then, I was advised to transform the heavy-tailed dependent variable using the inverse hyperbolic sine. Would this be helpful? And once it is transformed, how do I best use the new transformed variable? Can/should I use the inverse-hyperbolic-sine-transformed variable in an OLS-PCSE, or is this for some reason not recommended?
- I have also considered transforming the dependent variable into a dummy where it is either 0 (no peacekeepers deployed) or 1 (>0 peacekeepers deployed). This will not tell us the increase in peacekeepers when the independent variables change value, but it will tell us the likelihood that a country sends at least one peacekeeper when the IV changes. Could this be helpful in any way?
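To make the last two options concrete, here is a minimal sketch of the two transformations in Python. The counts are made up for illustration and the variable names (`peacekeepers`, `ihs`, `deployed`) are hypothetical; they are not taken from the replication data.

```python
import math

# Hypothetical peacekeeper counts for ten country-years (illustrative, not real data).
peacekeepers = [0, 0, 0, 0, 0, 0, 12, 0, 350, 0]

# Inverse hyperbolic sine: asinh(y) = ln(y + sqrt(y^2 + 1)).
# Unlike ln(y), it is defined at zero (asinh(0) = 0), and for large y
# it is approximately ln(2y).
ihs = [math.asinh(y) for y in peacekeepers]

# Binary indicator: 1 if any peacekeepers were deployed, 0 otherwise.
deployed = [1 if y > 0 else 0 for y in peacekeepers]

# Share of observations that are exactly zero.
share_zero = sum(1 for y in peacekeepers if y == 0) / len(peacekeepers)

print(f"share of zeroes: {share_zero:.1%}")
print("IHS values:", [round(v, 3) for v in ihs])
print("deployed dummy:", deployed)
```

Note that the transformation leaves every zero at zero, so the transformed variable still has the same share of zeroes as the original count.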
And, if it is possible to say, how do I know whether I have found the best model for my data-set?
As these questions might reveal, I am quite new to statistics, and this replication is part of my introductory class.
If you have other suggestions on how to solve this problem, they would be appreciated. (The independent variables in the data-set are either counts or dummies.)
Thanks