Negative Binomial Regression or OLS?

rai rai

Join Date: May 2016

Posts: 12
#1

28 Jul 2016, 03:04

Hi,
My dataset is large, with around 1 million observations. The outcome variable is discrete and has a heavily skewed distribution. I tried OLS, but the residuals are not normally distributed. I now plan to use negative binomial regression instead. Is this approach appropriate? I ask because I have read that if the outcome variable has many distinct values, OLS would also suffice.
Below are the most frequent values of the outcome variable.

      views |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |    249,672       23.81       23.81
          1 |    121,934       11.63       35.44
          2 |     79,871        7.62       43.06
          3 |     56,724        5.41       48.47
          4 |     43,575        4.16       52.62
          5 |     34,132        3.26       55.88

Last edited by rai rai; 28 Jul 2016, 03:12.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#2

28 Jul 2016, 04:02

rai rai:
why not use -poisson-?
Have you already detected real overdispersion?

Kind regards,
Carlo
(Stata 19.0)
rai rai

Join Date: May 2016

Posts: 12
#3

28 Jul 2016, 04:28

This is the summary of the data. The variance of the outcome is far larger than its mean, which violates the Poisson assumption that the two are equal. My data also contain a lot of zeros. Should I go for the zero-inflated negative binomial? Please advise.

Variable            Mean    Std. Dev.
-------------------------------------
Var 1           0.108747     1.363356
Var 2           5.106171     2.715266
Var 3           0.356856     3.887989
Outcome Var     49.69428     342.5739
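
A quick way to see how far the outcome is from the Poisson assumption is to square the posted standard deviation and compare it with the mean. The sketch below does only that arithmetic, using the two numbers from the summary above. It is a marginal check that ignores the covariates, so it is a rough indication rather than a formal overdispersion test.

```python
# Summary statistics for the outcome variable, copied from the post above.
mean_y = 49.69428
sd_y = 342.5739

# A Poisson distribution has variance equal to its mean,
# so under that model this ratio would be close to 1.
variance_y = sd_y ** 2
dispersion_ratio = variance_y / mean_y

print(f"variance        = {variance_y:,.0f}")
print(f"variance / mean = {dispersion_ratio:,.0f}")
```

A ratio in the thousands, as here, points to very strong marginal overdispersion. Whether it persists after conditioning on the covariates can only be judged from a fitted model.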
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#4

28 Jul 2016, 05:13

rai rai:
yes, -zinb- may be an option.

Kind regards,
Carlo
(Stata 19.0)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#5

28 Jul 2016, 06:52

Carlo gave an excellent suggestion.

Since you have decided to fit a zero-inflated negative binomial model, you may wish to add the -vuong- option. This gives you a Vuong test of whether the zero-inflated negative binomial model is more appropriate than the standard negative binomial model. (The likelihood-ratio test of alpha = 0, which compares against the zero-inflated Poisson model, is requested with the -zip- option instead.) In your case, I bet the zero-inflated model really is more appropriate.

Best regards,

Marcos
daniel klein

Join Date: Mar 2014

Posts: 3850
#6

28 Jul 2016, 07:14

Maybe it is worth mentioning that a zero-inflated model assumes that there are two distinct data-generating mechanisms to be modeled. The mere observation of many zeros in the data by no means justifies such a model. This is especially true if the zeros in the data are arbitrarily coded as such and could, in principle, be assigned any other value without changing the meaning. That is the case whenever the data are not measured on a ratio scale (as count data are).
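
To illustrate that many zeros do not by themselves call for a zero-inflation component, here is a rough sketch of the share of zeros a plain Poisson and a plain negative binomial (NB2, with variance mu + alpha*mu^2) would imply for the posted mean and standard deviation. The moment-matched `alpha` is an assumption made for illustration only; a fitted regression would estimate it by maximum likelihood and condition on the covariates, so the actual numbers will differ.

```python
import math

mean_y = 49.69428             # posted mean of the outcome
var_y = 342.5739 ** 2         # posted standard deviation, squared
observed_zero_share = 0.2381  # share of zeros in the posted frequency table

# Poisson: P(Y = 0) = exp(-mu)
poisson_p0 = math.exp(-mean_y)

# NB2: Var(Y) = mu + alpha * mu^2, so matching the first two moments gives
alpha = (var_y - mean_y) / mean_y ** 2
# NB2: P(Y = 0) = (1 + alpha * mu) ** (-1 / alpha)
nb_p0 = (1 + alpha * mean_y) ** (-1 / alpha)

print(f"observed share of zeros  = {observed_zero_share:.4f}")
print(f"Poisson implied P(Y = 0) = {poisson_p0:.3g}")
print(f"NB2 implied P(Y = 0)     = {nb_p0:.4f} (alpha = {alpha:.2f})")
```

With a mean near 50, a Poisson model implies essentially no zeros, while a sufficiently overdispersed negative binomial can imply a large share of zeros from a single data-generating process. This says nothing about which model fits best, only that the zeros alone do not settle the question.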

Whether a linear model (OLS is just an estimator, not a model) fits the data depends largely on how well the central tendency is represented by a mean value and whether you are interested in the central tendency at all, of course.

Concerning the Poisson model, note that a violation of the assumption that the variance equals the mean affects only the standard errors, not the point estimates (cf. Bill Gould's blog entry).
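
The point about standard errors can be made concrete in the simplest case, an intercept-only Poisson model, where the estimated mean is just the sample mean. The sketch below compares the standard error implied by the Poisson variance assumption with one based on the observed variance. The sample size `n` is an assumed round figure taken from "around 1 million"; the ratio of the two standard errors does not depend on it.

```python
import math

mean_y = 49.69428   # posted mean of the outcome
sd_y = 342.5739     # posted standard deviation of the outcome
n = 1_000_000       # assumed sample size; the exact n is not given in the thread

# Intercept-only Poisson: the ML estimate of the mean is the sample mean,
# whatever the true variance of the outcome is.
model_based_se = math.sqrt(mean_y / n)  # assumes Var(Y) equals the mean
robust_se = sd_y / math.sqrt(n)         # uses the observed variance instead

se_ratio = robust_se / model_based_se   # equals sqrt(variance / mean)

print(f"model-based SE = {model_based_se:.5f}")
print(f"robust SE      = {robust_se:.5f}")
print(f"ratio          = {se_ratio:.1f}")
```

The point estimate is identical in both cases; only the uncertainty attached to it changes. In Stata, robust standard errors for -poisson- are requested with the vce(robust) option.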

Best
Daniel

Last edited by daniel klein; 28 Jul 2016, 07:18.
rai rai

Join Date: May 2016

Posts: 12
#7

28 Jul 2016, 07:54

Thank you all for the suggestions.
I had in fact been reading up on zero-inflated models and found that they are appropriate when there are two distinct data-generating mechanisms. That is not so in my case: zero can be a value of the outcome variable in the regular process. So, daniel klein, could you please advise whether I should go ahead with negative binomial regression?