Hi,
My dataset is large, with around 1 million observations. The outcome variable (views) is discrete and strongly skewed. I first fit an OLS regression, but the residuals are not normally distributed. I now plan to fit a negative binomial regression instead. Is this approach appropriate? I ask because I have read that OLS can also suffice when the outcome variable takes many distinct values.
Below are the most frequent values of the outcome variable.
      views |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |    249,672       23.81       23.81
          1 |    121,934       11.63       35.44
          2 |     79,871        7.62       43.06
          3 |     56,724        5.41       48.47
          4 |     43,575        4.16       52.62
          5 |     34,132        3.26       55.88
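To show why I think a plain Poisson model would not fit either, here is a rough sketch in Python that uses only the percentages in the table above. It rests on one assumption of mine: if views were Poisson, the share of zeros would equal exp(-lambda), which fixes lambda. The names `observed_pct`, `lam`, and `poisson_pmf` are just for illustration, and the check ignores covariates entirely.

```python
import math

# Percentages copied from the frequency table above (views = 0..5).
observed_pct = [23.81, 11.63, 7.62, 5.41, 4.16, 3.26]

# Assumption for illustration only: under a Poisson distribution,
# P(views = 0) = exp(-lam), so the observed zero share pins down lam.
p0 = observed_pct[0] / 100
lam = -math.log(p0)


def poisson_pmf(k, lam):
    """Poisson probability of observing exactly k events with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Percent of observations a Poisson with this lam would put on 0..5.
implied_pct = [100 * poisson_pmf(k, lam) for k in range(6)]

for k, (obs, imp) in enumerate(zip(observed_pct, implied_pct)):
    print(f"views={k}: observed {obs:5.2f}%  Poisson-implied {imp:5.2f}%")

print(
    f"cumulative through 5: observed {sum(observed_pct):.2f}%, "
    f"Poisson-implied {sum(implied_pct):.2f}%"
)
```

A Poisson matched on the zero share would place nearly all observations at 5 views or fewer, whereas my data have only about 56% there. That long right tail is why I am considering the negative binomial rather than Poisson.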