  • Negative Binomial Regression or OLS?

    Hi,
    My dataset is large, with around 1 million observations. The outcome variable is discrete and has a heavily skewed distribution. I tried OLS, but the residuals are not normally distributed. I now plan to fit a negative binomial regression instead. Is this approach appropriate? I ask because I have read that OLS can also suffice when the outcome variable takes many distinct values.
    Please find below the most frequent values of the outcome variable.

          views |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |    249,672       23.81       23.81
              1 |    121,934       11.63       35.44
              2 |     79,871        7.62       43.06
              3 |     56,724        5.41       48.47
              4 |     43,575        4.16       52.62
              5 |     34,132        3.26       55.88
    Last edited by rai rai; 28 Jul 2016, 03:12.

  • #2
    rai rai:
    why not use -poisson-?
    Have you already detected real overdispersion?
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3

      This is the summary of the data. The mean and the Std. Dev. are far apart, whereas Poisson assumes that the mean equals the variance. Also, my data have a lot of zero values. Shall I go for the zero-inflated negative binomial? Please advise.
      Variable          Mean    Std. Dev.
      Var 1         0.108747     1.363356
      Var 2         5.106171     2.715266
      Var 3         0.356856     3.887989
      Outcome Var   49.69428     342.5739
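
As a quick check on the figures above, here is a minimal sketch in Python (not Stata) that turns the reported mean and standard deviation of the outcome into a variance-to-mean ratio. A Poisson variable would give a ratio of 1. The function name `dispersion_ratio` is made up for illustration, and the calculation is marginal: it ignores the covariates, so it only hints at overdispersion in a regression rather than testing for it.

```python
# Figures copied from the summary table above (outcome variable).
MEAN = 49.69428
STD_DEV = 342.5739


def dispersion_ratio(mean: float, std_dev: float) -> float:
    """Variance-to-mean ratio; equals 1 for a Poisson-distributed variable."""
    return std_dev ** 2 / mean


ratio = dispersion_ratio(MEAN, STD_DEV)
print(f"variance = {STD_DEV ** 2:.1f}, variance / mean = {ratio:.1f}")
```

A ratio far above 1 is what one would expect to see before considering a negative binomial model or Poisson with robust standard errors.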


      • #4
        rai rai:
        yes, -zinb- may be an option.
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Carlo gave an excellent suggestion.

          Considering you have decided to fit a zero-inflated negative binomial model, you may wish to add the -vuong- option. This way, you get a Vuong test of whether the zero-inflated negative binomial model is more appropriate than the standard negative binomial model. In your case, I bet it really is.
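
For readers curious about what a Vuong-type comparison computes, here is a minimal sketch in Python. It assumes you already have the per-observation log-likelihood contributions of the two fitted models; the function name and the example numbers are hypothetical, and this is not Stata's implementation (which may differ in details such as the variance divisor).

```python
import math


def vuong_statistic(loglik_a, loglik_b):
    """Vuong z-statistic from per-observation log-likelihoods of two models.

    Large positive values favour model A, large negative values favour
    model B; under the null the statistic is approximately standard normal.
    """
    if len(loglik_a) != len(loglik_b):
        raise ValueError("both models must be evaluated on the same observations")
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Variance of the pointwise differences (divisor n; an assumption here).
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / n
    return math.sqrt(n) * mean_diff / math.sqrt(var_diff)


# Hypothetical log-likelihood contributions for four observations.
ll_zinb = [-1.0, -2.0, -1.5, -0.5]
ll_nbreg = [-1.5, -2.1, -2.5, -0.6]
v = vuong_statistic(ll_zinb, ll_nbreg)
print(f"Vuong z = {v:.3f}")
```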
          Best regards,

          Marcos


          • #6
            Maybe it is worth mentioning that a zero-inflated model assumes that there are two distinct data-generating mechanisms to be modeled. The mere observation of many zeros in the data by no means justifies such a model. This is especially true if the zeros in the data are arbitrarily coded as such and could, in principle, be assigned any other value without changing the meaning. This is the case whenever the data are not measured on a ratio scale (count data, for example, are measured on a ratio scale).
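
To illustrate the point about zeros with the numbers posted in #1 and #3, here is a rough Python sketch comparing the share of zeros implied by a Poisson and by a plain (not zero-inflated) negative binomial distribution with the same mean and variance as the outcome. The assumptions are mine: the NB2 parameterisation Var = mean + alpha * mean^2, a method-of-moments value for alpha, and no covariates, so this is a marginal back-of-the-envelope calculation and not a fitted model.

```python
import math

# Figures copied from posts #1 and #3.
MEAN = 49.69428
STD_DEV = 342.5739
OBSERVED_ZERO_SHARE = 0.2381


def poisson_zero_prob(mean: float) -> float:
    """P(Y = 0) for a Poisson variable with the given mean."""
    return math.exp(-mean)


def negbin_zero_prob(mean: float, variance: float) -> float:
    """P(Y = 0) for an NB2 variable with Var = mean + alpha * mean**2."""
    alpha = (variance - mean) / mean ** 2  # method-of-moments estimate
    if alpha <= 0:
        raise ValueError("no overdispersion: variance must exceed the mean")
    return (1.0 + alpha * mean) ** (-1.0 / alpha)


p0_poisson = poisson_zero_prob(MEAN)
p0_negbin = negbin_zero_prob(MEAN, STD_DEV ** 2)
print(f"observed share of zeros:  {OBSERVED_ZERO_SHARE:.4f}")
print(f"Poisson-implied P(0):     {p0_poisson:.3e}")
print(f"neg. binomial P(0):       {p0_negbin:.4f}")
```

Under these assumptions, a strongly overdispersed negative binomial can already put a large probability on zero, which is one way to see why many zeros alone do not call for a zero-inflated model.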

            Whether a linear model (OLS is just an estimator, not a model) fits the data depends largely on how well the central tendency is represented by a mean value and whether you are interested in the central tendency at all, of course.

            Concerning the Poisson model, note that a violation of the assumption that the variance equals the mean affects only the standard errors, not the point estimates (cf. Bill Gould's blog entry).
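
The claim about point estimates versus standard errors can be sketched with a small simulation in Python. Everything here is an assumption for illustration: the data are simulated (overdispersed counts from a Poisson with a gamma multiplier), the true coefficients 0.2 and 0.5 and alpha = 2 are arbitrary, and the Poisson regression is fitted with a hand-rolled Newton-Raphson rather than Stata's -poisson-. The robust standard error is the usual sandwich estimator, which is what -vce(robust)- requests in Stata. The result relies on the conditional mean being correctly specified.

```python
import math
import random


def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate(n, b0, b1, alpha, rng):
    """Counts with mean exp(b0 + b1*x) and variance mu + alpha*mu**2."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        multiplier = rng.gammavariate(1.0 / alpha, alpha)  # mean 1, variance alpha
        xs.append(x)
        ys.append(draw_poisson(math.exp(b0 + b1 * x) * multiplier, rng))
    return xs, ys


def sums(xs, ys, b0, b1):
    """Score, information and outer-product-of-scores terms at (b0, b1)."""
    g0 = g1 = h00 = h01 = h11 = m00 = m01 = m11 = 0.0
    for x, y in zip(xs, ys):
        mu = math.exp(b0 + b1 * x)
        r = y - mu
        g0 += r
        g1 += r * x
        h00 += mu
        h01 += mu * x
        h11 += mu * x * x
        m00 += r * r
        m01 += r * r * x
        m11 += r * r * x * x
    return (g0, g1), (h00, h01, h11), (m00, m01, m11)


def fit_poisson(xs, ys, iterations=25):
    """Poisson regression of y on a constant and x by Newton-Raphson.

    Returns (b0, b1, model-based SE of b1, sandwich SE of b1).
    """
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(iterations):
        (g0, g1), (h00, h01, h11), _unused = sums(xs, ys, b0, b1)
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _unused, (h00, h01, h11), (m00, m01, m11) = sums(xs, ys, b0, b1)
    det = h00 * h11 - h01 * h01
    model_var = h00 / det
    robust_var = (h01 * h01 * m00 - 2.0 * h01 * h00 * m01 + h00 * h00 * m11) / det ** 2
    return b0, b1, math.sqrt(model_var), math.sqrt(robust_var)


rng = random.Random(2016)
xs, ys = simulate(20000, 0.2, 0.5, 2.0, rng)
b0_hat, b1_hat, model_se, robust_se = fit_poisson(xs, ys)
print(f"slope estimate: {b1_hat:.3f} (true value 0.5)")
print(f"model-based SE: {model_se:.4f}, robust SE: {robust_se:.4f}")
```

In runs like this the slope estimate should land close to the true value even though the data are not Poisson, while the model-based standard error comes out smaller than the robust one.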

            Best
            Daniel
            Last edited by daniel klein; 28 Jul 2016, 07:18.


            • #7
              Thank you, members, for all the suggestions.
              In fact, I was also reading up on zero-inflated models and found that they are appropriate when there are two distinct data-generating mechanisms. However, that is not so in my case: zero may be a value arising from the regular process for the outcome variable. So, daniel klein, could you please suggest whether I should go ahead with the negative binomial regression?
