Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Pearson chi square test for negative binomial regression

    Dear scientist,

    I use the data from UCLA to do the Pearson chi square test for negative binomial regression. I am not sure the way I got Pearson statistic is correct, would you please help me to check? In addition, the Pearson statistic is 2046.299, it is greater than chi square(df=313) =355.26, can I say this negative binomial model doesn't fit data well?
    Thanks,
    Liang

    Code:
    use https://stats.idre.ucla.edu/stat/stata/dae/nb_data, clear
    nbreg daysabs math i.prog
    predict that, n
    
    gen pearson=((daysabs-that)^2)/(that)
    tabstat pearson, stat(sum)

  • #2
    Your formula is correct for a Poisson model where var(Y)=mu. In a negative binomial model the variance is var(Y)=mu*(1+alpha*mu) where alpha is the overdispersion parameter. Try

    Code:
     gen pearson2 = (daysabs - that)^2/( that * (1 + exp(_b[/lnalpha]) * that))
    I get 339.877. You can also verify the result using -glm- with the estimate of alpha from nbreg:

    Code:
    glm daysabs math i.prog,family(nbinomial 0.9683231)

    Comment


    • #3
      Originally posted by German Rodriguez View Post
      Your formula is correct for a Poisson model where var(Y)=mu. In a negative binomial model the variance is var(Y)=mu*(1+alpha*mu) where alpha is the overdispersion parameter. Try

      Code:
      gen pearson2 = (daysabs - that)^2/( that * (1 + exp(_b[/lnalpha]) * that))
      I get 339.877. You can also verify the result using -glm- with the estimate of alpha from nbreg:

      Code:
      glm daysabs math i.prog,family(nbinomial 0.9683231)
      Dear German Rodriguez, thanks for your helping, I figure out how to get Pearson for negative binomial, make sense and helpful. In addition, I am interested in getting Pearson for Negative binomial hurdle as well, below is the code for it, would you please help me to check it whether it's correct?
      Code:
      use https://stats.idre.ucla.edu/stat/stata/dae/nb_data, clear
      preserve
      tnbreg daysabs math i.prog if daysabs>0,nolog
      predict that, n
      logit daysabs math i.prog,nolog
      predict  phat,pr
      gen yhat=phat*that
      gen pearson=((daysabs-yhat)^2)/(yhat)
      tabstat pearson, stat(sum)
      restore
      Best,
      Liang

      Comment


      • #4
        The general formula for Pearson's chi-squared statistic is sum(Y-E(Y))^2/var(Y). You need to plug in estimates of the expectation and variance based on your model. As noted earlier, the formula you are using is valid for Poisson models. If you use a negative binomial, or a hurdle model, you need to use different expressions. A handout from my GLM course collects a bunch of means and variances that you may find useful.

        A hurdle model usually combines a binary data model for zeros and a truncated count model for positive values, in your case a logit and a truncated negative binomial. Conventionally the logit predicts the zero class, but by using daysabsent you are modeling the probability of a positive count. This should not be a problem if you are consistent. But to compute fitted values you need to multiply the probability of a positive outcome by the conditional mean of the truncated distribution, not the unconditional mean as you have done. You then need to compute the variance, which is a bit harder. The code below follows my handout

        Code:
        use https://stats.idre.ucla.edu/stat/stata/dae/nb_data, clear
        
        gen zero = daysabs == 0
        logit zero math i.prog, nolog
        predict p
        tnbreg daysabs math i.prog if daysabs>0,nolog
        predict m
        
        scalar alpha = exp(_b[/lnalpha])
        gen f0 = (1 + alpha * m)^(-1/alpha) // nb prob of zero
        gen v = m*(1 + alpha*m)             // nb variance
        gen mt = m/(1-f0)                   // truncated mean
        gen vt = v/(1-f0) - f0*mt^2         // truncated variance
        gen yhat = (1-p) * mt               // hurdle mean
        gen vhat = (1-p)*vt + p*(1-p)*mt^2  // hurdle variance
        gen pearson = (daysabs - yhat)^2/vhat
        quietly sum pearson
        di r(sum)
        I get a Pearson chi-squared of 335.36, which seems to be in the right ball park. I recall using this dataset in an exercise and a simple overdispersed Poisson model did a reasonably good job capturing the shape of the variance function.

        Comment


        • #5
          Originally posted by German Rodriguez View Post
          The general formula for Pearson's chi-squared statistic is sum(Y-E(Y))^2/var(Y). You need to plug in estimates of the expectation and variance based on your model. As noted earlier, the formula you are using is valid for Poisson models. If you use a negative binomial, or a hurdle model, you need to use different expressions. A handout from my GLM course collects a bunch of means and variances that you may find useful.

          A hurdle model usually combines a binary data model for zeros and a truncated count model for positive values, in your case a logit and a truncated negative binomial. Conventionally the logit predicts the zero class, but by using daysabsent you are modeling the probability of a positive count. This should not be a problem if you are consistent. But to compute fitted values you need to multiply the probability of a positive outcome by the conditional mean of the truncated distribution, not the unconditional mean as you have done. You then need to compute the variance, which is a bit harder. The code below follows my handout

          Code:
          use https://stats.idre.ucla.edu/stat/stata/dae/nb_data, clear
          
          gen zero = daysabs == 0
          logit zero math i.prog, nolog
          predict p
          tnbreg daysabs math i.prog if daysabs>0,nolog
          predict m
          
          scalar alpha = exp(_b[/lnalpha])
          gen f0 = (1 + alpha * m)^(-1/alpha) // nb prob of zero
          gen v = m*(1 + alpha*m) // nb variance
          gen mt = m/(1-f0) // truncated mean
          gen vt = v/(1-f0) - f0*mt^2 // truncated variance
          gen yhat = (1-p) * mt // hurdle mean
          gen vhat = (1-p)*vt + p*(1-p)*mt^2 // hurdle variance
          gen pearson = (daysabs - yhat)^2/vhat
          quietly sum pearson
          di r(sum)
          I get a Pearson chi-squared of 335.36, which seems to be in the right ball park. I recall using this dataset in an exercise and a simple overdispersed Poisson model did a reasonably good job capturing the shape of the variance function.
          Dear Dr. German Rodriguez,Thank you very much for your explanation. I read your web pages and I feel I get more knowledge than other books, because it is more clearly. I follow your notes and get Pearson for both Poisson Hurdle and NB Hurdle. In addition to Hurdle models, I am also interested in Zero inflated Poisson model. There is one thing confused me when I write the code for getting Pearson. The mean for ZIP is E(Y)=pi+(1-pi)e-u. This u is the mean of Poisson part, I think at this point we can get u by
          Code:
          zip daysabs math i.prog, inflate(i.prog)
          and then
          Code:
          predict m
          But how can we get pi?
          Best regards,
          Liang

          Comment


          • #6
            Try help zip postestimation and read the syntax for predict. The option pr gives the probability of always zero (pi). But the default n is not the mean of the Poisson part, it is the expected number of events. To get mu you need to use the option xb and then exponentiate. You'll find a detailed example in this Stata log.

            Comment

            Working...
            X