Pearson chi square test for negative binomial regression

Liang Wang Jack

Join Date: Dec 2016

Posts: 169
#1

Pearson chi square test for negative binomial regression

17 Oct 2017, 07:43

Dear scientist,

I use the data from UCLA to do the Pearson chi square test for negative binomial regression. I am not sure the way I got Pearson statistic is correct, would you please help me to check? In addition, the Pearson statistic is 2046.299, it is greater than chi square(df=313) =355.26, can I say this negative binomial model doesn't fit data well?
Thanks,
Liang

Code:

use https://stats.idre.ucla.edu/stat/stata/dae/nb_data, clear nbreg daysabs math i.prog predict that, n gen pearson=((daysabs-that)^2)/(that) tabstat pearson, stat(sum)
Tags: None
German Rodriguez

Join Date: Feb 2017

Posts: 169
#2

17 Oct 2017, 08:16

Your formula is correct for a Poisson model where var(Y)=mu. In a negative binomial model the variance is var(Y)=mu*(1+alpha*mu) where alpha is the overdispersion parameter. Try

Code:

gen pearson2 = (daysabs - that)^2/( that * (1 + exp(_b[/lnalpha]) * that))

I get 339.877. You can also verify the result using -glm- with the estimate of alpha from nbreg:

Code:

glm daysabs math i.prog,family(nbinomial 0.9683231)
1 like
Comment
Liang Wang Jack

Join Date: Dec 2016

Posts: 169
#3

17 Oct 2017, 12:01

Originally posted by German Rodriguez View Post

Your formula is correct for a Poisson model where var(Y)=mu. In a negative binomial model the variance is var(Y)=mu*(1+alpha*mu) where alpha is the overdispersion parameter. Try

Code:

gen pearson2 = (daysabs - that)^2/( that * (1 + exp(_b[/lnalpha]) * that))

I get 339.877. You can also verify the result using -glm- with the estimate of alpha from nbreg:

Code:

glm daysabs math i.prog,family(nbinomial 0.9683231)

Dear German Rodriguez, thanks for your helping, I figure out how to get Pearson for negative binomial, make sense and helpful. In addition, I am interested in getting Pearson for Negative binomial hurdle as well, below is the code for it, would you please help me to check it whether it's correct?

Code:

use https://stats.idre.ucla.edu/stat/stata/dae/nb_data, clear preserve tnbreg daysabs math i.prog if daysabs>0,nolog predict that, n logit daysabs math i.prog,nolog predict phat,pr gen yhat=phat*that gen pearson=((daysabs-yhat)^2)/(yhat) tabstat pearson, stat(sum) restore

Best,
Liang
Comment
German Rodriguez

Join Date: Feb 2017

Posts: 169
#4

18 Oct 2017, 13:17

The general formula for Pearson's chi-squared statistic is sum(Y-E(Y))^2/var(Y). You need to plug in estimates of the expectation and variance based on your model. As noted earlier, the formula you are using is valid for Poisson models. If you use a negative binomial, or a hurdle model, you need to use different expressions. A handout from my GLM course collects a bunch of means and variances that you may find useful.

A hurdle model usually combines a binary data model for zeros and a truncated count model for positive values, in your case a logit and a truncated negative binomial. Conventionally the logit predicts the zero class, but by using daysabsent you are modeling the probability of a positive count. This should not be a problem if you are consistent. But to compute fitted values you need to multiply the probability of a positive outcome by the conditional mean of the truncated distribution, not the unconditional mean as you have done. You then need to compute the variance, which is a bit harder. The code below follows my handout

Code:

use https://stats.idre.ucla.edu/stat/stata/dae/nb_data, clear gen zero = daysabs == 0 logit zero math i.prog, nolog predict p tnbreg daysabs math i.prog if daysabs>0,nolog predict m scalar alpha = exp(_b[/lnalpha]) gen f0 = (1 + alpha * m)^(-1/alpha) // nb prob of zero gen v = m*(1 + alpha*m) // nb variance gen mt = m/(1-f0) // truncated mean gen vt = v/(1-f0) - f0*mt^2 // truncated variance gen yhat = (1-p) * mt // hurdle mean gen vhat = (1-p)*vt + p*(1-p)*mt^2 // hurdle variance gen pearson = (daysabs - yhat)^2/vhat quietly sum pearson di r(sum)

I get a Pearson chi-squared of 335.36, which seems to be in the right ball park. I recall using this dataset in an exercise and a simple overdispersed Poisson model did a reasonably good job capturing the shape of the variance function.
Comment
Liang Wang Jack

Join Date: Dec 2016

Posts: 169
#5

18 Oct 2017, 19:58

Originally posted by German Rodriguez View Post

The general formula for Pearson's chi-squared statistic is sum(Y-E(Y))^2/var(Y). You need to plug in estimates of the expectation and variance based on your model. As noted earlier, the formula you are using is valid for Poisson models. If you use a negative binomial, or a hurdle model, you need to use different expressions. A handout from my GLM course collects a bunch of means and variances that you may find useful.

A hurdle model usually combines a binary data model for zeros and a truncated count model for positive values, in your case a logit and a truncated negative binomial. Conventionally the logit predicts the zero class, but by using daysabsent you are modeling the probability of a positive count. This should not be a problem if you are consistent. But to compute fitted values you need to multiply the probability of a positive outcome by the conditional mean of the truncated distribution, not the unconditional mean as you have done. You then need to compute the variance, which is a bit harder. The code below follows my handout

Code:

use https://stats.idre.ucla.edu/stat/stata/dae/nb_data, clear gen zero = daysabs == 0 logit zero math i.prog, nolog predict p tnbreg daysabs math i.prog if daysabs>0,nolog predict m scalar alpha = exp(_b[/lnalpha]) gen f0 = (1 + alpha * m)^(-1/alpha) // nb prob of zero gen v = m*(1 + alpha*m) // nb variance gen mt = m/(1-f0) // truncated mean gen vt = v/(1-f0) - f0*mt^2 // truncated variance gen yhat = (1-p) * mt // hurdle mean gen vhat = (1-p)*vt + p*(1-p)*mt^2 // hurdle variance gen pearson = (daysabs - yhat)^2/vhat quietly sum pearson di r(sum)

I get a Pearson chi-squared of 335.36, which seems to be in the right ball park. I recall using this dataset in an exercise and a simple overdispersed Poisson model did a reasonably good job capturing the shape of the variance function.

Dear Dr. German Rodriguez,Thank you very much for your explanation. I read your web pages and I feel I get more knowledge than other books, because it is more clearly. I follow your notes and get Pearson for both Poisson Hurdle and NB Hurdle. In addition to Hurdle models, I am also interested in Zero inflated Poisson model. There is one thing confused me when I write the code for getting Pearson. The mean for ZIP is E(Y)=pi+(1-pi)e^-u. This u is the mean of Poisson part, I think at this point we can get u by

Code:

zip daysabs math i.prog, inflate(i.prog)

and then

Code:

predict m

But how can we get pi?
Best regards,
Liang
Comment
German Rodriguez

Join Date: Feb 2017

Posts: 169
#6

18 Oct 2017, 21:45

Try help zip postestimation and read the syntax for predict. The option pr gives the probability of always zero (pi). But the default n is not the mean of the Poisson part, it is the expected number of events. To get mu you need to use the option xb and then exponentiate. You'll find a detailed example in this Stata log.
Comment

Announcement

Pearson chi square test for negative binomial regression

Comment

Comment

Comment

Comment

Comment