Poisson Model versus Negative Binomial Model with Equidispersion

Elio Bolliger

Join Date: Aug 2017

Posts: 9
#1

Poisson Model versus Negative Binomial Model with Equidispersion

29 Aug 2017, 02:00

Dear Community
I have a question regarding some output I obtained in Stata. Tests for equidispersion showed that I cannot reject the null hypothesis of equidispersion at the 1% significance level (of course, I know this does not mean the data are exactly equidispersed). Looking at the data and their distribution, however, gives additional evidence for equidispersion.
However, I wondered whether, given perfect equidispersion, the results obtained from a negative binomial model are exactly the same as those from a Poisson model.

Please find below details regarding the code and output:

First, for the Poisson regression I used the following code:

Code:

poisson totscore5 i.dn042_ i.agegr3 i.ch001_gr2 i.easy_isced1997_gr i.partner_mod2 i.iv009_rec_gr2 i.region, vce(cluster hhid6)

The output looks as stated below:

Additionally, the code used to test for equidispersion was:

Code:

estat gof

Here is the output:

Finally, the code and the output for the Negative Binomial model:

Code:

nbreg totscore5 i.dn042_ i.agegr3 i.ch001_gr2 i.easy_isced1997_gr i.partner_mod2 i.iv009_rec_gr2 i.region, vce(cluster hhid6)

Usually, the end of the table reports some output for alpha, with confidence intervals and a standard error. That is not the case here, so I wondered whether this might be due to equidispersion and therefore alpha = 0. I was also not sure whether, in this case, it is really possible for the results of the Poisson and negative binomial models to be exactly the same.

Thanks Carlo Lazzaro for the remarks!

Best regards,
Elio

Last edited by Elio Bolliger; 29 Aug 2017, 02:42.
Tags: data, regression
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#2

29 Aug 2017, 02:03

Elio:
welcome to the list.
As per FAQ, your chances of getting helpful replies are conditional on posting what you typed and what Stata gave you back (within CODE delimiters, please). Thanks.

Kind regards,
Carlo
(Stata 19.0)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#3

29 Aug 2017, 02:52

Elio:
it seems that you do not have overdispersion.
Hence, go -poisson-.

Kind regards,
Carlo
(Stata 19.0)
Elio Bolliger

Join Date: Aug 2017

Posts: 9
#4

29 Aug 2017, 02:56

Carlo Lazzaro
Thanks for your reply and the advice. However, I was also interested in knowing the reason why the results for Poisson and Negbin are exactly the same and wondered if this could be due to equidispersion.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#5

29 Aug 2017, 03:02

Elio:
I think that it could be due to the absence of evidence of overdispersion (or underdispersion) in your data.
The only concern I may have relates to the structure of your data: are they panel or cross-sectional?
If they were panel, I would go -xtpoisson-.

Kind regards,
Carlo
(Stata 19.0)
Elio Bolliger

Join Date: Aug 2017

Posts: 9
#6

29 Aug 2017, 03:08

Dear Carlo
Thanks for your help and your advice. The data is cross-sectional so it should be correct.
Best regards,
Elio
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17712
#7

29 Aug 2017, 03:13

Elio:
thanks for providing further details.

Kind regards,
Carlo
(Stata 19.0)
John Mullahy

Join Date: Dec 2016

Posts: 752
#8

29 Aug 2017, 05:52

Elio: Can you post the frequency distribution of your dependent variable, totscore5, e.g.

Code:

tab totscore5, missing

One can obtain exactly the kind of poisson=nbreg results you've reported for particular structures of the dependent variable.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#9

29 Aug 2017, 07:15

Dear Elio,

Just to add to the excellent advice already provided, my guess is that you actually have a little bit of under-dispersion. Hence, -nbreg- tries to give you a negative estimate of alpha and, because that is not possible, essentially finds alpha = 0. With stronger under-dispersion, -nbreg- would probably not converge.
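As a sketch of the reasoning above, and assuming -nbreg- is run in its default mean-dispersion (NB2) form, the conditional variance being fitted is:

```latex
\operatorname{Var}(y_i \mid x_i) = \mu_i + \alpha\,\mu_i^{2},
\qquad \mu_i = \exp(x_i'\beta), \qquad \alpha \ge 0 .
```

Because alpha is constrained to be non-negative, the fitted variance can never fall below the mean. As alpha goes to 0 the variance collapses to mu_i, the Poisson case, which is consistent with the two sets of estimates coinciding when the maximizer ends up numerically next to that boundary.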

On a side note, you do not tell us what variable you are modeling, but the name suggests it is not a true count. If that is the case, over-, under-, and equi-dispersion are meaningless because you can go from one to the other just by changing the scale of the dependent variable.
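To spell out the rescaling argument with generic symbols (c, mu and sigma^2 are notation introduced here for illustration, not quantities from the thread): if y has mean mu and variance sigma^2, then for any constant c > 0,

```latex
\operatorname{E}(c\,y) = c\,\mu, \qquad
\operatorname{Var}(c\,y) = c^{2}\sigma^{2}, \qquad
\frac{\operatorname{Var}(c\,y)}{\operatorname{E}(c\,y)} = c\,\frac{\sigma^{2}}{\mu} .
```

The variance-to-mean ratio therefore moves one-for-one with c, and choosing c = mu/sigma^2 makes it exactly 1. For a genuine count no such rescaling is available, because the units are fixed by what is being counted.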

Best wishes,

Joao
Elio Bolliger

Join Date: Aug 2017

Posts: 9
#10

31 Aug 2017, 02:45

Dear John Mullahy and Joao Santos Silva

I am currently out of the office, but I will reply to your helpful advice as soon as I am back (2 Sep 2017 at the latest). I would be very interested in your further opinions. I really appreciate your help, and sorry for the delay.

Best regards,
Elio
Elio Bolliger

Join Date: Aug 2017

Posts: 9
#11

31 Aug 2017, 06:38

I found a way to access the data already today. So this would be the result of the following code:

Code:

tab totscore5, missing

John Mullahy Does this distribution confirm what you had in mind about a specific structure of the dependent variable?

Joao Santos Silva The dependent variable is a test score ranging from 0 to 8, so this should be fine if I am not mistaken. However, I would be interested in knowing why -nbreg- does not converge in some cases if the data are underdispersed.

Best wishes,
Elio
John Mullahy

Join Date: Dec 2016

Posts: 752
#12

31 Aug 2017, 06:56

Elio: Thanks for sending this information. In fact, the data structure is not what I expected.

But I believe Joao's comment is relevant: The marginal mean of totscore5 exceeds its marginal variance, which implies that the conditional mean must exceed the conditional variance (the latter are what you are estimating) for whatever covariates x you include. So I think Joao is correct that the algorithm is trying to fit a NB model to underdispersed data, which should not work.
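The step from marginal to conditional moments can be sketched with the law of total variance (a standard identity, written here in generic notation rather than anything estimated in the thread):

```latex
\operatorname{Var}(y)
  = \operatorname{E}\!\left[\operatorname{Var}(y \mid x)\right]
  + \operatorname{Var}\!\left[\operatorname{E}(y \mid x)\right]
  \;\ge\; \operatorname{E}\!\left[\operatorname{Var}(y \mid x)\right],
\qquad
\operatorname{E}(y) = \operatorname{E}\!\left[\operatorname{E}(y \mid x)\right]

\operatorname{E}(y) > \operatorname{Var}(y)
\;\Longrightarrow\;
\operatorname{E}\!\left[\operatorname{E}(y \mid x) - \operatorname{Var}(y \mid x)\right]
  \;\ge\; \operatorname{E}(y) - \operatorname{Var}(y) \;>\; 0 .
```

Strictly, this shows that the conditional mean exceeds the conditional variance on average over x, whichever covariates are included; conditioning can only remove variance, never add it.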
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#13

31 Aug 2017, 07:27

I wonder what the value of the Pearson statistic for dispersion was. You can get it with the -glm- command, for example:

Code:

glm totscore5 i.dn042_ i.agegr3 i.ch001_gr2 i.easy_isced1997_gr i.partner_mod2 i.iv009_rec_gr2 i.region, family(poisson) link(log) vce(cluster hhid6)

Last edited by Marcos Almeida; 31 Aug 2017, 07:37.

Best regards,

Marcos
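For anyone who wants to see where the dispersion figure comes from, here is a minimal sketch on simulated data. The variables y and x, the seed, and the data-generating process are all invented for illustration and are not Elio's data. It computes the Pearson statistic divided by its residual degrees of freedom after -poisson-, then compares it with the value -glm- stores in e(dispers_p).

```stata
* Simulated stand-in data (hypothetical names y and x): a score out of 8
clear
set seed 12345
set obs 2000
generate x = rnormal()
generate y = rbinomial(8, invlogit(0.3 + 0.5*x))

* Poisson fit, then the Pearson statistic computed by hand
quietly poisson y x
predict double mu, n
generate double pearson_i = (y - mu)^2 / mu
quietly summarize pearson_i
scalar dispersion = r(sum) / (e(N) - e(rank))
display "Pearson dispersion (by hand) = " dispersion

* The same quantity as stored by -glm- (shown as "(1/df) Pearson" in its header)
quietly glm y x, family(poisson) link(log)
display "Pearson dispersion (glm)     = " e(dispers_p)
```

Values well below 1 point towards underdispersion and values well above 1 towards overdispersion, relative to the Poisson variance.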
Joao Santos Silva

Join Date: Apr 2014

Posts: 3015
#14

31 Aug 2017, 12:59

Dear Elio,

The NB regression assumes overdispersion and Stata forces the variance of the neglected heterogeneity (alpha) to be positive. When we have underdispersion, the algorithm tries to take alpha to zero, but that is impossible because of the way Stata parameterizes it. Hence, the estimator is unlikely to converge. As John noted, you certainly have underdispersion, so the convergence you got is, in some sense, spurious.

Now, about your data: a) Are these numbers really counts, or is this just a variable coded as integers? Unless these are really counts, overdispersion and related concepts are meaningless; b) If the variable cannot go above 8, models with an exponential mean such as Poisson and NB are unlikely to be adequate.

Best wishes,

Joao
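To illustrate point b) above, here is a hedged sketch on simulated data (y, x, the seed and the data-generating process are invented for illustration). It fits one possible bounded-mean alternative, a binomial GLM with 8 trials and a logit link, next to a Poisson model. The binomial specification is an assumption of this sketch, not something proposed in the thread.

```stata
* Simulated stand-in data (hypothetical names y and x): a score out of 8
clear
set seed 2017
set obs 2000
generate x = rnormal()
generate y = rbinomial(8, invlogit(0.3 + 0.5*x))

* Binomial GLM with 8 trials and a logit link: fitted mean is 8*invlogit(xb)
glm y x, family(binomial 8) link(logit)
predict double mu_binom, mu

* Poisson with its exponential mean, for comparison
quietly poisson y x
predict double mu_pois, n

* Binomial fitted means are bounded by 8; the Poisson ones have no upper bound
summarize mu_binom mu_pois
```

With real data, vce(cluster hhid6) could be added to both commands as in the thread. Whether the binomial form suits a given score is a separate modelling question.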
Elio Bolliger

Join Date: Aug 2017

Posts: 9
#15

01 Sep 2017, 02:52

Dear John and Joao

Thanks a lot for your explanations; I really appreciate them, and I understand your points. To give you an idea of what the variable totscore5 is, here are some more details. The variable was coded from a block of eight questions in a survey that tested the respondents' knowledge. For each question, there was one correct answer, one wrong answer, and one option labelled "I don't know". Each correct answer was coded as 1 point, and each wrong answer or "I don't know" was coded as zero, leading to a total score ranging from 0 to 8. So, when you say this might not be a true count variable, do you mean that it could just as easily have been coded as a variable ranging from 0 to 1? For doctor visits, which would be a true count, this is not possible. If this is the case, I am sorry for not giving you this essential information earlier. I was actually not aware that this could make a big difference.
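One way to see why a score built like this tends to be underdispersed is to assume, purely for illustration and not as something stated in the thread, that each of the 8 questions is answered correctly independently and with the same probability p given the covariates. The total is then Binomial(8, p), so

```latex
\operatorname{E}(y \mid x) = 8p, \qquad
\operatorname{Var}(y \mid x) = 8p(1-p) = (1-p)\,\operatorname{E}(y \mid x)
  < \operatorname{E}(y \mid x)
\quad \text{for } 0 < p < 1 .
```

Under that assumption the variance sits below the mean for every p, which is the opposite of what a negative binomial model allows.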

I understand your point b), this would definitely be something I should have considered as well.

Thanks a lot to both of you for your explanations and advice. Perhaps one last point for Joao (if you have time to explain it to me): you mentioned that if it is not a true count, I can just change the scale and get rid of the over- or underdispersion. I was curious about how that would work. Are you referring to something like a quasi-likelihood approach, where you account for under- or overdispersion using a scaling factor?

Thanks again for your profound help and have a nice day!

Dear Marcos

Please find attached the output of the regression:

Best wishes,
Elio