Testing overdispersion in Negative Binomial

Federico Nutarelli

Join Date: Sep 2018

Posts: 430
#1

03 Mar 2020, 03:20

Hi all,

Is there a way to test for the presence of overdispersion in a panel negative binomial model? I know that the Fixed Effects Negative Binomial model provided through the command xtnbreg should eliminate the overdispersion parameter delta_i (as stated in the help file), so I guess that neither the postestimation commands nor the estimated parameters can provide a test for overdispersion in that case.

Thank you,

Federico
Tags: count, count data, negative binomial
Joao Santos Silva

Join Date: Apr 2014

Posts: 3018
#2

03 Mar 2020, 04:26

Dear Federico Nutarelli

Be careful with the NBFE estimator, see:

Guimarães, Paulo, 2008. "The fixed effects negative binomial model revisited," Economics Letters, 99(1), 63-66.

Best wishes,

Joao
Federico Nutarelli

Join Date: Sep 2018

Posts: 430
#3

03 Mar 2020, 07:13

Thank you very much Joao Santos Silva
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2187
#4

03 Mar 2020, 15:37

I agree with Joao. I’d say it even more strongly: you should use the FE Poisson approach and never use FENB. The former is entirely robust, the latter is fragile to violations of assumptions.
Federico Nutarelli

Join Date: Sep 2018

Posts: 430
#5

04 Mar 2020, 03:49

Dear Professor Jeff Wooldridge,

many thanks for your much appreciated and useful reply.
The problem is that my data are overdispersed. I have read some of your earlier posts on the topic, as well as Cameron and Trivedi's book. It seems that FE Negative Binomial also fails to eliminate the fixed effects. If I may take some of your time, I would like to ask whether it is possible to correct for overdispersion with FE Poisson.

Many thanks,

Federico
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2187
#6

08 Mar 2020, 07:21

Federico: A few points.

1. Because the FE Poisson estimator is fully robust to any kind of variance-mean relationship, there is no need to "correct" for overdispersion with FEP. You do need to compute robust standard errors. Fully robust means that the conditional mean needs to be correct, and that's all.
2. You can't tell by looking at the raw data, or even the data conditional on x, whether overdispersion holds in the sense relevant for panel data. You would have to observe the heterogeneity (or estimate it, which is difficult with small T). It seems likely that some units are underdispersed and some overdispersed, once you control for what I call c_i along with x_i. Being overdispersed across the entire population is not the same as being overdispersed for each unit.
3. What would you do if you concluded you have overdispersion? There are, perhaps, more efficient method of moments estimators, as I discuss in my 1999 Journal of Econometrics paper. But you should not use FENB. The estimator is very fragile to violation of a set of assumptions that are too strong.
4. One way to emphasize point 3: if the FENB assumptions hold, FEP is consistent. If the FEP assumptions hold, FENB is inconsistent.

JW
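
For readers who want to try what point 1 describes, a minimal Stata sketch follows. The variable names (y, x1, x2, id, year) are hypothetical placeholders that do not come from this thread, and the year dummies are only an illustrative choice of controls.

```stata
* Hypothetical names: y = count outcome, x1 x2 = regressors,
* id = panel unit, year = time period. Replace with your own variables.
xtset id year

* FE Poisson with robust standard errors.
* Consistency requires only a correctly specified conditional mean;
* the robust VCE allows for any variance-mean relationship.
xtpoisson y x1 x2 i.year, fe vce(robust)
```

No separate overdispersion "correction" appears here: under the argument above, the robust standard errors are what handle departures from the Poisson variance.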
Federico Nutarelli

Join Date: Sep 2018

Posts: 430
#7

09 Mar 2020, 03:30

Jeff Wooldridge
Dear Professor,

thank you for the complete and very clear explanations. I will proceed as you suggest.

Many thanks,

Federico
Samuel Nocito

Join Date: Apr 2020

Posts: 12
#8

08 Apr 2020, 09:15

Dear Professors Joao Santos Silva and Jeff Wooldridge,

I found your replies and explanations really interesting and instructive! I have a similar issue, and I would like to be sure that FE Poisson (estimated with the ppmlhdfe command) gives me robust and consistent estimates. Below I give more details about my research. I would be extremely grateful for any suggestions. Thanks in advance!

I'm interested in estimating the effect of a treatment, measured by a binary variable, on the number of tourists visiting a municipality. I have a balanced panel of countries (the tourists' countries of origin), Italian municipalities (the travel destinations) and year-months: 46 countries and 390 municipalities over the period 2000-2017, for a total of 3,875,040 observations.

I'm estimating the following equation:

ppmlhdfe log_n_visits treat , absorb(country_mun year_month) cluster(municipality) d
margins, dydx(treat) post

where log_n_visits measures the number of tourists from country C visiting municipality M in year-month T (in logs, i.e., log(visits + 1)), and treat is the treatment dummy, which takes value one for treated countries and municipalities from a particular point in time onwards; that point in time varies across countries (i.e., some countries receive the treatment earlier than others). Finally, I include FE for the interaction between country and municipality (country_mun) and for year-months (year_month), and I cluster the SEs at the municipality level. Note that when I estimate this equation the number of observations falls to 551,448, because ppmlhdfe drops 3,323,592 observations that are either singletons or separated by a fixed effect. The FE comprise 2,553 country_mun groups and 216 year_month periods, and I have 350 municipality clusters.

I'm using survey data from the Bank of Italy that are representative of tourism flows at the national level. However, by construction, the outcome variable has many zeros, and I'm worried about overdispersion when I estimate FE Poisson. I include summary statistics of the outcome (in levels) both on the whole sample and on the e(sample) generated by FE Poisson (ppmlhdfe), respectively:

These are my questions:
Is overdispersion a concern when using FE Poisson (ppmlhdfe)? In particular, I am wondering whether your statement "FE Poisson estimator is fully robust to any kind of variance-mean relationship" implies that FE Poisson (ppmlhdfe) provides consistent and robust estimates in my setting as well. If so, I would be grateful if you could suggest some references that I can study and cite to support the use of ppmlhdfe instead of a Negative Binomial model in the presence of FE.

Given the large number of zeros in my dataset, should I try to implement a zero-inflated Poisson model with FE? If so, is there a way to implement it in Stata?

Is it preferable to use the number of visits in levels rather than its log transformation, or is the log transformation fine?

Thank you very much for your time and your help!

Best,
Samuel
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2187
#9

08 Apr 2020, 09:24

Samuel: You apply Poisson FE directly to visits. Do not use log(visits + 1)! And you have to be careful using margins because those depend on the estimated fixed effects. The coefficients themselves are usually of interest: multiplied by 100, they give percentage effects on the mean visits given a one unit increase in x.
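
A minimal sketch of the specification this reply points to, assuming the untransformed count is stored in a variable called n_visits (a hypothetical name; the other names are taken from post #8). The nlcom line is an addition for illustration only: it reports the exact percentage change in mean visits, 100*(exp(b) - 1), which is close to 100*b when b is small.

```stata
* n_visits is a hypothetical name for the raw (unlogged) count of visits.
ppmlhdfe n_visits treat, absorb(country_mun year_month) vce(cluster municipality)

* Approximate percentage effect of treat on mean visits: 100 * _b[treat]
* Exact percentage effect implied by the exponential mean:
nlcom (pct_effect: 100 * (exp(_b[treat]) - 1))
```

Both quantities depend only on the coefficient on treat, not on the estimated fixed effects, which is why they avoid the caveat about margins.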
Samuel Nocito

Join Date: Apr 2020

Posts: 12
#10

08 Apr 2020, 10:42

Dear Professor Jeff Wooldridge,

thank you very much for your useful reply! So, do you think I can rely on FE Poisson because "FE Poisson estimator is fully robust to any kind of variance-mean relationship"? Should I be concerned about the large number of zeros? Finally, could you please suggest some references?

Sorry if I'm bothering you with so many questions. Thank you very much in advance for your time; your suggestions are really valuable to me!

I wish you all the best in this hard time at the global level!
Best,
Samuel
Joao Santos Silva

Join Date: Apr 2014

Posts: 3018
#11

09 Apr 2020, 09:41

Dear Samuel Nocito,

The reference you need is:

Wooldridge, J.M., (1999) “Distribution-Free Estimation of Some Nonlinear Panel Data Models,” Journal of Econometrics 90, 77–97.

Best wishes and stay safe,

Joao
Samuel Nocito

Join Date: Apr 2020

Posts: 12
#12

09 Apr 2020, 09:56

Dear Professor Joao Santos Silva,

thank you very much for your reply and the suggested reference, I really appreciate it!

Best wishes, and you stay safe too,
Samuel
Dijana Zejcirovic

Join Date: Apr 2020

Posts: 2
#13

10 Apr 2020, 06:42

Hi all,

Thanks for all the info provided in this thread, learned a lot!

I am facing a problem very similar to Samuel Nocito's. I work with panel data (N = 8091 municipalities, T = 10 years) and my dependent variable is a count variable with many, many zeros. The dependent variable is the number of hospitalizations (for specific diseases) in a municipality.

I would like to run a (high-dimensional) FE zero-inflated Poisson regression (municipality and year fixed-effects) and I am looking for ways to implement this in Stata.

Is there a direct way or a two-step procedure to implement this? Our assumption is that we observe zero hospitalizations in small municipalities (low population counts).

Many thanks in advance and all the best,
Dijana
Joao Santos Silva

Join Date: Apr 2014

Posts: 3018
#14

10 Apr 2020, 09:11

Dear Dijana,

A zero inflated model will be adequate if for some municipalities it is not possible to have hospitalizations due to a specific disease (for example, because there is no hospital in the municipality). If that is not the case, you can just use Poisson regression with FE, possibly controlling for the population in each municipality (but the FE are likely to be enough to account for differences in population).

Best wishes,

Joao
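
One possible way to set up the FE Poisson regression suggested here, using ppmlhdfe as in post #8. All variable names (hosp, x, population, municipality, year) are hypothetical, and entering log population as a regressor is only one of several ways to control for population.

```stata
* Hypothetical names: hosp = hospitalization count, x = regressor of interest,
* population = municipality population.
generate ln_pop = ln(population)

* Poisson regression with municipality and year fixed effects.
* Zeros in hosp are allowed; observations separated by a fixed effect
* (e.g. municipalities with zero counts in every year) are dropped.
ppmlhdfe hosp x ln_pop, absorb(municipality year) vce(cluster municipality)
```

If population barely changes within a municipality over the sample, ln_pop will be close to collinear with the municipality fixed effects, which is consistent with the remark that the FE are likely to be enough.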
Dijana Zejcirovic

Join Date: Apr 2020

Posts: 2
#15

11 Apr 2020, 11:55

Dear Joao,

Many thanks for your quick and helpful reply.
I guess I will then just go for a Poisson regression with FE.

Best,
Dijana