xtpoisson eliminates zeros for dependent count variable

tlo9966

Join Date: Jun 2014

Posts: 11
#1

16 Jul 2023, 18:06

What am I missing? Why is xtpoisson dropping the observations in my panel analysis (state-year) that are zeros? I am trying to predict the count of candidates who choose a particular jurisdiction for professional licensing. What model should I use so that my zero observations are not eliminated? Or is there an option that I need to specify?
Clyde Schechter

Join Date: Apr 2014

Posts: 30165
#2

16 Jul 2023, 19:35

Why is xtpoisson dropping the observations in my panel analysis (state-year) that are zeros?

I am confident that it is not, in fact, doing that.

What it probably is doing, assuming you are running a fixed-effects Poisson regression, is dropping any panel where the outcome variable is zero in every observation. The reason is that the estimated mean for a panel whose outcomes are all zero is, itself, zero. But the fixed-effect coefficient for that panel is on the log scale, so it would have to be the logarithm of zero, which is undefined (or, if you wish, informally, you can think of it as negative infinity). It isn't possible for the estimation to converge to negative infinity, so Stata avoids the problem by eliminating any all-zero panel.
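To see exactly which panels are affected, a minimal sketch along these lines may help. The names id (panel identifier) and y (the count outcome) are placeholders rather than the poster's actual variables, and ytotal and first are helper variables introduced here only for illustration.

Code:

* placeholder names: id = panel identifier, y = count outcome
* panel sum of y (egen's total() treats missing y as 0)
bysort id: egen double ytotal = total(y)
* flag one observation per panel
egen byte first = tag(id)
* number of all-zero panels
count if first & ytotal == 0
* number of observations in those panels
count if ytotal == 0

If the second count is close to the number of observations that xtpoisson, fe reports dropping, then all-zero panels, not individual zeros, account for the lost observations. (Panels with only one observation are also dropped by the fixed-effects estimator, so the two numbers need not match exactly.)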
tlo9966

Join Date: Jun 2014

Posts: 11
#3

16 Jul 2023, 19:47

Yes, I am running an FE Poisson regression, and you are correct: all jurisdictions with zero candidates in all years were dropped. So how can I run my regression?
Clyde Schechter

Join Date: Apr 2014

Posts: 30165
#4

16 Jul 2023, 20:25

There are a couple of possibilities. It is the coefficient of the fixed effect of the all-zero-outcome panels that cannot be estimated. One possibility is to do a random-effects Poisson instead. Another possibility is to use a non-Poisson regression. Would you get any kind of reasonable fit to the data with a fixed-effects linear regression?

Another potential solution is: can you, using other variables in your model, "predict" which panels have all-zero-outcome? If so, then you can present a two-part model. One part identifies the all-zero outcome panels, and the other is a fixed-effects Poisson model fitted to the rest.
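As a rough sketch of that two-part idea, assume there is a time-constant, panel-level variable z1 that helps predict which panels are all-zero. All names here (id, year, y, x1, x2, z1, and the helper variables) are placeholders, and logit is just one possible choice for the first part.

Code:

* part 1: one row per panel, model the probability that a panel is all-zero
bysort id: egen double ymax = max(y)
generate byte allzero = (ymax == 0)
egen byte onerow = tag(id)
logit allzero z1 if onerow

* part 2: fixed-effects Poisson on the panels with at least one nonzero count
xtset id year
xtpoisson y x1 x2 if !allzero, fe vce(robust)

The second command gives the same estimates as running xtpoisson, fe on the full data, since the all-zero panels would be dropped anyway. The first part is what supplies a description of those panels.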
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2204
#5

16 Jul 2023, 21:02

To just add to Clyde's remarks: The FE Poisson estimator is doing exactly what it should based on the information available in the data. The mean has the form E[y(i,t)|x(i,t),c(i)] = c(i)*exp(x(i,t)*b) where c(i) >= 0 is the unobserved (or "fixed") effect. If for some i, y(i,t) = 0 for all t = 1, ..., T, then, when c(i) is allowed to be arbitrarily correlated with x(i,t), unit i is uninformative for estimating b. In effect, c(i) is estimated to be zero. If for every period for unit i you've seen zeros, your best prediction for the future is also zero. You should just let Stata drop these observations and use the ones that are informative about b.

Using the usual random effects Poisson will allow you to use all observations, but it has costs. For one, it requires c(i) to be uncorrelated with x(i,t). This can be overcome with a correlated random effects version of RE. But another problem is that RE Poisson, unlike FE Poisson, requires all distributional assumptions to hold, including independence across t (conditional on c(i)). These are very strong assumptions.

A robust approach is to use correlated RE with pooled Poisson. With a balanced panel, it's easy. Let id be the cross-sectional identifier and z(i) any time-constant covariates. I'll assume year is the time variable.

Code:

egen x1bar = mean(x1), by(id)
...
egen xKbar = mean(xK), by(id)
poisson y x1 ... xK x1bar ... xKbar z1 ... zJ i.year, vce(cluster id)
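
For concreteness, with two time-varying covariates and one time-constant covariate, the recipe above might look like the following. The names x1, x2, and z1 are placeholders. The final test command is an addition here, not part of the original suggestion: it is a joint Wald test on the time averages, which is one common way to check whether the covariates appear to be correlated with the unobserved effect.

Code:

* time averages of each time-varying covariate within panel
egen x1bar = mean(x1), by(id)
egen x2bar = mean(x2), by(id)
* pooled Poisson with the averages added, standard errors clustered by panel
poisson y x1 x2 x1bar x2bar z1 i.year, vce(cluster id)
* joint test that the time averages can be excluded
test x1bar x2bar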
tlo9966

Join Date: Jun 2014

Posts: 11
#6

16 Jul 2023, 21:05

Performing the xtpoisson random-effects panel regression results in an AIC and BIC that are worse than with xtnbreg. However, I now have significant coefficients that I didn't have with xtnbreg. Any thoughts there? As for the two-part model, I do not know how to run that as a panel. I used zinb on the pooled data, but I don't know how to run a two-part model with a panel. Any help there? I thoroughly appreciate your support and anything else you can share.
tlo9966

Join Date: Jun 2014

Posts: 11
#7

17 Jul 2023, 13:20

Clyde: Performing the xtpoisson random-effects panel regression results in an AIC and BIC that are worse than with xtnbreg. However, I now have significant coefficients that I didn't have with xtnbreg. Any thoughts there? As for the two-part model, I do not know how to run that as a panel. I used zinb on the pooled data, but I don't know how to run a two-part model with a panel. Any help there? I thoroughly appreciate your support and anything else you can share.

Jeff: Thank you for the suggestion. Do you imagine that there is going to be an issue with the fact that most of my independent variables are dichotomous?
Clyde Schechter

Join Date: Apr 2014

Posts: 30165
#8

17 Jul 2023, 15:03

Actually, when I referred to a two-part model, I didn't have -zinb- in mind, and it doesn't actually do what I did have in mind. -zinb- and -zip- estimate two-part models where the first part predicts individual observations with a zero outcome. But the issue here is not individual observations with zero outcome; it is whole panels with entirely zero outcome.

That said, as Jeff Wooldridge pointed out, not only are the parameters of those panels inestimable, those panels are also uninformative in -xtpoisson, fe-. So there is really no reason to go to a different model. You can stick with Poisson and just omit the all-zero panels. (You don't even have to actively do that yourself, as Stata will do it for you automatically, as you discovered.) And that will be a perfectly acceptable solution to your problem.
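One way to check this for yourself, sketched with placeholder names (id, year, y, x1, x2), is to fit the model once on all panels and once with the all-zero panels excluded by hand. The two sets of estimates should agree.

Code:

xtset id year
* fit on everything; Stata drops the all-zero panels itself
xtpoisson y x1 x2, fe
estimates store fe_auto
* flag all-zero panels and exclude them explicitly
bysort id: egen double ysum = total(y)
xtpoisson y x1 x2 if ysum > 0, fe
estimates store fe_manual
* side-by-side coefficients and standard errors
estimates table fe_auto fe_manual, se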
tlo9966

Join Date: Jun 2014

Posts: 11
#9

17 Jul 2023, 16:03

Clyde: Thank you for your help with this. I have found that running zinb shows that I have both overdispersion and inflated zeros overall (27% zeros). I identified a predictor (p<0.001) of the states with no candidates for the inflate equation. All my independent variables (except one) have the correct sign. I used vce(cluster id), where id is the state, to correct for serial correlation. This is also a better fit (in terms of AIC and BIC) than the pooled Poisson, but not better than my FE xtpoisson. However, I feel that I am comparing apples and oranges, because I lose 60 out of 265 observations with the FE xtpoisson model. I want to take advantage of the panel structure to account for unobserved heterogeneity, and I found that Jeff told someone in a separate thread to use "churdle" so that they could estimate marginal effects that account for the two parts of the model. https://www.statalist.org/forums/for...-in-panel-data - any thoughts you have would be greatly appreciated.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2204
#10

19 Jul 2023, 07:50

A few more comments. First, you shouldn't use things like AIC/BIC to choose between very robust procedures (Poisson fixed effects) and nonrobust procedures (essentially everything else). The Poisson estimator goes from very robust when using FE to not robust when using RE. That's just the way it is. If your main interest is in the effects on the mean, your best bet is Poisson FE. Every other method imposes stronger assumptions.

Are you trying to model the entire distribution? If so, then you can use a two-part approach combined with the correlated random effects approach. I would avoid joint MLEs that impose strong distributional and independence assumptions.
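A minimal sketch of what a two-part approach with correlated random effects could look like is below. This is one reading of the suggestion, not a definitive specification: the choice of probit for the first part and pooled Poisson on the positive counts for the second part are assumptions, as are the placeholder names id, year, y, x1, and x2. Each part is estimated separately with cluster-robust standard errors, not by a joint MLE.

Code:

* time averages of the covariates (balanced panel assumed)
egen x1bar = mean(x1), by(id)
egen x2bar = mean(x2), by(id)
* part 1: probability of a positive count
generate byte pos = (y > 0) if !missing(y)
probit pos x1 x2 x1bar x2bar i.year, vce(cluster id)
* part 2: exponential mean of y given a positive count, via pooled Poisson
poisson y x1 x2 x1bar x2bar i.year if y > 0 & !missing(y), vce(cluster id)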