GLM Regression Family

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2189
#16

16 Jul 2025, 16:04

An important feature of the GLM framework is that it uses only distributions from the linear exponential family (LEF), so only the conditional mean function has to be correctly specified to consistently estimate the mean parameters. The Tweedie family is not in the LEF, and neither is the gamma if you estimate the shape parameter along with the mean parameters. But glm actually uses the exponential distribution (gamma with fixed shape), which is in the LEF. As Andrew notes, the key is to use a good conditional mean function. If your outcome is nonnegative, as it should be when the outcome variable is cost, then using either Poisson regression or gamma regression is a good idea. Robust standard errors should be used. If the variance is closer to being proportional to the mean, Poisson regression will tend to be more efficient. If the variance is closer to being proportional to the square of the mean, gamma regression will tend to be more efficient. It's a good idea to try both: both are consistent if the mean is correctly specified, so you hope to find similar estimates. You can then compare the robust standard errors across estimators.
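To make the "try both and compare" advice concrete, here is a minimal sketch in plain Python (standard library only). It is an illustration under stated assumptions, not how glm is implemented: the data are simulated, there is a single regressor, the mean is exp(b0 + b1*x), and the function names `moments` and `qmle_log_link` are hypothetical. Setting `power=1` gives the Poisson quasi-MLE and `power=2` gives the gamma (exponential) quasi-MLE; both report sandwich (robust) standard errors.

```python
import math
import random


def moments(x, y, b0, b1, power):
    """Sums for Fisher scoring and the sandwich variance.

    Mean is exp(b0 + b1*x); nominal variance is proportional to mean**power.
    """
    a00 = a01 = a11 = 0.0  # expected Hessian A
    g0 = g1 = 0.0          # quasi-score
    m00 = m01 = m11 = 0.0  # outer product of the scores B
    for xi, yi in zip(x, y):
        mu = math.exp(b0 + b1 * xi)
        w = mu ** (2.0 - power)              # weight in A
        s = (yi - mu) * mu ** (1.0 - power)  # score contribution
        a00 += w
        a01 += w * xi
        a11 += w * xi * xi
        g0 += s
        g1 += s * xi
        m00 += s * s
        m01 += s * s * xi
        m11 += s * s * xi * xi
    return (a00, a01, a11), (g0, g1), (m00, m01, m11)


def qmle_log_link(x, y, power, max_iter=100, tol=1e-10):
    """Quasi-MLE by Fisher scoring; returns estimates and robust SEs."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at intercept-only fit
    for _ in range(max_iter):
        (a00, a01, a11), (g0, g1), _ = moments(x, y, b0, b1, power)
        det = a00 * a11 - a01 * a01
        d0 = (a11 * g0 - a01 * g1) / det  # step = A^{-1} * score
        d1 = (a00 * g1 - a01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # Sandwich variance A^{-1} B A^{-1}, evaluated at the final estimates.
    (a00, a01, a11), _, (m00, m01, m11) = moments(x, y, b0, b1, power)
    det = a00 * a11 - a01 * a01
    var_b0 = (a11**2 * m00 - 2 * a11 * a01 * m01 + a01**2 * m11) / det**2
    var_b1 = (a01**2 * m00 - 2 * a01 * a00 * m01 + a00**2 * m11) / det**2
    return (b0, b1), (math.sqrt(var_b0), math.sqrt(var_b1))


# Simulated cost-like data (an assumption for illustration): the true mean is
# exp(1 + 0.5*x) and the variance is proportional to the square of the mean.
random.seed(12345)
n = 5000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [random.gammavariate(2.0, math.exp(1.0 + 0.5 * xi) / 2.0) for xi in x]

beta_pois, se_pois = qmle_log_link(x, y, power=1)
beta_gam, se_gam = qmle_log_link(x, y, power=2)
print("Poisson QMLE:", beta_pois, "robust SE:", se_pois)
print("Gamma QMLE:  ", beta_gam, "robust SE:", se_gam)
```

Because the simulated variance is proportional to the square of the mean, both sets of estimates should land near (1, 0.5), with the gamma slope showing the smaller robust standard error. In Stata the analogous comparison would be something like `glm y x, family(poisson) link(log) vce(robust)` against `glm y x, family(gamma) link(log) vce(robust)`, with `y` and `x` standing in for your own variables.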
3 likes
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17724
#17

17 Jul 2025, 00:12

Helen:
I would stick with the gamma distribution, unless the longer tail of your cost distribution is the left one. If that were the case, I would try the Gaussian distribution or a log-linear regression model and see what happens to my coefficients.
Poisson regression works with continuous outcomes too but, as you say, it is not often used in cost analysis.

Kind regards,
Carlo
(Stata 19.0)
1 like
Helen So

Join Date: Jul 2025

Posts: 5
#18

17 Jul 2025, 14:28

Thank you so much for all your suggestions!
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2189
#19

18 Jul 2025, 22:54

One way to think about Poisson regression is that it's a very convenient way of performing nonlinear weighted least squares, where the nominal assumption is that the conditional variance is a constant multiple of the conditional mean. It does not have to equal the mean. Using the exponential distribution (gamma with fixed shape) has a similar interpretation, except now the variance is nominally proportional to the square of the mean. Even for non-count variables it is not clear which one is closer to being true. The beauty of the LEF distributions is that neither has to be true for consistency of any of the quasi-MLEs. We just should use robust inference.
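The weighted least squares reading can be written out through the first-order conditions. The notation here is an assumption added for illustration: m(x_i, β) is the conditional mean function, x_i is a row vector of regressors, and β-hat is the estimator. Each quasi-MLE solves the same first-order condition as weighted nonlinear least squares with weights equal to the inverse of the nominal variance, evaluated at the solution:

```latex
\begin{align*}
\text{Poisson QMLE:}\quad
  & \sum_{i=1}^{n} \nabla_\beta m(x_i,\hat\beta)'\,
    \frac{y_i - m(x_i,\hat\beta)}{m(x_i,\hat\beta)} = 0 \\
\text{Exponential (gamma) QMLE:}\quad
  & \sum_{i=1}^{n} \nabla_\beta m(x_i,\hat\beta)'\,
    \frac{y_i - m(x_i,\hat\beta)}{m(x_i,\hat\beta)^{2}} = 0
\end{align*}

% With the exponential mean m(x_i,\beta) = \exp(x_i\beta), the gradient is
% \nabla_\beta m = \exp(x_i\beta)\,x_i, and the conditions simplify to
\begin{align*}
\text{Poisson QMLE:}\quad
  & \sum_{i=1}^{n} x_i'\,\bigl(y_i - \exp(x_i\hat\beta)\bigr) = 0 \\
\text{Exponential (gamma) QMLE:}\quad
  & \sum_{i=1}^{n} x_i'\,\Bigl(\frac{y_i}{\exp(x_i\hat\beta)} - 1\Bigr) = 0
\end{align*}
```

Both conditions have expectation zero at the true β whenever the conditional mean is correctly specified, whatever the true variance is, which is why neither variance assumption is needed for consistency.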
2 likes