Hello all,
I am running a negative binomial panel regression to project vehicle deaths across countries, where each country has multiple years of data. Deaths are a count outcome, and because death totals vary widely across countries, population enters the model as an exposure (offset).
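For reference, this is how I understand the exposure to enter the model. The notation is mine: x_it holds the constant and the four covariates for country i in year t, and β holds the coefficients. The exposure's coefficient is fixed at 1, as the regression output reports:
$$
\ln \lambda_{it} = \mathbf{x}_{it}'\boldsymbol{\beta} + \ln(\text{population\_tho}_{it})
\quad\Longleftrightarrow\quad
\lambda_{it} = \text{population\_tho}_{it}\,\exp\!\left(\mathbf{x}_{it}'\boldsymbol{\beta}\right)
$$
In a pooled model such as nbreg, λ_it is the expected number of deaths, and it is the quantity my prediction equation tries to compute.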
I have two groups of countries: Group A, for which the outcome variable (deaths) is reliable, and Group B, for which it is not. I want to predict deaths in Group B using the coefficients estimated on Group A.
Here is the code I am using to run the model for Group A:
Code:
xtset locationid year
xtnbreg deaths_reported lngdp_ppp lnvehicles_pc roaddensity shareworking, exposure(population_tho)
Code:
Random-effects negative binomial regression     Number of obs     =        326
Group variable: locationid                      Number of groups  =         84

Random effects u_i ~ Beta                       Obs per group:
                                                              min =          2
                                                              avg =        3.9
                                                              max =          4

                                                Wald chi2(4)      =     167.94
Log likelihood  = -2497.6646                    Prob > chi2       =     0.0000

-----------------------------------------------------------------------------------
 deaths_reported_ |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
        lngdp_ppp |  -.6080155   .0816019    -7.45   0.000    -.7679523   -.4480787
    lnvehicles_pc |   .2197999   .0889282     2.47   0.013     .0455039     .394096
      roaddensity |     .00216   .0045975     0.47   0.638    -.0068509     .011171
shareworking15_64 |   7.678704   .9253254     8.30   0.000       5.8651    9.492308
            _cons |  -14.92998   .6936033   -21.53   0.000    -16.28941   -13.57054
ln(population_~o) |          1  (exposure)
------------------+----------------------------------------------------------------
            /ln_r |    1.53655   .1967986                      1.150832    1.922268
            /ln_s |   6.698765   .2328423                      6.242403    7.155128
------------------+----------------------------------------------------------------
                r |   4.648526   .9148233                      3.160822    6.836448
                s |   811.4035    188.929                      514.0924    1280.656
-----------------------------------------------------------------------------------
LR test vs. pooled: chibar2(01) = 564.95                    Prob >= chibar2 = 0.000
I then get a series of coefficients from the output. UCLA IDRE has helpful guidance on interpreting negative binomial regression output, given that the model is specified for the logarithm of the expected count. I am trying to build an equation that uses these coefficients to predict the outcome for Group B.
Based on this guidance and the coefficients from the regression above, I developed this equation to use for Group B:
Code:
gen deaths_pred = exp(-14.92) * exp(-.608*(lngdp_ppp)) * exp(.219*(lnvehicles_pc)) * ///
    exp(.002*(roaddensity)) * exp(7.67*(shareworking15_64)) * exp(lnpop_tho)
However, when I test this equation on the Group A data, the predicted deaths are nowhere near the reported numbers: they are roughly two orders of magnitude lower than I would expect.
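To rule out an arithmetic slip on the Stata side, the same equation can be reproduced in a few lines of Python. This is a minimal sketch with three assumptions. It uses the unrounded coefficients from the xtnbreg output. It assumes `lnpop_tho` is the natural log of `population_tho`, so the exposure multiplies the prediction directly. The covariate values in `example` are hypothetical, not real country data.

```python
import math

# Coefficients from the xtnbreg output for Group A (full precision, not rounded).
COEFS = {
    "lngdp_ppp": -0.6080155,
    "lnvehicles_pc": 0.2197999,
    "roaddensity": 0.00216,
    "shareworking15_64": 7.678704,
}
CONS = -14.92998


def linear_predictor(covariates: dict) -> float:
    """xb = _cons + sum(b_k * x_k), excluding the exposure term."""
    return CONS + sum(COEFS[name] * value for name, value in covariates.items())


def predict_deaths(covariates: dict, population_tho: float) -> float:
    """Same quantity as the Stata gen line: exp(xb) * population_tho.

    The exposure enters with its coefficient fixed at 1, so adding
    ln(population_tho) to xb is the same as multiplying exp(xb) by it.
    """
    return math.exp(linear_predictor(covariates) + math.log(population_tho))


# Hypothetical covariate values for one country-year (illustration only).
example = {
    "lngdp_ppp": 9.5,
    "lnvehicles_pc": -1.5,
    "roaddensity": 20.0,
    "shareworking15_64": 0.65,
}
print(f"xb = {linear_predictor(example):.4f}")
print(f"predicted deaths = {predict_deaths(example, population_tho=10_000):.4g}")
```

Substituting the rounded coefficients from my Stata line into `COEFS` and `CONS` is a quick way to check whether rounding alone could explain the gap.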
When I ran the same model with nbreg instead of xtnbreg (not statistically appropriate, since these are panel data, but simply to test my intuition), the resulting coefficients yielded predicted deaths of the order of magnitude I was expecting when plugged into the same exp-based equation.
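One possible reason for the gap, which I have not been able to confirm, comes from the Methods and formulas section of the Stata xtnbreg manual, if I read it correctly. The random-effects model there follows Hausman, Hall and Griliches, with these assumptions:

- y_it is negative binomial given a panel-level δ_i.
- The conditional mean is λ_it·δ_i, where λ_it = exp(x_itβ + offset).
- 1/(1 + δ_i) ~ Beta(r, s).

Under that assumption, the average of δ_i is s/(r − 1) (for r > 1). The marginal expected count would then be exp(xb)·s/(r − 1) rather than exp(xb) alone, whereas in nbreg the constant would absorb that scale. The sketch below only does the arithmetic with the reported r and s and checks the closed form by simulation. It is not a validated prediction method.

```python
import math
import random

# Ancillary parameters from the xtnbreg output (estimated as /ln_r and /ln_s).
LN_R = 1.53655
LN_S = 6.698765


def mean_delta(r: float, s: float) -> float:
    """Closed-form E[delta] when 1/(1 + delta) ~ Beta(r, s); finite only for r > 1.

    Assumption: this is the parameterization of the random-effects xtnbreg model.
    """
    if r <= 1:
        raise ValueError("E[delta] is infinite when r <= 1")
    return s / (r - 1)


def mean_delta_mc(r: float, s: float, n: int = 100_000, seed: int = 12345) -> float:
    """Monte Carlo check: draw p ~ Beta(r, s), set delta = (1 - p) / p, average."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        p = rng.betavariate(r, s)
        total += (1.0 - p) / p
    return total / n


r = math.exp(LN_R)
s = math.exp(LN_S)
factor = mean_delta(r, s)
print(f"r = {r:.4f}, s = {s:.1f}")
print(f"s/(r-1) = {factor:.1f} (log10 = {math.log10(factor):.2f})")
print(f"Monte Carlo estimate: {mean_delta_mc(r, s):.1f}")
```

With the reported r ≈ 4.65 and s ≈ 811, the factor comes out at about 222, which is roughly the two orders of magnitude I am missing. I am still unsure whether multiplying by it is appropriate for countries outside the estimation sample (Group B).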
I have searched many forums and manuals but can't figure out what I'm doing wrong in interpreting the xtnbreg coefficients so that they can be used in a predictive model. Any guidance would be appreciated.
Many thanks,
Sam