  • Interpretation of xtnbreg Coefficients

    Hello all,

    I am running a negative binomial panel regression to project vehicle deaths across countries, with several years of data for each country. Deaths are a count outcome, and because death totals vary widely across countries, population enters the model as the exposure variable.
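For reference, in both nbreg and xtnbreg the option exposure(population_tho) adds ln(population_tho) to the linear predictor with its coefficient constrained to 1 (the "(exposure)" row in the output). Under nbreg, exponentiating that linear predictor gives the expected count directly. Here pop stands for population_tho and x for the four covariates:

```latex
\eta_{it} = \beta_0 + \mathbf{x}_{it}'\boldsymbol{\beta} + \ln(\mathrm{pop}_{it}),
\qquad
E_{\texttt{nbreg}}(y_{it}) = e^{\eta_{it}} = \mathrm{pop}_{it}\, e^{\beta_0 + \mathbf{x}_{it}'\boldsymbol{\beta}}
```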

    I have two groups of countries: Group A, for which the data on the outcome (deaths) are reliable, and Group B, for which they are not. I want to predict deaths in Group B using the coefficients estimated on Group A.

    Here is the code I am using to run the model for Group A:

    Code:
    xtset locationid year
    xtnbreg deaths_reported lngdp_ppp lnvehicles_pc roaddensity shareworking, exposure(population_tho)

    This gives a set of coefficients (output below). UCLA IDRE has helpful guidance on interpreting negative binomial coefficients, which are on the log scale because the model uses a log link: it is the log of the expected count, not the outcome itself, that is linear in the predictors. I am trying to build an equation that uses these coefficients to predict the outcome for Group B.

    Code:
    Random-effects negative binomial regression     Number of obs     =        326
    Group variable: locationid                      Number of groups  =         84
    
    Random effects u_i ~ Beta                       Obs per group:
                                                                  min =          2
                                                                  avg =        3.9
                                                                  max =          4
    
                                                    Wald chi2(4)      =     167.94
    Log likelihood  = -2497.6646                    Prob > chi2       =     0.0000
    
    -----------------------------------------------------------------------------------
     deaths_reported_ |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ------------------+----------------------------------------------------------------
            lngdp_ppp |  -.6080155   .0816019    -7.45   0.000    -.7679523   -.4480787
        lnvehicles_pc |   .2197999   .0889282     2.47   0.013     .0455039     .394096
          roaddensity |     .00216   .0045975     0.47   0.638    -.0068509     .011171
    shareworking15_64 |   7.678704   .9253254     8.30   0.000       5.8651    9.492308
                _cons |  -14.92998   .6936033   -21.53   0.000    -16.28941   -13.57054
    ln(population_~o) |          1  (exposure)
    ------------------+----------------------------------------------------------------
                /ln_r |    1.53655   .1967986                      1.150832    1.922268
                /ln_s |   6.698765   .2328423                      6.242403    7.155128
    ------------------+----------------------------------------------------------------
                    r |   4.648526   .9148233                      3.160822    6.836448
                    s |   811.4035    188.929                      514.0924    1280.656
    -----------------------------------------------------------------------------------
    LR test vs. pooled: chibar2(01) = 564.95               Prob >= chibar2 = 0.000

    Based on this guidance and the coefficient output from the regression above, I developed this equation to be used for Group B:
    Code:
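    * Hand-built prediction, run right after xtnbreg; _b[] uses the unrounded coefficients
    gen deaths_pred = exp(_b[_cons] + _b[lngdp_ppp]*lngdp_ppp + _b[lnvehicles_pc]*lnvehicles_pc + ///
        _b[roaddensity]*roaddensity + _b[shareworking15_64]*shareworking15_64) * population_tho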

    However, when I test this equation on the Group A data, the predicted deaths are nowhere near the reported deaths: they are roughly two orders of magnitude too low.
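One likely explanation, if the Methods and formulas section of [XT] xtnbreg is read correctly: the default random-effects model is the Hausman, Hall, and Griliches (1984) specification, in which the country effect enters through the dispersion rather than as a mean-one multiplier on the mean. Writing lambda_it = exp(eta_it), which is what the hand-built equation computes, the model and its implied mean would be:

```latex
\begin{aligned}
\Pr(y_{it}\mid\delta_i) &= \frac{\Gamma(\lambda_{it}+y_{it})}{\Gamma(\lambda_{it})\,\Gamma(y_{it}+1)}
  \left(\frac{1}{1+\delta_i}\right)^{\lambda_{it}} \left(\frac{\delta_i}{1+\delta_i}\right)^{y_{it}},
  \qquad p_i \equiv \frac{1}{1+\delta_i} \sim \mathrm{Beta}(r, s) \\
E(y_{it}\mid\delta_i) &= \lambda_{it}\,\delta_i,
  \qquad E(\delta_i) = E\!\left(\frac{1}{p_i}\right) - 1 = \frac{r+s-1}{r-1} - 1 = \frac{s}{r-1} \quad (r > 1) \\
E(y_{it}) &= \lambda_{it}\,\frac{s}{r-1} = \exp\!\left(\eta_{it} + \ln\frac{s}{r-1}\right)
\end{aligned}
```

With the reported r = 4.649 and s = 811.4, s/(r-1) is about 222, which is in line with a shortfall of roughly two orders of magnitude. nbreg has no such factor because its mean is exp(eta_it) itself, so under this reading the coefficients are not wrong; the step from exp(xb) to expected deaths is what differs.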

    When I ran the same model with nbreg instead of xtnbreg (not statistically appropriate for panel data, but useful for testing my intuition), the coefficients, plugged into the same exp() structure, gave predicted deaths of the expected order of magnitude.
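The size of the shortfall can be checked from the xtnbreg output itself. The sketch below assumes that the marginal mean of the xtnbreg, re model is exp(xb) times s/(r-1), a reading of the beta-dispersion specification rather than something the output states. It types in the posted r, s, and _cons by hand and converts the factor into an equivalent intercept shift; the scalar names are illustrative.

```stata
* Values typed in from the xtnbreg output above (rows r, s and _cons)
scalar r_hat    = 4.648526
scalar s_hat    = 811.4035
scalar cons_hat = -14.92998

* Assumed ratio of the marginal mean to exp(xb) under beta-distributed dispersion (needs r > 1)
scalar scale = scalar(s_hat) / (scalar(r_hat) - 1)

* The same factor expressed as a shift in the intercept on the log scale
scalar cons_adj = scalar(cons_hat) + ln(scalar(scale))

display "s/(r-1)        = " %9.3f scalar(scale)
display "ln[s/(r-1)]    = " %9.4f ln(scalar(scale))
display "adjusted _cons = " %9.4f scalar(cons_adj)
```

This should report a factor of about 222 and an adjusted constant of about -9.53. Using that constant in place of -14.93 in the hand-built equation is the same as multiplying its predictions by s/(r-1).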

    I have searched many forums and manuals but cannot figure out what I am doing wrong in interpreting the xtnbreg coefficients so that they can be used in a predictive model. Any guidance would be appreciated.

    Many thanks,
    Sam

  • #2
    I've had this experience and asked Stata technical support about it. They said I had the right equation, but they also saw that it gave strange predictions. It is not clear what is going on. You might try switching to xtpoisson and see what happens.

    • #3
      Phil,

      Thank you kindly for your reply.

      I was using xtnbreg because I am trying to replicate an analysis that specifically used negative binomial regression (and that model is more appropriate given the overdispersion in my data). Nevertheless, I re-ran the model with xtpoisson, re, and it produced predictions of the right order of magnitude. However, with xtpoisson, fe (again, more appropriate given my data), no constant term is estimated, and the resulting predictions are much larger than expected. For now, xtpoisson, re should give reasonably precise estimates for my purposes, but I would still like to understand what is going on.
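These xtpoisson results are consistent with how the two estimators treat the country effect, as documented for xtpoisson to the best of my knowledge. In the random-effects model the multiplicative effect nu_i is gamma-distributed with mean 1, so exp(xb) with the exposure included is already the marginal mean. In the fixed-effects model the country effects alpha_i are conditioned out, so no intercept is estimated and exp(xb) leaves out each country's level:

```latex
\begin{aligned}
\texttt{xtpoisson, re:}\quad & y_{it}\mid\nu_i \sim \mathrm{Poisson}(\nu_i\lambda_{it}),
  \quad \lambda_{it} = \mathrm{pop}_{it}\,e^{\beta_0+\mathbf{x}_{it}'\boldsymbol{\beta}},
  \quad \nu_i \sim \mathrm{Gamma}(\theta,\theta) \\
 & \Rightarrow\; E(y_{it}) = E(\nu_i)\,\lambda_{it} = \lambda_{it} \\
\texttt{xtpoisson, fe:}\quad & E(y_{it}\mid\alpha_i) = \mathrm{pop}_{it}\,e^{\alpha_i + \mathbf{x}_{it}'\boldsymbol{\beta}},
  \quad \alpha_i \text{ conditioned out (not estimated)}
\end{aligned}
```

A prediction built from the fe coefficients alone is therefore off by the factor exp(alpha_i) for each country.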

      I suspect I am misinterpreting these coefficients in a way that matters for prediction, i.e., the coefficients from xtnbreg cannot be used in the same way as those from nbreg. With nbreg, the equation outcome = exp(Intercept) * exp(b1*x1) * exp(b2*x2) gives sensible predictions, but it clearly does not work for xtnbreg. What am I missing?
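One way to test the beta-dispersion explanation without the real data is a small simulation. The sketch below is hypothetical throughout (seed, 400 panels of 4 years, true coefficients, and r = 5, s = 800 chosen only to resemble the posted values). It draws p_i = 1/(1+delta_i) from Beta(r, s) for each panel and y_it from a negative binomial with size lambda_it and success probability p_i, so that E(y_it | p_i) = lambda_it * delta_i. It then fits xtnbreg, re, takes predict's nu0 option (which, as I understand the postestimation documentation, is exp(xb) with the random effect set to zero), and rescales by the estimated s/(r-1). The _b[/ln_r] and _b[/ln_s] names follow the labels in the output; older Stata versions may name these parameters differently.

```stata
* Hypothetical simulation: none of these values come from the original data
clear
set seed 2718
set obs 400                             // 400 hypothetical panels ("countries")
generate id = _n
generate p = rbeta(5, 800)              // p_i = 1/(1+delta_i) ~ Beta(r = 5, s = 800)
expand 4                                // 4 years per panel
bysort id: generate year = _n
generate x = rnormal()
generate lambda = exp(1 + 0.5*x)        // hypothetical true _cons = 1, slope = 0.5
generate y = rnbinomial(lambda, p)      // E(y | p_i) = lambda*(1 - p_i)/p_i = lambda*delta_i
xtset id year
xtnbreg y x, re

* exp(xb) with the random effect set to zero: the analogue of the hand-built gen line
predict double nu0_hat, nu0

* Rescale by the estimated s/(r - 1); /ln_r and /ln_s are the log ancillary parameters
* (names as shown in the output; older Stata versions may label them differently)
scalar scale_hat = exp(_b[/ln_s]) / (exp(_b[/ln_r]) - 1)
generate double mu_hat = nu0_hat * scalar(scale_hat)

* If the explanation holds, mean(mu_hat) should be near mean(y) and mean(nu0_hat) far below it
summarize y nu0_hat mu_hat
```

If the explanation holds, the same two steps run after the real xtnbreg fit (predict with nu0, then multiply by the estimated s/(r-1)) would give rescaled predictions for the Group B observations.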

      Many thanks,
      Sam
