
  • negative binomial regression

    I have a dataset with which I would like to investigate how a special public transport promotion affected the number of cyclists at different counting points. Since the outcome is count data with substantial overdispersion, I wanted to run a panel regression with the negative binomial model. The variable of interest is Ticket, a dummy equal to 0 for the period before the promotion and 1 for the promotion period. As I understand it, the coefficient means that the log of the expected number of cyclists decreases by about 0.35 during the promotion period.

    . xtnbreg sum i.Ticket Schnittdiesel sun rain wind temperature i.StationsNR

    Fitting negative binomial (constant dispersion) model:

    Iteration 0: log likelihood = -6633497.9
    Iteration 1: log likelihood = -6026813.8
    Iteration 2: log likelihood = -6025785.1
    Iteration 3: log likelihood = -6025785

    Iteration 0: log likelihood = -8216888
    Iteration 1: log likelihood = -4363919.9
    Iteration 2: log likelihood = -2063921 (backed up)
    Iteration 3: log likelihood = -1061753.8 (backed up)
    Iteration 4: log likelihood = -1008305.1
    Iteration 5: log likelihood = -1008299.6
    Iteration 6: log likelihood = -1008299.6

    Iteration 0: log likelihood = -1008299.6
    Iteration 1: log likelihood = -982686.72
    Iteration 2: log likelihood = -941802.38
    Iteration 3: log likelihood = -933734.8
    Iteration 4: log likelihood = -933575.23
    Iteration 5: log likelihood = -933575.1
    Iteration 6: log likelihood = -933575.1

    Fitting full model:

    Iteration 0: log likelihood = -3487249 (not concave)
    Iteration 1: log likelihood = -2249010.9 (not concave)
    Iteration 2: log likelihood = -1554159.4 (not concave)
    Iteration 3: log likelihood = -1228861.2 (not concave)
    Iteration 4: log likelihood = -1108431.4
    Iteration 5: log likelihood = -1025290.4 (backed up)
    Iteration 6: log likelihood = -978874.35 (not concave)
    Iteration 7: log likelihood = -925065.8
    Iteration 8: log likelihood = -916588.67
    Iteration 9: log likelihood = -916437.5
    Iteration 10: log likelihood = -916431.93
    Iteration 11: log likelihood = -916431.78
    Iteration 12: log likelihood = -916431.78

    Random-effects negative binomial regression     Number of obs     =     178,020
    Group variable: StationsNR                      Number of groups  =          25

    Random effects u_i ~ Beta                       Obs per group:
                                                                  min =       6,408
                                                                  avg =     7,120.8
                                                                  max =       7,470

                                                    Wald chi2(32)     =   126589.29
    Log likelihood = -916431.78                     Prob > chi2       =      0.0000

    -------------------------------------------------------------------------------
              sum |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
         1.Ticket |  -.3473887   .0052404   -66.29   0.000    -.3576597   -.3371178
    ...
    --------------+----------------------------------------------------------------
            /ln_r |   .5948828    .261183                      .0829736    1.106792
            /ln_s |   4.527904   .2997108                      3.940482    5.115326
    --------------+----------------------------------------------------------------
                r |   1.812818   .4734773                      1.086513     3.02464
                s |   92.56434   27.74253                      51.44337    166.5551
    -------------------------------------------------------------------------------
    LR test vs. pooled: chibar2(01) = 3.4e+04 Prob >= chibar2 = 0.000

    . margins Ticket

    Predictive margins                              Number of obs     =     178,020
    Model VCE    : OIM

    Expression   : Linear prediction, predict()

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          Ticket |
              0  |   .2020822    .003624    55.76   0.000     .1949792    .2091852
              1  |  -.1453065    .005003   -29.04   0.000    -.1551123   -.1355008
    ------------------------------------------------------------------------------


    Can I somehow transform the coefficients so that the result can be interpreted directly as a number of cyclists? The margins command doesn't help, because I always have to fall back on i.Ticket and the values remain very small. From an OLS panel estimate I would expect the coefficient to be between 30 and 40. Can anyone help me here? Many thanks in advance.
    Last edited by Tom Berger; 07 Oct 2022, 07:15.

  • #2
    As a side note, you may want to use the community-contributed command ppmlhdfe with robust or clustered standard errors.

    xtnbreg is extremely controversial...

    ppmlhdfe does everything you want: with robust standard errors, overdispersion neither invalidates inference nor biases your coefficients. See Santos Silva and Tenreyro (2011).
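
    A minimal sketch of what that could look like with the variables from #1, assuming ppmlhdfe and its dependencies are installed from SSC and that clustering on the station identifier is the intended choice (with only 25 stations, cluster-robust standard errors should be read with some caution):

    Code:
    * install once; ppmlhdfe is community-contributed and relies on ftools and reghdfe
    ssc install ftools
    ssc install reghdfe
    ssc install ppmlhdfe

    * Poisson pseudo-ML; station fixed effects are absorbed instead of entered as i.StationsNR
    ppmlhdfe sum i.Ticket Schnittdiesel sun rain wind temperature, absorb(StationsNR) vce(cluster StationsNR)

    The coefficient on 1.Ticket is on the log scale, just as in the xtnbreg output, so it is read as an approximate proportional change in the expected count.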



    • #3
      A few things. As noted by Maxence, the NB estimator is not robust to violations of its very specific variance-mean relationship. It's also not robust to serial correlation. If you really used pooled OLS, just use pooled Poisson. Then you can use the margins command to get the effect on the expected count itself, rather than a percentage effect. If you used fixed effects to estimate a linear model, then you should use fixed effects Poisson. But then the marginal effects calculation gets more difficult.
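
      To make the "percentage effect" concrete: in a log-link count model, a coefficient b on a dummy corresponds to a proportional change of exp(b) - 1 in the expected count. A quick sketch using the Ticket coefficient reported in #1 (the number is copied from that output, nothing is re-estimated here, and the nlcom line assumes it is run directly after the model has been fit):

      Code:
      * proportional change in the expected count implied by b = -.3473887
      display exp(-.3473887) - 1
      * roughly -0.29, i.e. about 29% fewer cyclists during the promotion, other covariates held fixed

      * the same quantity with a delta-method standard error, from the stored estimates
      nlcom exp(_b[1.Ticket]) - 1

      The margins shown in #1 are on the same log scale (Expression: Linear prediction), which is why they look so small: their difference, .2020822 - (-.1453065) = .3473887, simply reproduces the coefficient in absolute value.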

      For pooled Poisson to compare with pooled OLS:

      Code:
      poisson sum i.Ticket Schnittdiesel sun rain wind temperature i.StationsNR, vce(cluster id)
      margins, dydx(Ticket)
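
      If the linear benchmark used fixed effects, the counterpart mentioned above might look like the sketch below. It assumes the data are already xtset on StationsNR (as the xtnbreg output in #1 suggests) and that panel-robust standard errors are what is wanted:

      Code:
      * fixed effects Poisson; vce(robust) allows for overdispersion and serial correlation within station
      xtpoisson sum i.Ticket Schnittdiesel sun rain wind temperature, fe vce(robust)

      Because the pooled Poisson above already includes i.StationsNR, its slope estimates should coincide with the fixed effects Poisson ones, so margins after the pooled version remains a practical way to get effects on the count scale.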



      • #4
        Thank you very much for the tips.
        I looked into ppmlhdfe and ran the regression again. With it I can display the margins on the count scale, and the same works with the robust Poisson regression. Thank you for the two commands. However, my concern is that the test for overdispersion indicates that overdispersion is present here as well, which, as far as I know, should be a problem. In your experience, does this really argue against the Poisson estimate?
        Since it is a panel, I think it is necessary to include fixed effects because there is some unobserved heterogeneity. In my regression equation this is handled by i.StationsNR, which also lets me look at the individual station effects. I'm also not sure whether the two methods introduce bias by pooling the observations over time. Unfortunately, I have never worked with count data before, which is why I am very unsure and welcome any advice.
