  • Negative binomial regression

    Is there a way to calculate the ICC for a negative binomial regression? I conducted a study to identify factors associated with an outcome variable that is a count. My data are hierarchical. To account for the random effects, I first fitted a random-effect-only model, and the log-likelihood test showed that there is not enough variability in the random intercept of the higher-level variable to justify a mixed-effects model. I ignored this and fitted the full model with fixed and random effects. In the final model, the confidence interval for the variance of the random intercept suggested that it is statistically significant; however, the log-likelihood test again indicated that there is not enough variability in the random intercept to justify a mixed-effects model. I then ignored the levels and fitted a single-level negative binomial regression. In that model, the log-likelihood test showed that the overdispersion is statistically significant, and its AIC is lower than the AIC of the mixed-effects negative binomial regression. Should I opt for the single-level negative binomial regression? I have read some articles in which an ICC is calculated for negative binomial regression. Is it possible to do this in Stata?

  • #2
    I am not a fan of using statistical tests to select among models. Actually, I prefer looking at how the model fits the data, especially doing so graphically. But if you are going to use a statistic to make your choice, the AIC is, in my view, better than any test, because at least it penalizes a model that overfits the noise in the data by stuffing it with unneeded parameters. But, again, my personal approach is to inspect plots of fitted vs observed outcome values and scatterplots of residuals vs predictors or residuals vs fitted values to see if the residual distributions look OK.
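
    A minimal sketch of that workflow, on simulated data rather than the poster's: every variable name (grp, x, y) and parameter value below is invented for illustration. It fits the single-level and random-intercept negative binomial models, lists their AIC side by side, and then draws the kind of plots described above for the single-level fit.

    ```stata
    * Simulated two-level count data; all names and values are illustrative
    clear
    set seed 2023
    set obs 50
    generate grp = _n
    generate u = rnormal(0, 0.3)          // group-level random intercept
    expand 20                             // 20 observations per group
    generate x = rnormal()
    generate mu = exp(0.5 + 0.4*x + u)
    * Gamma-Poisson mixture: negative binomial counts with alpha = 0.5
    generate y = rpoisson(mu * rgamma(2, 0.5))

    nbreg y x                             // single-level model
    estimates store single
    menbreg y x || grp:                   // random-intercept model
    estimates store mixed
    estimates stats single mixed          // AIC and BIC side by side

    * Graphical checks for the single-level model
    estimates restore single
    predict yhat, n                       // predicted number of events
    generate resid = y - yhat
    scatter y yhat, name(obs_fit, replace)
    scatter resid x, yline(0) name(res_x, replace)
    ```

    The same plots can be drawn for the mixed model after -estimates restore mixed-, which makes it easier to see whether the two fits differ in any way that matters.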

    Also, in this case the choice is almost moot: if, in fact, the residual variance at the upper level is close to zero, then the fixed-effects part of the mixed-model will look almost the same as a single-level model with the same variables.

    As for the ICC, the negative binomial model is inherently heteroskedastic; there is no constant variance for the residual. Rather the residual variance for an observation is a quadratic function of the expected value. Consequently there is, in principle, no such thing as an ICC for this model (nor for the Poisson). If you have seen some articles where an ICC is calculated for this model, it must be some sort of pseudo-ICC, not the real thing.
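
    To spell out the heteroskedasticity, assuming the mean-dispersion (NB2) parameterization that -nbreg- uses by default, and writing \mu_i for the conditional mean and \alpha for the overdispersion parameter (notation introduced here, not taken from the question):

    \[
    \operatorname{Var}(y_i \mid x_i) = \mu_i + \alpha \mu_i^2, \qquad \mu_i = \exp(x_i'\beta).
    \]

    The familiar ICC from a linear mixed model, \sigma_u^2 / (\sigma_u^2 + \sigma_e^2), relies on a single residual variance \sigma_e^2. Here the observation-level variance changes with \mu_i, so any ICC-like number has to be evaluated at a chosen value of \mu_i or on some other scale, and it will differ depending on that choice.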
    Last edited by Clyde Schechter; 04 Dec 2023, 20:50.


    • #3
      Thank you very much Schechter!


      • #4
        Dear Stata forum members,
        I sincerely need your help! I am doing my PhD research on malaria incidence and its predictors. For the study, I want to recruit people of all age groups, assess their baseline characteristics, and follow them for a year. My study population is dynamic: immigration and emigration are high because the study area is close to a border that people cross in both directions. Therefore, what I intend to measure is incidence density rather than cumulative incidence. I could not find a study that followed a similar design. Other studies done in my country and elsewhere recruited only febrile patients, among whom incidence is likely to be higher. My questions are: what is the minimum sample size (in person-years) that I should follow for the study to have sufficient power and yield valid results? What is the sample size calculation formula? Can I calculate the sample size using Stata? To be frank, I tried to calculate the sample size by treating the incidence rate as a proportion, which it is not, and the sample size I got was far smaller than I expected. Help!


        • #5
          Dear Stata experts,

          I am undertaking a study assessing the number of hospital clinics (liver clinics in particular) for two separate liver conditions (FLD vs HBV), with variable follow-up time for each participant.

          Whilst I initially fitted a Poisson regression model to establish predictors (including FLD vs HBV) of the number of liver clinics over follow-up time, I consulted a university statistician, who recommended that I also run a negative binomial regression model, as most datasets violate the assumption that the variance is equal to the mean.

          However, when I try to undertake -nbreg- the results I get are identical to the Poisson model. See relevant screenshots attached.

          Thanks in advance for your reply

          • #6
            Karl:
            1) an excess of categorical predictors may harm convergence;
            2) according to experts' opinion on this forum, -poisson- with -vce(cluster clusterid)- standard errors outperforms -nbreg- in case of overdispersion;
            3) I would replace -c.age- with -c.age##c.age-.
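
            A minimal sketch of points 2) and 3), on simulated data standing in for the clinic data: every name (clusterid, fld, futime, visits) and every coefficient is hypothetical, and the exposure() option is an assumption about how the variable follow-up time would be handled.

            ```stata
            * Simulated stand-in for the clinic data; all names and values are hypothetical
            clear
            set seed 7
            set obs 100
            generate clusterid = _n
            generate u = rnormal(0, 0.4)          // effect shared within a cluster
            expand 5
            generate age = runiform(20, 80)
            generate fld = runiform() < 0.5       // 1 = FLD, 0 = HBV
            generate futime = runiform(1, 10)     // years of follow-up
            generate visits = rpoisson(futime * exp(-1 + 0.3*fld + 0.05*age - 0.0004*age^2 + u))

            * Poisson with cluster-robust standard errors and a quadratic in age
            poisson visits i.fld c.age##c.age, exposure(futime) vce(cluster clusterid) irr
            * Average predicted number of visits at selected ages
            margins, at(age = (30(10)70))
            ```

            With -c.age##c.age- Stata adds both the linear and the squared term, so -margins- knows that the two move together.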
            Kind regards,
            Carlo
            (Stata 19.0)


            • #7
              Dear Karl Vaz,

              In addition to Carlo's helpful comments, note that the estimate of alpha is going to zero (lnalpha going to minus infinity) and therefore there can be no convergence. This suggests that with this set of controls there is no overdispersion at all. Of course, if you drop some regressors, you may find overdispersion, but I would still go for Poisson, unless you need to be able to compute the probability of some events.
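
              A small simulation can illustrate this behaviour; the data, names and values below are all invented. The counts are generated as Poisson, so there is no overdispersion by construction, and -nbreg- should push its estimate of alpha toward zero and return coefficients that essentially coincide with those from -poisson-. The iterate(50) option is only a precaution in case lnalpha keeps drifting.

              ```stata
              * Equidispersed counts: Poisson by construction (illustrative values)
              clear
              set seed 42
              set obs 2000
              generate x = rnormal()
              generate y = rpoisson(exp(0.5 + 0.3*x))

              poisson y x
              estimates store pois
              nbreg y x, iterate(50)                // cap iterations if lnalpha drifts
              estimates store nb
              estimates table pois nb, b(%9.4f) se(%9.4f)
              ```

              The -nbreg- output also reports a likelihood-ratio test of alpha = 0 at the bottom of the table, which is worth reading alongside the estimate of lnalpha.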

              Best wishes,

              Joao


              • #8
                In addition to what Carlo Lazzaro and Joao Santos Silva have offered—with which I agree—I would add that the variance=mean result refers to the conditional variance and conditional mean, V(y|x) and E[y|x], not the marginal variance and marginal mean. This is often a point of confusion in discussion of negative binomial models.
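
                The distinction can be written out with the law of total variance. Assuming a Poisson model with conditional mean \mu(x) = E[y|x] (notation introduced here for illustration), so that V(y|x) = \mu(x):

                \[
                \operatorname{Var}(y) = \operatorname{E}[\operatorname{Var}(y \mid x)] + \operatorname{Var}(\operatorname{E}[y \mid x]) = \operatorname{E}[\mu(x)] + \operatorname{Var}(\mu(x)) \ge \operatorname{E}[y].
                \]

                So the raw outcome can have a variance well above its mean simply because \mu(x) varies across observations, even when there is no overdispersion conditional on x.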


                • #9
                  Thank you very much Carlo Lazzaro, Joao Santos Silva and John Mullahy, that is very useful. Can I ask the rationale for replacing -c.age- with -c.age##c.age- and for using -vce(cluster clusterid)- rather than, say, -vce(robust)-?


                  • #10
                    Karl:
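                    1) adding squared -age- lets the model capture a nonlinear effect of age, which may increase the goodness of fit of your regression;
                    2) the -robust- option deals with heteroskedasticity only, whereas -vce(cluster clusterid)- also allows for correlation among observations within the same cluster.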
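
                    A minimal sketch of point 2), on simulated data in which observations share a cluster-level effect; all names and values are invented. The covariate x is generated before -expand-, so it is constant within a cluster, which is the situation where the two sets of standard errors tend to differ most.

                    ```stata
                    * Clustered counts with a cluster-level covariate (illustrative values)
                    clear
                    set seed 11
                    set obs 100
                    generate clusterid = _n
                    generate u = rnormal(0, 0.5)          // effect shared within a cluster
                    generate x = rnormal()                // constant within cluster
                    expand 10
                    generate y = rpoisson(exp(0.2 + 0.3*x + u))

                    poisson y x, vce(robust)
                    estimates store se_robust
                    poisson y x, vce(cluster clusterid)
                    estimates store se_cluster
                    estimates table se_robust se_cluster, se   // same coefficients, different SEs
                    ```

                    The point estimates are identical across the two fits; only the standard errors change.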
                    Kind regards,
                    Carlo
                    (Stata 19.0)


                    • #11
                      Thanks again Carlo Lazzaro
