  • cluster option with negative binomial regression

    Hi there

    Sorry, this may be more of a stats question than a Stata question. My understanding is that adding a cluster option to a regression model increases the standard errors and so widens the confidence intervals, but I'm surprised to see that, in my case, adding a cluster option actually narrows them:

    glm count i.exposure i.covar1 i.covar2, family(nbinomial) link(log) eform

    glm count i.exposure i.covar1 i.covar2, family(nbinomial) link(log) eform cluster(id)

    Number of observations = 4,069,451
    Number of unique IDs = 3,423,928

    I'd be very grateful to anyone who might be able to offer some insight as to what is happening here.

    With thanks

  • #2
    This is a FAQ; see https://www.stata.com/support/faqs/s...ion/index.html


    • #3
      I'll add a few comments/suggestions. First, what happens if you use the vce(robust) option? What might be happening is that the usual standard errors are computed under the assumption of a specific type of overdispersion. It could be that this assumption is wrong and that the true variance-mean relationship is more complicated. I suspect this because you have relatively few repeated observations (4,069,451 observations on 3,423,928 IDs), so clustering itself is probably having very little effect. Using vce(robust) would shed considerable light on this.
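      For reference, the three kinds of standard errors compared in this thread can be written in the usual sandwich form. This is a standard textbook sketch, not Stata's exact computation: s_i is observation i's score contribution, A is the negative Hessian of the log likelihood, g indexes the G clusters (IDs), and any small-sample scaling factors software may apply are omitted.

```latex
\begin{aligned}
\widehat{V}_{\mathrm{model}}   &= \hat{A}^{-1}, \qquad
  \hat{A} = -\sum_{i=1}^{n} \frac{\partial^{2} \ell_i(\hat\beta)}{\partial\beta\,\partial\beta'} \\
\widehat{V}_{\mathrm{robust}}  &= \hat{A}^{-1}\Big(\sum_{i=1}^{n} \hat{s}_i\,\hat{s}_i'\Big)\hat{A}^{-1} \\
\widehat{V}_{\mathrm{cluster}} &= \hat{A}^{-1}\Big(\sum_{g=1}^{G}
  \Big(\sum_{i\in g}\hat{s}_i\Big)\Big(\sum_{i\in g}\hat{s}_i\Big)'\Big)\hat{A}^{-1}
\end{aligned}
```

      The cluster version differs from the robust one only through cross-products of scores from observations that share an ID, so singleton IDs contribute identically to both. Here 4,069,451 - 3,423,928 = 645,523, so at most 645,523 IDs can have more than one observation and at least 2,778,405 of the 3,423,928 IDs are singletons.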

      Now that I refresh my memory, I believe your options imply you're doing geometric regression, which implies variance = mu*(1 + mu), which is very restrictive and implies a lot of overdispersion. (This doesn't affect consistency of the estimated mean parameters but it can greatly affect standard errors.) So, I'm guessing the vce(robust) option also will bring the standard errors down by a lot (and rightly so).
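      Written out, the variance assumptions being contrasted look like this. The sketch assumes, as the post suggests, that glm's family(nbinomial) is using a fixed dispersion parameter k = 1, which is what Stata's glm uses when no value of k is given; k plays the role often called alpha elsewhere.

```latex
\begin{aligned}
% Negative binomial (NB2) variance function with dispersion parameter k
\operatorname{Var}(y_i \mid x_i) &= \mu_i + k\,\mu_i^{2}, \qquad \mu_i = \exp(x_i'\beta) \\
% Fixing k = 1 gives the geometric variance function
k = 1 \;&\Rightarrow\; \operatorname{Var}(y_i \mid x_i) = \mu_i(1+\mu_i) \\
% Poisson variance function, for comparison
\text{Poisson:}\quad \operatorname{Var}(y_i \mid x_i) &= \mu_i
\end{aligned}
```

      For example, at a mean of 10 the geometric assumption implies a variance of 10 x 11 = 110, eleven times the Poisson value of 10, which is why a wrong k = 1 can distort the default standard errors so much.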

      I would much prefer using Poisson regression in your setting. It is efficient whenever the variance is proportional to the mean, which allows for under- or overdispersion. It's fully robust to misspecification of the variance; for some reason, many still think Poisson regression requires variance = mean. At a minimum, it's a useful check on your other results. Geometric regression is not standard in my field and, as I said, its variance-mean relationship, taken literally, is often unrealistic.
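      To see the robustness point in miniature, here is a minimal Python sketch (standard library only) on simulated, hypothetical data, not the poster's. Counts are drawn from a gamma-Poisson mixture with Var(y|d) = mu + alpha*mu^2, i.e. overdispersed, and a Poisson model is fitted anyway. With a single binary regressor d, the Poisson score equations have a closed form (exp(b0) and exp(b0 + b1) equal the two group means), so no iterative solver is needed. The helper names, seed, and parameter values are illustrative assumptions, and the "sandwich" SE is the plain formula without any small-sample adjustment.

```python
import math
import random


def poisson_draw(rng, lam):
    """Poisson draw via Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate(n, b0, b1, alpha, seed):
    """Hypothetical data: binary exposure d, E(y|d) = exp(b0 + b1*d),
    Var(y|d) = mu + alpha*mu^2 via a gamma-Poisson mixture."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        d = i % 2
        mu = math.exp(b0 + b1 * d)
        # Gamma with shape 1/alpha and scale alpha has mean 1 and variance alpha.
        lam = mu * rng.gammavariate(1.0 / alpha, alpha)
        data.append((d, poisson_draw(rng, lam)))
    return data


def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[r][k] * q[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def poisson_fit(data):
    """Poisson QMLE of y on (1, d) plus model-based and sandwich SEs for b1."""
    sums, counts = [0.0, 0.0], [0, 0]
    for d, y in data:
        sums[d] += y
        counts[d] += 1
    # Closed form: the score equations set each group's fitted mean to its sample mean.
    m0, m1 = sums[0] / counts[0], sums[1] / counts[1]
    b0, b1 = math.log(m0), math.log(m1 / m0)

    a = [[0.0, 0.0], [0.0, 0.0]]     # A = sum mu_i x_i x_i' (negative Hessian)
    meat = [[0.0, 0.0], [0.0, 0.0]]  # B = sum (y_i - mu_i)^2 x_i x_i'
    for d, y in data:
        x = (1.0, float(d))
        mu = math.exp(b0 + b1 * d)
        for r in range(2):
            for c in range(2):
                a[r][c] += mu * x[r] * x[c]
                meat[r][c] += (y - mu) ** 2 * x[r] * x[c]
    a_inv = inv2(a)
    v_robust = matmul2(matmul2(a_inv, meat), a_inv)
    return b0, b1, math.sqrt(a_inv[1][1]), math.sqrt(v_robust[1][1])


true_b0, true_b1 = math.log(2.0), math.log(1.5)
data = simulate(20000, true_b0, true_b1, alpha=1.0, seed=12345)
b0_hat, b1_hat, se_model, se_robust = poisson_fit(data)
print(f"b1_hat = {b1_hat:.4f} (true {true_b1:.4f})")
print(f"model-based SE = {se_model:.4f}, sandwich SE = {se_robust:.4f}")
```

      In this setup the Poisson coefficients land close to the true values even though the data are not Poisson, while the model-based Poisson SE is too small and the sandwich version corrects it. If the working variance instead overstated the true dispersion, as suspected above for k = 1, the sandwich SE would tend to come out smaller than the model-based one.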


      • #4
        Thanks very much, both, for the helpful replies.

        I need to dig deeper into this but, in case anyone is interested, the table below shows the width of the confidence interval when applying different methods.
        SE type       nbinomial   poisson
        default       0.016174    0.014419
        vce(robust)   0.01285     0.01285
        cluster(id)   0.012418    0.012418
        Unsurprisingly, the confidence interval when using nbinomial is wider than when using poisson.
        Both poisson and nbinomial produce identical confidence intervals when using vce(robust) or cluster(id).
        Applying vce(robust) gives narrower confidence intervals than the default, i.e. the robust variance estimate is smaller than the default model-based (maximum likelihood) estimate.
        Applying cluster(id) gives the narrowest confidence intervals, which implies that, within IDs, the score contributions (roughly, the residuals) are on balance negatively correlated, which would make sense in the context of my study.
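        The reasoning in the last point can be checked on a toy example. This is a minimal Python sketch, assuming made-up data (IDs A to E are hypothetical) and the simplest possible model, a sample mean, for which each observation's score is just its residual. The cluster-robust variance equals the observation-level robust variance plus twice the sum of within-ID residual cross-products, so residuals that are negatively correlated within an ID pull the clustered variance below the robust one.

```python
from collections import defaultdict


def mean_variances(ids, y):
    """Robust vs cluster-robust variance of a sample mean (the simplest
    sandwich: bread = 1/n, score_i = residual_i); no small-sample factors."""
    n = len(y)
    ybar = sum(y) / n
    resid = [v - ybar for v in y]

    # Observation-level ("robust") sandwich: sum of squared scores.
    robust = sum(e * e for e in resid) / n**2

    # Cluster-robust sandwich: square the score sums within each ID.
    by_id = defaultdict(list)
    for i, e in zip(ids, resid):
        by_id[i].append(e)
    clustered = sum(sum(es) ** 2 for es in by_id.values()) / n**2

    # The gap is twice the within-ID cross-products of residuals;
    # singleton IDs have none, so they contribute equally to both.
    cross = sum(es[a] * es[b]
                for es in by_id.values()
                for a in range(len(es))
                for b in range(a + 1, len(es)))
    return robust, clustered, 2 * cross / n**2


# Hypothetical toy data: IDs A and B are singletons; C, D, E each have two
# observations whose residuals have opposite signs (negative within-ID correlation).
ids = ["A", "B", "C", "C", "D", "D", "E", "E"]
y = [1, 4, 4, 1, 1, 4, 5, 0]
robust, clustered, gap = mean_variances(ids, y)
print(robust, clustered, gap)  # 0.40625 0.0703125 -0.3359375
```

        The example is deliberately extreme; with mostly singleton IDs and milder negative correlation, the clustered variance would sit only slightly below the robust one.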
