  • poisson and negative binomial goodness of fit query using estat gof

    Hi all,
    I have a large dataset (over 200,000 observations) of count data, so I'm considering Poisson and negative binomial models. For postestimation model diagnostics, I have read in the Stata 13 manual that 'estat gof' can be used, but I can only get it to work after poisson and not after negative binomial (Stata 13.1 says "invalid subcommand gof"). I have lots of zeros in the data, but I'm looking at hospital admissions, so there is no reason why a person would not enter hospital for the condition I'm looking at.
    When I run estat gof for a univariate Poisson model I get the output below. I've run several models with different dependent and independent variables, including ones where I know from other modelling that relationships exist, and I get similar results from estat gof:

    Deviance goodness of fit = 180000.1
    Prob > chi2 (199,996) = 1.000

    Pearson goodness of fit = 440000.2
    Prob > chi2 (199,996) = 0.000

    (numbers only approx., but p values are what is in the output)
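
As a rough check on the figures above (which are stated to be approximate), the two p-values can be reproduced from the statistics and the 199,996 degrees of freedom shown in parentheses. The sketch below is an illustration in Python rather than Stata output. It uses the Wilson-Hilferty normal approximation to the chi-square distribution, and the function name chi2_upper_tail is made up for this example.

```python
import math

def chi2_upper_tail(x, df):
    """Approximate P(chi2 with df degrees of freedom > x).

    Wilson-Hilferty: (X/df)^(1/3) is roughly normal with
    mean 1 - 2/(9 df) and variance 2/(9 df).
    """
    v = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - v)) / math.sqrt(v)
    # Upper tail of the standard normal distribution
    return 0.5 * math.erfc(z / math.sqrt(2.0))

df = 199_996          # degrees of freedom from the output above
deviance = 180_000.1  # approximate value quoted above
pearson = 440_000.2   # approximate value quoted above

print("deviance p-value:", round(chi2_upper_tail(deviance, df), 3))
print("Pearson p-value: ", round(chi2_upper_tail(pearson, df), 3))
print("Pearson / df:    ", round(pearson / df, 2))
```

With these inputs the deviance falls below its degrees of freedom (upper-tail probability near 1), while the Pearson statistic is about 2.2 times its degrees of freedom (upper-tail probability near 0). A Pearson ratio well above 1 is commonly read as overdispersion relative to the Poisson.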

    The manual says both should be non-significant, and it goes on to model an interaction and combine categories. I've also run the model with a two-category independent variable and get similar results.
    Has anyone else had an issue with this? Should I ignore it and just try other diagnostics? Do you suggest any in particular, from experience with large live datasets?

    many thanks,
    Annette





  • #2
    In the real world, it is highly unlikely that either the Poisson or the negative binomial model is a true specification of the actual data generating process. They may, nevertheless, be very useful approximations. The problem with these chi-square statistics is that in a sample of 200,000 you have enormous statistical power to detect trivial departures of your data from these parametric models. In fact, with that much data, it will be nearly impossible to find a model, other than the saturated one, that will not show a statistically significant result on these tests.
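
To put a rough number on the power argument, the sketch below assumes a hypothetical dispersion of 1.02 (variance only 2% above the Poisson variance) and treats the Pearson statistic as equal to its approximate expected value, dispersion times the degrees of freedom. It then applies a normal approximation to the chi-square distribution. The function name, the dispersion value, and the simplification that the degrees of freedom equal the sample size are all assumptions for illustration.

```python
import math

def pearson_gof_p(dispersion, df):
    """Approximate GOF p-value when the Pearson statistic equals dispersion * df.

    Uses the normal approximation chi2_df ~ N(df, 2 * df), which is
    reasonable for the large df values considered here.
    """
    z = (dispersion * df - df) / math.sqrt(2.0 * df)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Same tiny departure from the Poisson, increasing sample sizes
for n in (2_000, 20_000, 200_000):
    print(f"n = {n:>7}: p = {pearson_gof_p(1.02, n):.3g}")
```

Under these assumptions the same 2% departure is far from significant at n = 2,000, borderline at n = 20,000, and overwhelmingly significant at n = 200,000, even though the model is equally close to the data in all three cases.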

    I think you would be better off just looking at graphs or cross-tabulations of predicted vs observed numbers of events and deciding whether the model is close enough to the data to be useful for whatever it is intended to do. You may find, for example, that the model fits well when it predicts a medium number of events but not so well when low or high numbers are predicted, or some other pattern. Those patterns can be useful clues to omitted variables or interactions or other mis-specifications of the predictors.
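
One way to build such a predicted-versus-observed table is sketched below in Python, assuming you have exported the outcome and the fitted means from your model. The data here are made up, and the helper names are hypothetical.

```python
import math
from collections import Counter

def poisson_pmf(k, mu):
    """Poisson probability of observing exactly k events with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def observed_vs_predicted(y, mu, max_count):
    """Rows of (count, observed frequency, expected frequency) for 0..max_count.

    The expected frequency of count k is the sum, over observations,
    of each observation's Poisson probability of k given its fitted mean.
    """
    observed = Counter(y)
    rows = []
    for k in range(max_count + 1):
        expected = sum(poisson_pmf(k, m) for m in mu)
        rows.append((k, observed.get(k, 0), expected))
    return rows

# Hypothetical outcome and fitted means (here a constant equal to the sample mean)
y = [0, 0, 0, 0, 0, 0, 1, 1, 2, 5]
mu = [0.9] * len(y)

print("count observed expected")
for k, obs, expected in observed_vs_predicted(y, mu, 5):
    print(f"{k:>5} {obs:>8} {expected:>8.2f}")
```

Rows where the observed and expected frequencies diverge (often the zeros and the upper tail) show where the model is a poor approximation, which is more informative than a single test statistic.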


    • #3
      Annette,
      as an aside to Clyde's helpful insight, I would also focus on your zeros.
      I do not know in which health care system your data were collected, but it might well be that patients with zero hospitalizations should have been hospitalized because of their poor state of health, but could not be due to a lack of insurance coverage.
      If this is your scenario, you may want to take a look at the hurdle model, covered for instance in Chapter 17 (especially pages 583-589) of Cameron AC, Trivedi PK. Microeconometrics using Stata. Revised Edition. College Station, TX: Stata Press, 2009 (a very valuable textbook, in my opinion).
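
For readers unfamiliar with the idea, a hurdle model splits the outcome into two parts: one process decides whether the count is zero or positive, and a zero-truncated count distribution describes the positive counts. The sketch below shows the resulting probability function for a Poisson hurdle in Python. The function name and parameter values are hypothetical, and in practice both parts would depend on covariates.

```python
import math

def hurdle_poisson_pmf(k, p_zero, mu):
    """P(Y = k) under a Poisson hurdle model.

    p_zero: probability of not crossing the hurdle (a zero count)
    mu:     mean parameter of the Poisson that is truncated at zero
    """
    if k == 0:
        return p_zero
    poisson = math.exp(-mu) * mu ** k / math.factorial(k)
    # Rescale so that the positive counts share probability 1 - p_zero
    truncated = poisson / (1.0 - math.exp(-mu))
    return (1.0 - p_zero) * truncated

# Hypothetical values: 70% zeros, positive counts governed by mu = 1.5
for k in range(4):
    print(k, round(hurdle_poisson_pmf(k, 0.7, 1.5), 4))
```

Because the zero probability is a free parameter, the model can match any share of zeros regardless of how the positive counts are distributed.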
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        Hello, Annette,

        We have very little information about your model before the GOF tests. I don't know if it was a typo, but both GOF tests were reported with the same chi-square (199), yet p = 1 for the deviance GOF and p = 0 for the Pearson GOF. If it is not a typo, I cannot understand your output. After all, which one should be considered? It seems preposterous to me.

        But if you really got such extreme results, however perplexed I am, perhaps you may want to read an excerpt from a very interesting book on the matter (Hilbe, Modeling Count Data, Cambridge, 2014), where the author presents pitfalls related to the above-mentioned GOF tests in count models.

        "[...] since it was first proposed, statisticians discovered that many models appearing to be well fitted on the basis of the deviance test in fact poorly fit the data".

        And a few paragraphs after:

        "We will not consider using the Pearson Chi2 statistic for a GOF test, though. It appears to produce biased results".

        Just to be clear, these comments concern the Pearson and deviance GOF tests, not the deviance and Pearson statistics we usually see at the beginning of the output in Poisson regression, for example.

        Changing the subject slightly: I wonder if you have a precise "theory" about what is going on with the zero values. I mean, do you think the people with zero counts come from a different "sampling" or pattern or source? If so, I gather Carlo's suggestion, a hurdle model, will perform at its best.

        If not, perhaps you should add to your "list" of options a zero-inflated Poisson or, in case of a high degree of overdispersion, a zero-inflated negative binomial model.
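
As a small illustration of what zero inflation does, the Python sketch below writes out the zero-inflated Poisson probability function: with some probability an observation is a "structural" zero, and otherwise it follows an ordinary Poisson, which can itself produce zeros. The function name and the parameter values are hypothetical.

```python
import math

def zip_pmf(k, p_structural, mu):
    """P(Y = k) under a zero-inflated Poisson.

    p_structural: probability of a structural (always-zero) observation
    mu:           Poisson mean for the remaining observations
    """
    poisson = math.exp(-mu) * mu ** k / math.factorial(k)
    if k == 0:
        # Zeros arise from both the structural part and the Poisson part
        return p_structural + (1.0 - p_structural) * poisson
    return (1.0 - p_structural) * poisson

# Hypothetical values
p_structural, mu = 0.6, 2.0
overall_mean = (1.0 - p_structural) * mu

print("ZIP P(Y = 0):                  ", round(zip_pmf(0, p_structural, mu), 3))
print("Poisson P(Y = 0) at same mean: ", round(math.exp(-overall_mean), 3))
```

For the same overall mean, the zero-inflated model assigns noticeably more probability to zero than a plain Poisson does, which is the pattern to look for when comparing observed and predicted zeros.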


        Hopefully that was of some help.

        Best regards,

        Marcos
