
  • Absolute incidence rates from Poisson model for survival analysis

    Thanks in advance for any help. Stata 14.2.
    I am looking at the effect of predictors (age, sex, race, blood glucose, etc.) for the outcome of insulin use (binary, non-recurrent).
    Some of the predictors (like glucose) are time-varying.
    I have data in long format, organized by year, for 10 years of follow-up. Data example:
    ID year insulin_use age sex glucose white_race
    1 0 0 45 1 103 0
    1 1 0 45 1 184 0
    1 2 1 45 1 172 0
    1 3 1 45 1 182 0
    2 0 0 62 0 104 1
    2 1 1 62 0 112 1
    2 2 1 62 0 107 1
    etc.

    My primary analysis is a Cox model. However, I now want to evaluate absolute incidence rates and additive interactions.
    I decided to do this with a Poisson model that tries to recapitulate the Cox model as closely as possible. Example code:

    poisson insulin_use ibn.year age sex glucose white_race, irr

    This works nicely, but now I would like to use this model to predict the absolute incidence rates in strata of predictors, adjusted for other predictors.
    For example, I would like to say that the predicted incidence rate for females, adjusted for age, glucose, and race, is 5.6 events / 100 person years.
    Then I would like to look at interactions with race, etc.
    I have seen folks use the margins command for this sort of thing, but not with data in long format like this, and I'm wondering exactly how to set it up to give correct absolute incidence rate estimates.
    Thanks for any guidance.

  • #2
    Interesting question. First of all, I'm not sure that a Poisson model is theoretically appropriate here, but if you want to use it, you have your data set up wrong. I think you would need to run a longitudinal model. Using generalized estimating equations:

    Code:
    xtset ID year
    xtgee  insulin_use year age sex glucose white_race, family(poisson) corr(exch) eform
    That would give you correct statistical inference, ignoring the fact that insulin use isn't a real count. The eform option exponentiates the coefficients, and is equivalent to requesting the IRR option in this context. And after that, margins will indeed work correctly. You would get the predicted number of events per year. I see you included year as categorical; I think actually you are justified in treating it as continuous, and after that, margins, dydx(year) should work.
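A minimal sketch of the margins step after xtgee, not a definitive setup. It assumes simulated person-year data that reuse the variable names from post #1 (all values are invented), and it assumes that rows after the first year of insulin use are dropped so that only person-years at risk remain. That restriction is an added assumption and is not part of the code above.

```stata
* Simulated person-year data; variable names follow post #1, values are invented
clear
set seed 2017
set obs 500
generate ID = _n
generate age = 40 + floor(30*runiform())
generate sex = runiform() < 0.5
generate white_race = runiform() < 0.6
expand 10
bysort ID: generate year = _n - 1
generate glucose = 90 + 60*runiform()
generate insulin_use = runiform() < (0.02 + 0.0005*(glucose - 90))

* Assumption: keep only person-years at risk (drop rows after first insulin use)
bysort ID (year): generate prior = sum(insulin_use) - insulin_use
drop if prior > 0

xtset ID year
xtgee insulin_use year age i.sex glucose i.white_race, family(poisson) link(log) corr(exchangeable) eform

* Predicted events per person-year by sex, averaged over the other covariates
margins sex
* The same prediction in sex-by-race cells
margins sex#white_race
```

Because each row is one person-year, multiplying the margins by 100 gives events per 100 person-years. Entering sex and white_race with the i. prefix is what lets margins treat them as strata.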

    Problem is, the model is estimating the count of the number of incidents of insulin use, on the assumption that a person can have more than one a year.

    I am struggling to think how you can use a statistical model to estimate the exact quantity you want. I know we say, for example, the incidence of diabetes is x cases per 100 person-years, even though diabetes is binary in that context. Would not the median time to insulin use, which I believe you can calculate from the Cox model, be as substantively informative as an incidence rate?

    And, question for others: say Scott were to use xtlogit or melogit, and obtain margins, dydx(year). That's the change in probability of using insulin with a 1-unit increase in year. Does that get him what he needs? (Seeing as he can convert that probability, I think, to incidents per 100 person years??)
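A rough sketch of what the xtlogit route could look like, under stated assumptions: the data are simulated (variable names from post #1, invented values, and a hypothetical person-level frailty u), rows after the first year of insulin use are dropped, and predictions use predict(pu0), which sets the random effect to zero, so they are conditional rather than population-averaged. The conversion from an annual probability p to a rate uses rate = -ln(1 - p), which holds only if the hazard is constant within each year.

```stata
* Simulated person-year data with a person-level frailty u; values are invented
clear
set seed 4321
set obs 1000
generate ID = _n
generate age = 40 + floor(30*runiform())
generate sex = runiform() < 0.5
generate white_race = runiform() < 0.6
generate u = rnormal(0, 1)
expand 10
bysort ID: generate year = _n - 1
generate glucose = 90 + 60*runiform()
generate insulin_use = runiform() < invlogit(-3.5 + 0.01*(glucose - 90) + 0.3*sex + u)

* Assumption: keep only person-years at risk (drop rows after first insulin use)
bysort ID (year): generate prior = sum(insulin_use) - insulin_use
drop if prior > 0

xtset ID year
xtlogit insulin_use year age i.sex glucose i.white_race, re

* Change in annual probability per additional year (random effect set to zero)
margins, dydx(year) predict(pu0)

* Annual probability by sex, then the matching rate per 100 person-years,
* assuming a constant hazard within each year: rate = -ln(1 - p)
margins sex, predict(pu0)
matrix p = r(b)
display "sex = 0: " %5.2f -100*ln(1 - el(p,1,1)) " per 100 person-years"
display "sex = 1: " %5.2f -100*ln(1 - el(p,1,2)) " per 100 person-years"
```

Note that margins, dydx(year) describes how the probability changes over time, while margins sex gives the level of the probability in each stratum, which is closer to the adjusted incidence figure asked about in post #1.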
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      Thank you very much Weiwen -- I initially used the simple Poisson regression because the Stata survival analysis documentation said that it would approximate the Cox model (see page 139 of http://www.stata.com/manuals13/st.pdf). It did in fact give me estimates very comparable to my Cox analysis, but you're right, margins didn't work correctly.
      I tried xtgee as you suggested, which gave me results very similar to the simple Poisson model and also gave me reasonable estimates for incidence rates using margins.
      However I agree with you, I'm going to have to think carefully about how the incidence rates can be interpreted in this context.
      I'd also like to hear what people think about xtlogit or melogit.



      • #4
        Scott, very interesting to hear that. So, perhaps I was a bit hasty in what I said about Poisson. The same survival documentation also noted that you can use xtlogit to parameterize a discrete-time survival model. And that is what you have, sort of, because your observations are annual. You could elect to treat time as discrete. I would strongly argue for some sort of frailty model here (aka a random intercept/mixed model) to account for the fact that you're following individuals. Whatever the case, the information you need is in the section on discrete-time analysis. I believe you have to do some additional work to parameterize your survival function, e.g., for a Weibull model, t has to be log t, or apparently for a Cox model you have to enter the time variable as an indicator variable (I think!!! You'd better check that if you plan to do that.).

        After that, you are able to predict the individual hazard rates and the survival function at each unit of time. I am thinking you can do the math to get the incidence rate you want, but I haven't walked through the steps, so I am not absolutely certain. Hope others can chime in as to whether or not this is correct, because survival analysis is not my strongest suit.
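One way the steps might be walked through, offered as a sketch rather than a checked answer. It assumes simulated person-year data (variable names from post #1, invented values), drops rows after the first year of insulin use, and fits a plain logit with year as indicator variables and cluster-robust standard errors instead of a frailty model, to keep the example short. With predicted hazard h in each year, the survival function is S(t) = (1 - h_0)(1 - h_1)...(1 - h_t), and if each at-risk row counts as one full person-year, the mean hazard over rows is events per person-year.

```stata
* Simulated person-year data; variable names follow post #1, values are invented
clear
set seed 99
set obs 1000
generate ID = _n
generate age = 40 + floor(30*runiform())
generate sex = runiform() < 0.5
generate white_race = runiform() < 0.6
expand 10
bysort ID: generate year = _n - 1
generate glucose = 90 + 60*runiform()
generate insulin_use = runiform() < invlogit(-3.5 + 0.01*(glucose - 90) + 0.3*sex)

* Assumption: keep only person-years at risk (drop rows after first insulin use)
bysort ID (year): generate prior = sum(insulin_use) - insulin_use
drop if prior > 0

* Discrete-time hazard model with year entered as indicator variables
logit insulin_use i.year age i.sex glucose i.white_race, vce(cluster ID)

* h = predicted hazard in each person-year
* S = predicted probability of no insulin use through the end of that year
predict h, pr
bysort ID (year): generate S = exp(sum(ln(1 - h)))

* Events per 100 person-years at risk: observed, then model-based (mean predicted hazard)
summarize insulin_use, meanonly
display "observed:  " %5.2f 100*r(mean)
summarize h, meanonly
display "predicted: " %5.2f 100*r(mean)
```

For a logit model with a constant, the mean predicted hazard reproduces the observed event proportion in the estimation sample, so the two displayed figures should agree overall. Stratum-specific adjusted figures would come from margins, as discussed earlier in the thread.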


        • #5
          Scott: you appear to have discrete time survival time data, in which case you might be interested in the resources at http://Survival Analysis Using Stata...vival-analysis, especially the Lessons (with example Stata code for models without and with frailty)



          • #6
            Originally posted by Stephen Jenkins
            Scott: you appear to have discrete time survival time data, in which case you might be interested in the resources at http://Survival Analysis Using Stata...vival-analysis, especially the Lessons (with example Stata code for models without and with frailty)
            Your link wasn't showing properly:

            https://www.iser.essex.ac.uk/files/t...s/ec968st6.pdf



            • #7
              Thanks -- I don't know what happened there. Weiwen Ng links to the Lesson for discrete-time models (without frailty). The permanent URL for the home page of the course as a whole, with an overview of the materials, is: http://www.iser.essex.ac.uk/survival-analysis

