  • calculate IRR with confidence intervals from summarized data

    I have a dataset with 25 outcomes reported for 5 time periods and 7 cohorts. Most of the outcomes are rare and the cohorts are large (between 20,000 and 800,000 people). I'd like to be able to calculate incidence rate ratios comparing outcome rates across cohorts and/or time periods, then extract the incidence rate for each of the two populations being compared, along with the incidence rate ratio, the IRR confidence interval, and the p-value for the test of IRR=1.

    The difficulty I'm having is that I don't have individual-level data, so I can't figure out how to use the Stata epitab command -ir-, which requires a "case" variable for whether the outcome occurred and an "exposed" variable for the cohort (i.e., it assumes individual-level data).
    Another approach would be to use the immediate version of the -ir- command (-iri-; which uses summarized data), but I'm not sure how to extract the values from the variables for outcome count and person-time to feed into the iri command.

    This is an extract of a modified version of my dataset with 2 time periods, two cohorts, and 2 outcomes.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str3(followupperiod cohort) str21 outcome long(events n)
    "d3" "c1" "outcome1"   59  40000
    "d3" "c1" "outcome2"   72  40000
    "d3" "c2" "outcome1"  217 800000
    "d3" "c2" "outcome2" 1515 800000
    "d7" "c1" "outcome1"   86  40000
    "d7" "c1" "outcome2"  144  40000
    "d7" "c2" "outcome1"  358 800000
    "d7" "c2" "outcome2" 2580 800000
    end
    "Events" is the number of times outcome _x_ happened (i.e., the number of cases). "n" is the number of people in the cohort for that follow-up period, cohort, and outcome.
    "followupperiod" is the number of days of follow-up: d3=3 days follow-up; d7=7 days follow-up

    I'd like to be able to calculate the IRR for the same outcome and time period for two different cohorts, or for the same outcome and cohort at two different time periods, or for the same cohort and time period for two different outcomes.

    As an example, I'd like to be able to calculate the IRR for outcome1 at time period d3 comparing cohort c1 to cohort c2. The -iri- command would be:
    Code:
    iri 59 217 40000 800000
    The values are:
    events where cohort="c1" followupperiod="d3" and outcome="outcome1"
    events where cohort="c2" followupperiod="d3" and outcome="outcome1"
    n where cohort="c1" followupperiod="d3" and outcome="outcome1"
    n where cohort="c2" followupperiod="d3" and outcome="outcome1"

  • #2
    I could write you some code to do this. Mostly it would involve appropriate uses of the -collapse- command to aggregate the data up to cohort level. But maybe you should rethink this plan. You have 25 outcomes and 7 cohorts. Comparing each cohort with each of the others pairwise takes 7*6/2 = 21 IRR calculations for each outcome, for a grand total of 525 incidence rate ratios. A similar, if slightly smaller, number applies to the analyses by time period. What on earth will you do with all that output? How will you make any sense of it? The point of data analysis is to reduce the volume of information into understandable, useful summaries.

    • #3
      Thank you for the reply. I'm not planning to do all of them. I just want to be able to do arbitrary comparisons while I write up.
      Edited to add--the point of noting the number of possible comparisons was that the collapse approach and the expansion approach (i.e., creating an individual-level dataset) seemed unworkable because of the number of people in the cohort and the number of different ways I'd potentially be collapsing.
      The ideal solution would be a way of running -iri- by referring to the contents of variables, but I don't know if that's possible.
      Last edited by Molly Jeffery; 14 Jun 2022, 08:18.

      • #4
        Well, there is a conceptual difficulty in using -ir- or -iri- here. Your data example shows in each observation a number of entities attaining the outcome and the total number who were at risk. It is easy enough to calculate the incidence rate here. You just wouldn't do it with -iri-. You would use -cii- for that:
        Code:
        //  AGGREGATE BY COHORT
        collapse (sum) events n, by(cohort outcome)
        
        forvalues i = 1/`=_N' {
            display as text _newline(3) `"`=cohort[`i']'"', `"`=outcome[`i']'"'
            cii proportions `=n[`i']' `=events[`i']'
        }
        That will get you the 7 * 25 = 175 incidence rates, one for each of the 25 outcomes in each of the 7 cohorts (with the follow-up periods aggregated together). You can do the analogous thing for follow-up periods, with the cohorts aggregated together, with a minor modification of the code.
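
        If the rates are needed as data rather than as printed output, the stored results of -cii- can be written back into the collapsed data set. This is only a sketch. It assumes Stata 14 or later, where -cii proportions- leaves the estimate in r(proportion) and the confidence limits in r(lb) and r(ub); typing -return list- after one call will confirm that. The variable names rate, lb, and ub are made up for illustration.
        Code:
        //  RUN AFTER THE -collapse- ABOVE
        //  rate, lb, ub ARE ILLUSTRATIVE NAMES, NOT PART OF THE ORIGINAL DATA
        generate double rate = .
        generate double lb   = .
        generate double ub   = .
        
        forvalues i = 1/`=_N' {
            quietly cii proportions `=n[`i']' `=events[`i']'
            quietly replace rate = r(proportion) in `i'
            quietly replace lb   = r(lb) in `i'
            quietly replace ub   = r(ub) in `i'
        }
        
        list cohort outcome events n rate lb ub, noobs sepby(cohort)
        The extra variables do no harm to the later steps: a subsequent -collapse- keeps only the variables it is asked to aggregate.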

        For incidence rate ratios between pairs of these results, I would recommend that you first narrow down to a reasonable number of such pairings. Then you would need to create a new data set that contains those pairings and feed them to -iri- in a similar way. For example, suppose we want an incidence rate ratio for cohort 1 vs cohort 2, aggregated over all follow-up periods, for outcomes 1 and 2.

        Code:
        //  RESTRICT THE DATA TO THE DESIRED COHORTS AND OUTCOMES
        keep if inlist(cohort,"c1", "c2") & inlist(outcome, "outcome1", "outcome2")
        
        //  AGGREGATE ACROSS FOLLOW UP PERIODS
        collapse (sum) events n, by(cohort outcome)
        
        //  CREATE THE PAIRS BY RESHAPING WIDE
        reshape wide events n, i(outcome) j(cohort) string
        
        //  CALCULATE INCIDENCE RATE RATIOS
        forvalues i = 1/`=_N' {
            display _newline(3) `"`=outcome[`i']'"'
            iri `=eventsc1[`i']' `=eventsc2[`i']' `=nc1[`i']' `=nc2[`i']'
        }
        Note: This code requires that the actual names of the cohorts in your data set be admissible as pieces of Stata variable names, because they become suffixes to events and n in the -reshape- command. That means they must consist only of letters, digits, and underscore characters, with no embedded spaces or other special characters. They also have to be at most 26 characters long, so that when appended to events the result is at most 32 characters in length. If they do not meet these requirements, you can modify them using the -strtoname()- function and, if you need to shorten them, -substr()-.
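
        For a single arbitrary comparison, the four numbers can also be looked up directly from the variables and the results picked up from what -iri- leaves behind, with no -collapse- or -reshape-. The sketch below does this for the comparison in the original post (outcome1 at d3, c1 vs c2) and is meant to be run on the original data. It assumes that each combination of followupperiod, cohort, and outcome identifies exactly one observation, and that r(irr), r(lb_irr), r(ub_irr), and r(p) are the names your version of -iri- stores (check with -return list-). The local macro names are made up for illustration.
        Code:
        //  CONDITION SHARED BY BOTH GROUPS (pick IS AN ILLUSTRATIVE NAME)
        local pick `"followupperiod == "d3" & outcome == "outcome1""'
        
        //  LOOK UP EVENTS AND n FOR EACH COHORT
        summarize events if cohort == "c1" & `pick', meanonly
        local a = r(sum)
        summarize n if cohort == "c1" & `pick', meanonly
        local n1 = r(sum)
        summarize events if cohort == "c2" & `pick', meanonly
        local b = r(sum)
        summarize n if cohort == "c2" & `pick', meanonly
        local n2 = r(sum)
        
        //  WITH THE EXAMPLE DATA THIS IS: iri 59 217 40000 800000
        iri `a' `b' `n1' `n2'
        
        //  RATES COME FROM THE INPUTS; THE RATIO AND ITS LIMITS FROM r()
        display "rate c1 = " `a'/`n1' "   rate c2 = " `b'/`n2'
        display "IRR = " r(irr) "   CI: " r(lb_irr) " to " r(ub_irr)
        display "p = " r(p) "   (see -return list- for which test this is)"
        Changing the cohort, follow-up period, or outcome in the conditions gives any other pairing. Note that -iri- treats its third and fourth arguments as person-time, so the conceptual caveat above about passing n still applies.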

        • #5
          Thank you--this is incredibly helpful!
