  • Immortal time bias: how to match observations on multiple variables

    Dear Listers
    This is perhaps more of a theoretical question than a directly Stata-related one, but there is a question about code in there as well.

    The problem: In a survival study I have two cohorts (exposed and unexposed). Observation starts at time X and ends at time Z, where Z is the end of observation or the time of death.
    At some point between X and Z (at time X + Ndays) the exposed become exposed. I want to calculate the HR for death between the exposed and the unexposed. The exposure here is a specific treatment.


    I have, however, introduced an immortal time of Ndays in the exposed group, as I condition them on a future event: the exposed have to be alive at the time of exposure, and the unexposed don't. This gives an HR above 1.00 when comparing the unexposed to the exposed.

    Well, I know that the exposed are unexposed until they in fact become exposed. So I thought I might split my data on exposure, treating the exposed as unexposed until they get exposed. That, on the other hand, only moves the immortal Ndays over to the unexposed group, and I get an HR (when comparing the unexposed to the exposed) closer to 1.00 or even below 1.00.
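
    For illustration, a split on exposure could be set up roughly as below. This is a sketch on the mock data given further down, not necessarily what was actually run. The names has_split, episode, t0, t1, treated and died are made up for the example, and counting subjects whose exposure date is on or before start_of_obs as exposed throughout is an assumption.

    ```stata
    * Assumes the -dataex- example from this post is in memory.
    * Exposed subjects whose exposure date lies inside follow-up get two episodes.
    gen byte has_split = case == 1 & expo_date > start_of_obs & expo_date < end_of_obs
    expand 2 if has_split
    bysort id: gen byte episode = _n

    * Episode boundaries: by default one episode covering the whole follow-up.
    gen long t0 = start_of_obs
    gen long t1 = end_of_obs

    * Assumption: exposure on or before start_of_obs means exposed throughout.
    gen byte treated = case == 1 & expo_date <= start_of_obs

    * First episode ends at exposure (unexposed time); second starts there (exposed time).
    replace t1 = expo_date if has_split & episode == 1
    replace t0 = expo_date if has_split & episode == 2
    replace treated = 1 if has_split & episode == 2

    * A death can only close the last episode of a subject.
    gen byte died = dead
    replace died = 0 if has_split & episode == 1

    * Multiple-record survival data, analysis time measured from start_of_obs.
    stset t1, failure(died==1) time0(t0) origin(time start_of_obs) id(id)
    stcox treated i.gender age_start_obs cci
    ```

    With this layout each exposed subject contributes unexposed person-time from start_of_obs to expo_date and exposed person-time afterwards. On 25 mock observations the estimates themselves mean nothing.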
    I have tried a conditional landmark approach: start the observation time at a set point in time, define who is and who is not exposed at that point, and compare these groups. This did not change much, and I lose power, because I leave out about half of my cohort (those who become exposed later than 1 year after the start of observation).
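
    A landmark analysis of this kind could look roughly as follows. It is only a sketch: the 365-day landmark follows the 1 year mentioned above, the names lm, landmark and exposed_lm are made up, and dropping the later-exposed subjects (rather than counting them as unexposed) is my reading of the description.

    ```stata
    * Assumes the -dataex- example from this post is in memory.
    preserve

    * Landmark date: 365 days after the start of observation (assumed value).
    local lm = 365
    gen long landmark = start_of_obs + `lm'
    format %td landmark

    * Keep only subjects still alive and under observation at the landmark.
    keep if end_of_obs > landmark

    * Exposure status is fixed at the landmark.
    gen byte exposed_lm = case == 1 & expo_date <= landmark

    * Subjects exposed after the landmark are left out, as described above.
    drop if case == 1 & !exposed_lm

    * Follow-up is counted from the landmark onwards.
    stset end_of_obs, failure(dead==1) origin(time landmark) id(id)
    stcox exposed_lm

    restore
    ```

    Removing the -drop- line would instead keep the later-exposed subjects in the unexposed group, which is the other common way to set up a landmark analysis.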

    So I thought: what if I just disregard the Ndays, but in both cohorts? I would match my cohorts on whatever covariates I would normally put in my Cox model (gender, age at start, Charlson comorbidity index, for example), drop all of the unexposed who are not alive at the time their matched exposed counterpart gets exposed, and then run the Cox model.

    My questions are:
    1) Is this a feasible way to go about it?
    2) What about those exposed/unexposed who cannot be matched with anyone in the other cohort?

    3) Could I use the Ndays until exposure in the model?
    I thought I might put it in as a continuous variable, looking at the increase in hazard for each additional day of Ndays, and thus be able to say something about the effect of prolonging time to exposure.
    But should the Ndays then be:
    for the exposed = days from start of obs until the exposure date
    for the unexposed = days from start of obs until end of obs, OR days from start of obs until the matched exposed counterpart gets exposed?

    4) How would I go about matching the two cohorts, and how do I figure out whether the unexposed have died before their matched exposed counterpart gets exposed?


    I provide you with a mock dataset (the original dataset has 4500 exposed and 4200 unexposed)

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id case gender cci dead start_of_obs expo_date end_of_obs age_start_obs)
     1 0 0 0 0 18518     . 19358 76
     2 1 1 2 1 17358 17601 17843 43
     3 1 0 0 1 16025 16652 16670 70
     4 1 1 1 0 19323 19316 19358 58
     5 0 0 3 0 18609     . 19358 41
     6 1 0 0 0 19268 19141 19358 31
     7 0 1 0 1 10757     . 11119 73
     8 1 1 1 0 18745 18995 19358 40
     9 0 0 1 1 11153     . 11650 34
    10 0 0 3 0 19342     . 19358 67
    11 1 1 2 0 18815 19215 19358 65
    12 0 1 3 0 19312     . 19358 80
    13 1 0 1 0 18881 19010 19358 64
    14 0 0 2 1 18407     . 19088 43
    15 0 0 1 1 11627     . 12352 68
    16 0 0 1 0 18619     . 19358 32
    17 0 1 2 1 10231     . 10920 40
    18 1 0 3 0 18975 18979 19358 29
    19 1 0 0 1 14682 15114 15478 63
    20 1 0 0 0 18899 19262 19358 37
    21 1 0 0 0 18481 19150 19358 79
    22 0 0 3 1 12897     . 13429 57
    23 0 1 3 1 11887     . 12346 27
    24 1 0 2 1 12792 13318 13568 47
    25 1 1 0 1  7435  7880  7954 20
    end
    format %td start_of_obs
    format %td expo_date
    format %td end_of_obs

    id is unique
    case indicates whether the subject is exposed (1) or unexposed (0)
    cci is the Charlson comorbidity index

    Hope my questions are somewhat understandable.

    Lars

  • #2
    I don't know if I understand what you are asking or not. But to the extent I do, it sounds to me like it is handled by the appropriate use of the -origin()- and -enter()- options in -stset-. -enter()- designates the time that the subject comes under observation, and -origin()- designates the time that the subject becomes at risk for the failure outcome. It sounds to me like your Ndays represents the interval between those two. Am I on the right track here?
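
    A minimal sketch of what this could look like on the data from #1, under one possible reading of the question: the clock starts at start_of_obs for everyone, and the exposed only come under observation at their exposure date. The variable entry is made up for the example, and this reading is an assumption.

    ```stata
    * Assumes the -dataex- example from #1 is in memory.
    * entry = date from which the subject contributes follow-up (hypothetical variable).
    gen long entry = start_of_obs
    * max() ignores missing values; exposure dates before start_of_obs are moved up to it.
    replace entry = max(expo_date, start_of_obs) if case == 1
    format %td entry

    * origin() sets time zero; enter() delays entry for the exposed (left truncation),
    * so the days before exposure are not counted as exposed survival time.
    stset end_of_obs, failure(dead==1) origin(time start_of_obs) enter(time entry) id(id)
    stcox case
    ```

    Note that under this setup the pre-exposure days of the exposed are dropped from the analysis altogether rather than counted as unexposed time.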



    • #3
      Clyde Schechter, thank you.
      I think I took a difficult problem and made it incomprehensible.

      I am not sure that -enter()- and -origin()- solve my problem. I will try to make it a bit clearer.

      I have a cohort of patients diagnosed between 1977 and 2012, and the end of observation is the end of 2012.
      Some patients will be treated for the disease and some will go through life untreated. Let's just assume that the treated are comparable to the untreated and that the decision to treat is random within the cohort of ill patients (it's probably not, but let's take that discussion another time).

      Now I want to look at the effect of treatment on mortality risk.

      The problem is that the diagnosis is made at time X.
      Treatment is started at time X + N.
      This N can be anything from 0 days to 33 years.

      The exposed are defined as those who started treatment at any time, looking back from the end of 2012. So all the treated patients are immortal for N days, as they have to be alive long enough to be treated.

      Now let's look at a very small sample:

      id1 is diagnosed at time = 0
      id2 is diagnosed at time = 4

      id1 starts treatment at time = 300 days
      id2 never starts treatment

      id1 dies at time = 450 days
      id2 dies at time = 200 days

      If we compare the two directly, treatment looks good, but if id2 had lived longer, treatment might have been started at some point; we just don't know.

      The following pair is closer to what I think I want:
      id3 is diagnosed at time = 0
      id4 is diagnosed at time = 3

      id3 starts treatment at time = 300 days
      id4 is never treated

      id3 dies at time = 400
      id4 dies at time = 350

      ///
      So, to overcome immortal time: could (and should) I match my two cohorts (treated vs. never treated, as far as we know) on age, gender and cci, and exclude pairs where the untreated subject dies before the matched treated counterpart starts treatment?

      If yes, how do I do that with the example in #1?

      Lars
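
      One way the matching described in #3 might be sketched on the example in #1 is shown below. This is only an illustration under stated assumptions: exact matching on gender and cci, an age caliper of 5 years, one randomly chosen control per treated subject, and controls that may be reused are all choices made for the example, and the ctrl_* names, ndays and u are made up.

      ```stata
      * Assumes the -dataex- example from #1 is in memory.
      * Step 1: save the untreated as potential controls, with renamed variables.
      preserve
      keep if case == 0
      keep id gender cci dead start_of_obs end_of_obs age_start_obs
      rename (id dead start_of_obs end_of_obs age_start_obs) ///
             (ctrl_id ctrl_dead ctrl_start ctrl_end ctrl_age)
      tempfile controls
      save `controls'
      restore

      * Step 2: pair every treated subject with all controls of same gender and cci.
      preserve
      keep if case == 1
      * Days from start of observation to treatment (set to 0 if treated before start).
      gen long ndays = max(0, expo_date - start_of_obs)
      joinby gender cci using `controls'

      * Step 3: age caliper (assumed: 5 years).
      keep if abs(age_start_obs - ctrl_age) <= 5

      * Step 4: control must still be alive and under observation ndays after own start.
      keep if ctrl_end - ctrl_start > ndays

      * Step 5: pick one eligible control at random per treated subject.
      set seed 12345
      gen double u = runiform()
      bysort id (u): keep if _n == 1

      list id ctrl_id ndays ctrl_dead, noobs
      restore
      ```

      Treated subjects without any eligible control simply drop out at the -joinby- or -keep- steps. Follow-up for a matched pair would then start at expo_date for the treated subject and at ctrl_start + ndays for the control.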



      • #4
        I am following this
