  • Survival analysis: discrepancy between KM curve and data

    Hello, I am analysing cancer survival data. About one third of the patients eventually die, yet my Kaplan-Meier curve shows 100% dying and I cannot figure out why. I ran stset with the inclusion date (dateincl), the date of last news (lastnews), the date of death (dateofdeath), and an id() variable (although there is only one line per person).
    Here is the printout:
    stset lastnews, origin(dateinc) fail(dateofdeath) scale(365.25) id(N)

    id: N
    failure event: dateofdeath != 0 & dateofdeath < .
    obs. time interval: (lastnews[_n-1], lastnews]
    exit on or before: failure
    t for analysis: (time-origin)/365.25
    origin: time dateincl

    ------------------------------------------------------------------------------
    639 total observations
    7 observations end on or before enter()
    ------------------------------------------------------------------------------
    632 observations remaining, representing
    632 subjects
    238 failures in single-failure-per-subject data
    2,522.943 total analysis time at risk and under observation
    at risk from t = 0
    earliest observed entry t = 0
    last observed exit t = 19.9781


    Yet the sts graph drops all the way to zero.
    (I tried adding exit (dateoflastnews); the result is the same.)

    Where did I go wrong?
    Thanks.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str8 id float(dateincl lastnews dateofdeath death)
    "20140001" 19758 20528     . .
    "20050267" 16629 20713 20713 1
    "20090043" 18133 21304 21304 1
    "20090046" 18088 19194     . .
    "20110165" 18847 20194     . .
    "20130011" 19477 20071     . .
    "20120171" 19178 19470     . .
    "20150013" 20107 21144     . .
    "20140513" 19730 21208     . .
    "20150227" 20354 21075     . .
    end
    format %td dateincl

  • #2
    You have done nothing wrong,* and the graph is a correct representation of the Kaplan-Meier estimator of the population survival function. Remember that an underlying assumption for using the K-M estimator is that censorship is independent of death. That is, it assumes that the survival time distribution of the censored observations, if it were known, would look just like the observed survival times.

    Now, take a look at the results of -sts list-:
    Code:
                 At           Net    Survivor      Std.
      Time     risk   Fail   lost    function     error     [95% conf. int.]
    ------------------------------------------------------------------------
     .7995       10      0      1      1.0000         .          .         .
     1.626        9      0      1      1.0000         .          .         .
     1.974        8      0      1      1.0000         .          .         .
     2.108        7      0      1      1.0000         .          .         .
     2.839        6      0      1      1.0000         .          .         .
     3.028        5      0      1      1.0000         .          .         .
     3.688        4      0      1      1.0000         .          .         .
     4.047        3      0      1      1.0000         .          .         .
     8.682        2      1      0      0.5000    0.3536     0.0060    0.9104
     11.18        1      1      0      0.0000         .          .         .
    ------------------------------------------------------------------------
    Note that after the death at time 8.682, only one person remains at risk: everybody else has either died or been censored by that point. At time 11.18, that one person out of the one at risk dies. In other words, the estimated hazard of death at time 11.18 is 100%, so the survival function estimator falls to zero. Yes, only a small fraction of the observations are deaths, but the assumption underlying this methodology is that the censored patients go on to die at the same rate as the people we continue to observe; we just don't have the information about them. Under this assumption, we expect that everybody will have died by 11.18 years. The same logic applies to your full data set: the curve can reach zero only if everyone still at risk at the last exit time (t = 19.9781) died at that time, not because every patient died.
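
    The table and the arithmetic behind it can be reproduced with a short sketch like the one below. It assumes the ten-record -dataex- example from #1, uses the death indicator for failure() (the censored patients have death missing, which -stset- treats as censored), and leaves out -id()- because there is one record per patient. The two -display- lines are just the product-limit calculation done by hand.

    ```stata
    * Sketch only: the ten-record -dataex- example from #1.
    clear
    input str8 id float(dateincl lastnews dateofdeath death)
    "20140001" 19758 20528     . .
    "20050267" 16629 20713 20713 1
    "20090043" 18133 21304 21304 1
    "20090046" 18088 19194     . .
    "20110165" 18847 20194     . .
    "20130011" 19477 20071     . .
    "20120171" 19178 19470     . .
    "20150013" 20107 21144     . .
    "20140513" 19730 21208     . .
    "20150227" 20354 21075     . .
    end
    format %td dateincl lastnews dateofdeath

    * One record per patient, so -id()- is not needed.
    stset lastnews, origin(time dateincl) failure(death) scale(365.25)

    * Tabulate deaths (_d == 1) against censored records (_d == 0).
    tabulate _d

    * The number at risk shrinks by one at every exit time.
    sts list

    * Product-limit arithmetic at the two death times:
    * after t = 8.682, one death among two at risk
    display 1 * (1 - 1/2)
    * after t = 11.18, one death among one at risk
    display 0.5 * (1 - 1/1)

    * Show the number at risk under the curve and mark censoring times.
    sts graph, risktable censored(single)
    ```

    On the full data set, the risk table under the graph should make it visible how few patients are still at risk by the time the curve reaches zero.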

    *That is not quite true. Your -stset- command has -id(N)-, but there is no N variable in the data set. I assume you either didn't set -id()- at all--after all, you have only one observation per person so you don't need the -id()- option--or you set it as -id(id)-.
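
    A quick way to check the -id()- point, sketched here with the first three records from #1: -isid- stops with an error if id does not uniquely identify the observations. When it passes silently, the data are one record per patient and -stset- can be run without -id()-.

    ```stata
    * Sketch only: first three records of the -dataex- example from #1.
    clear
    input str8 id float(dateincl lastnews dateofdeath death)
    "20140001" 19758 20528     . .
    "20050267" 16629 20713 20713 1
    "20090043" 18133 21304 21304 1
    end

    * Exits with an error if any id value appears on more than one record.
    isid id

    * One record per patient, so -stset- is declared without -id()-.
    stset lastnews, origin(time dateincl) failure(dateofdeath) scale(365.25)
    ```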

    Added: The K-M estimator is widely used, and even analyses that don't directly use K-M calculations usually also rely on the assumption that censorship is independent of actual death. Survival analysis was originally developed in engineering and was used to study time to failure of parts in mechanical or electric devices. And in those studies, censorship most often arose as a result of the study period reaching its planned endpoint--which is clearly independent of failure.

    I have always been skeptical of this assumption in studies of patients with fatal illnesses. Most of these patients do not just abruptly die one day. More typically, their functional status deteriorates in their final months, and as this happens, they may withdraw from many of their usual activities, including withdrawing from participation in research studies and even from receiving further medical care. Or they may relocate out of the area where you can observe them to reside with a caregiver or in a hospice or nursing home. So I think that, realistically, in studies like this, censorship occurs preferentially among those whose death is imminent. Consequently, if anything, I think the K-M estimator usually underestimates mortality when used in this setting.
    Last edited by Clyde Schechter; 30 Aug 2024, 09:37.



    • #3
      Thanks a lot for this very detailed answer. It is very helpful!
