survival analysis discrepancy between km curve and data

mathieu nacher

#1

30 Aug 2024, 06:56

Hello, I am analysing cancer survival data. About one third of patients eventually die, yet my Kaplan-Meier curve shows 100% dying, and I cannot figure out why. I ran stset with the inclusion date (dateincl), the date of last news (lastnews), the date of death (dateofdeath), and id() (although there is only one line per person).
Here is the printout:
stset lastnews, origin(dateinc) fail(dateofdeath) scale(365.25) id(N)

id: N
failure event: dateofdeath != 0 & dateofdeath < .
obs. time interval: (lastnews[_n-1], lastnews]
exit on or before: failure
t for analysis: (time-origin)/365.25
origin: time dateincl

------------------------------------------------------------------------------
639 total observations
7 observations end on or before enter()
------------------------------------------------------------------------------
632 observations remaining, representing
632 subjects
238 failures in single-failure-per-subject data
2,522.943 total analysis time at risk and under observation
at risk from t = 0
earliest observed entry t = 0
last observed exit t = 19.9781

Yet the sts graph drops all the way to zero.
(I tried adding exit(dateoflastnews); the result is the same.)

Where did I go wrong?
Thanks

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str8 id float(dateincl lastnews dateofdeath death)
"20140001" 19758 20528     . .
"20050267" 16629 20713 20713 1
"20090043" 18133 21304 21304 1
"20090046" 18088 19194     . .
"20110165" 18847 20194     . .
"20130011" 19477 20071     . .
"20120171" 19178 19470     . .
"20150013" 20107 21144     . .
"20140513" 19730 21208     . .
"20150227" 20354 21075     . .
end
format %td dateincl
Clyde Schechter

#2

30 Aug 2024, 09:27

You have done nothing wrong,* and the graph is a correct representation of the Kaplan-Meier estimator of the population survival function. Remember that an underlying assumption for using the K-M estimator is that censorship is independent of death. That is, it assumes that the survival time distribution of the censored observations, if it were known, would look just like the observed survival times.

Now, take a look at the results of -sts list-:

Code:

           At           Net   Survivor     Std.
  Time   risk   Fail   lost   function    error    [95% conf. int.]
------------------------------------------------------------------------
 .7995     10      0      1     1.0000        .         .         .
 1.626      9      0      1     1.0000        .         .         .
 1.974      8      0      1     1.0000        .         .         .
 2.108      7      0      1     1.0000        .         .         .
 2.839      6      0      1     1.0000        .         .         .
 3.028      5      0      1     1.0000        .         .         .
 3.688      4      0      1     1.0000        .         .         .
 4.047      3      0      1     1.0000        .         .         .
 8.682      2      1      0     0.5000   0.3536    0.0060    0.9104
 11.18      1      1      0     0.0000        .         .         .
------------------------------------------------------------------------

Note that after the death at time 8.682, only one person remains under observation: everybody else has either died or been censored by that point. At time 11.18, the chronologically last event in the data set, that 1 person out of the 1 at risk dies. In other words, the estimated hazard of death at time 11.18 is 100%, so the survival function estimator falls to zero. Yes, only a small fraction of the observations are deaths, but the assumption underlying this methodology is that the censored observations are actually dying at the same rate as the people we observe; we just don't have the information about them. So based on this assumption, we expect that everybody will have died by 11.18 years.
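To make that arithmetic explicit, here is the standard product-limit (Kaplan-Meier) formula applied to the -sts list- table above. The symbols are my own notation, not from the thread: t_j are the distinct death times, d_j the number of deaths at t_j, and n_j the number at risk just before t_j.

```latex
\[
\begin{aligned}
\hat{S}(t) &= \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right) \\
\hat{S}(8.682) &= 1 - \tfrac{1}{2} = 0.5 \\
\hat{S}(11.18) &= \hat{S}(8.682)\left(1 - \tfrac{1}{1}\right) = 0.5 \times 0 = 0
\end{aligned}
\]
```

The eight censored times before 8.682 contribute no factor (no deaths occur there); they only shrink the risk set from 10 to 2. The estimate reaches zero because the largest observed time in this example is a death whose risk set contains only that one person.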

*That is not quite true. Your -stset- command has -id(N)-, but there is no N variable in the data set. I assume you either didn't specify -id()- at all (you have only one observation per person, so you don't need the -id()- option) or you specified it as -id(id)-.

Added: The K-M estimator is widely used, and even analyses that don't directly use K-M calculations usually also rely on the assumption that censorship is independent of actual death. Survival analysis was originally developed in engineering and was used to study time to failure of parts in mechanical or electrical devices. In those studies, censorship most often arose because the study period reached its planned endpoint, which is clearly independent of failure.

I have always been skeptical of this assumption in studies of patients with fatal illnesses. Most of these patients do not just abruptly die one day. More typically, their functional status deteriorates in their final months, and as this happens, they may withdraw from many of their usual activities, including withdrawing from participation in research studies and even from receiving further medical care. Or they may relocate out of the area where you can observe them to reside with a caregiver or in a hospice or nursing home. So I think that, realistically, in studies like this, censorship occurs preferentially among those whose death is imminent. Consequently, if anything, I think the K-M estimator usually underestimates mortality when used in this setting.

Last edited by Clyde Schechter; 30 Aug 2024, 09:37.
mathieu nacher

#3

30 Aug 2024, 11:08

Thanks a lot for this very detailed answer. It is very helpful!