How to estimate mortality and CIs from survival data (KM curves)?

Sean Hardiman

Join Date: Sep 2016

Posts: 53
#1

How to estimate mortality and CIs from survival data (KM curves)?

14 Aug 2022, 21:14

Hi everyone,

I'm doing survival analysis in Stata 17. I'd like to estimate the rate of mortality and confidence intervals at four time points during my study - 30 days, 1 year, 2 years, and 3 years. I have -stset- the data and can generate KM curves that are both unadjusted and adjusted with inverse probability weights. I can also generate a risk table along with the figure that records. -stpm2- is working fine for estimating risk ratios at the various time intervals I need. However, I'm not sure the best way to extract (if that's the word) the unadjusted and adjusted mortality at my time points of interest. I've been able to use the -ltable- command to get unadjusted mortality, but I'm not sure if this is correct. Regardless, -ltable- doesn't appear to work with probability weighted data. I can generate a risk table with -sts graph-. However, that still seems to require me to do a bit of manual work that seems a bit prone to error. I've also been able to output a .dta set that just has the KM data in it, but I'm not sure what to do with this next now that I have it.

I would welcome advice on how best to take the KM curves I have plotted and estimate mortality and 95% CIs at various time points. Happy to post any code that is relevant - just not sure what might be needed yet!

Thanks!
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

14 Aug 2022, 22:01

Have you looked at -sts list-? I haven't used it in a while, but I think it handles everything you are seeking here.
1 like
Comment
Sean Hardiman

Join Date: Sep 2016

Posts: 53
#3

14 Aug 2022, 23:05

Oh my goodness, I'm not sure how I missed this, but I did. I will check this out and report back with what I found! Thanks Clyde!
Comment
Sean Hardiman

Join Date: Sep 2016

Posts: 53
#4

05 Sep 2022, 14:08

Hi Clyde, everyone,

-sts list- was able to give me the mortality estimates I needed, along with confidence intervals. However, despite being able to -stset- the data using -pweight-, there is seemingly no function to be able to estimate confidence intervals when the data is weighted using -pweight-. This is documented at https://www.stata.com/manuals/ststslist.pdf on p4 in 'Remarks and examples’. According to Stata Support, who was very helpful, the equations in -sts- that would generate a confidence interval aren't appropriate for -pweight-ed data (p386 of the -st- manual)

Does anyone have any advice for how I could estimate confidence intervals using sampling weights through -sts-? The study whose methods I have adapted for my work reported that they used bootstrapping to estimate confidence intervals, but didn't say much about how they did it.

Suggestions welcomed!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

05 Sep 2022, 18:08

Yes, -bootstrap- is one way to do that. It will take some work. You will need to wrap your -sts list- command in a program that saves the results in -r()-. Since -sts list- itself leaves nothing behind in -r()-, you will have to pull those results by saving them to a temporary file and then extracting them. You don't show the exact -sts list- command that gave you what you wanted. To illustrate the general approach, I will assume the command was -sts list, risktable(30 365 730 1095)-. Then you will have to do something like this:

Code:

capture program drop get_survival program define get_survival, rclass tempfile life_table sts list, risktable(30 365 730 1095) saving(`life_table') preserve use `life_table', clear return scalar S30 = survivor[1] return scalar S365 = survivor[2] return scalar S730 = survivor[3] return scalar S1095 = survivor[4] restore exit end // DON'T FORGET TO -STSET- YOUR DATA. // AND BE SURE TO SET THE RANDOM NUMBER SEED TO GET REPRODUCIBLE RESULTS bootstrap S30 = r(S30) S365 = r(365) S730 = r(S730) S1095 = r(S1095), reps(1000): get_survival estat bootstrap

I am not an expert in bootstrapping, though I have some experience with it. I think that for your purposes, the bias-corrected confidence intervals you get will be the ones to work with.

Evidently this code is untested, and it may contain errors. But it should point you in the right direction.
1 like
Comment
Sean Hardiman

Join Date: Sep 2016

Posts: 53
#6

07 Sep 2022, 22:34

Thank you Clyde, this is fantastic. I'll take a run at this and update with my results!
Comment
Sean Hardiman

Join Date: Sep 2016

Posts: 53
#7

10 Sep 2022, 01:51

Hi Clyde, everyone,

This worked extremely well. Modified code to match to my risk table construction and got CIs that made sense for my data. One question - it took what felt like a long time to run the 1000 simulations (~25 minutes). My computer isn't the speediest (3.8 GHz quad core Intel i5) so that's probably part of it, yet was curious if this was normal. I'm guessing it's a function of the number of simulations? Maybe it's time for an upgrade!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#8

10 Sep 2022, 08:13

I'm not surprised at the time required. The calculations are quick enough, but the code thrashes the hard drive with a file creation, file write, file read, and file erase at every single iteration. I/O is slow! For some reason, probably because it is a pretty old command, -sts list- leaves no results behind in -r()-, so we have to rely on the -saving()- option to gather the statistics you need.

In any case, your CPU isn't the bottleneck here. If you're going to upgrade anything to get better speed for this kind of thing, think about getting a faster hard drive. Another thing to consider, that won't cost you a dime, is how full and fragmented your hard drive is. Cleaning that up might help.
Comment

Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2402

10 Sep 2022, 18:35

I would offer another approach which is computationally faster and yields similar results. The time needed for the bootstrap scales with K repetitions and N observations, while the pseudovalue approach scales with M non-censoring events. It is also faster in a practical sense because the work is done with Mata behind the scenes, and no I/O is needed.

The method is based on pseudovalues and I believe that it should perform as well or better than a bootstrap. It is described in the following references and you can use these as starting points to later extensions of the method.

1. Klein JP, Andersen PK, Logan BL, Harhoff MG. Analyzing survival curves at a fixed point in time. Statistics in Medicine. 2007

2. Klein JP, Gerster M, Andersen PK, Tarima S and Perme MP. SAS and R Functions to Compute Pseudo-values for Censored Data Regression. Comput Methods Programs Biomed. 2008

Here is an example using an available dataset to show the technique, and slightly modified code from Clyde to compare to the bootstrap.

Code:

clear *
cls

// DON'T FORGET TO -STSET- YOUR DATA.
// AND BE SURE TO SET THE RANDOM NUMBER SEED TO GET REPRODUCIBLE RESULTS
set seed 17
webuse drugtr
stset studytime, failure(died)
sts list, risktable(5 15 30)

stpsurv, at(5 15 30) gen(psurv)
gen `c(obs_t)' id = _n
xtset id
forval i = 1/3 {
  *glm psurv`i' , fam(bin) link(logit) vce(rob) nolog
  xtgee psurv`i', fam(poisson) link(log) corr(ind) vce(rob) nolog
  lincom _cons, eform
}

capture program drop get_survival
program define get_survival, rclass
    tempfile life_table
    sts list, risktable(5 15 30) saving(`life_table')
    preserve
    use `life_table', clear
    return scalar S5 = survivor[1]
    return scalar S15 = survivor[2]
    return scalar S30 = survivor[3]
    restore
    exit
end

bootstrap S5=r(S5) S15=r(S15) S30=r(S30), reps(1000): get_survival
estat bootstrap

Condensed results:

Code:

Pseudo-values with GEE approach, Poisson family,
log link, independent correlation struction, and
robust variance estimation.
                Robust
     Estimate  Std. Err.      95%           CI
 S5  .8333333   .0543607    .7333182    .9469892
S10   .538306   .0759427    .4082668    .7097647
S15  .2556954   .0776754    .1409756     .463769


Bootsrap method:
     Estimate  Std. Err.      95% Bootstrap CI
 S5 .83333331  .05397074    .7291667      .9375  (BC)
S15   .538306  .07419951    .3887963   .6759515  (BC)
S30 .25569537  .07585423    .1265833   .4231777  (BC)

Comment

Sean Hardiman

Join Date: Sep 2016

Posts: 53
#10

12 Sep 2022, 22:40

Hi Clyde, Leonardo,

Thanks for your considered comments. Clyde, it's good to know this isn't a CPU-bounded problem. My iMac is getting a bit long in the tooth and has an old fusion drive (part SSD, part HDD). It probably doesn't help much if it's moving stuff back and forth from SSD to HDD. It does have a few TB of free space open, so at least I've got that going for me. Next computer won't have an HDD, that much is for sure!

Leonardo, thanks for your suggestion of an alternative method and the modified code. I'm going to try implementing this and see how I do! Thanks again to both of you!
Comment
Natalie Malek

Join Date: Jul 2019

Posts: 18
#11

19 Sep 2022, 19:12

What if you need CIs at every point there is an event (i.e. not just 3 or 4 time points)?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#12

19 Sep 2022, 19:28

It wouldn't be much different from #9. You would first have to identify all of the time points at which events occur. Then you would modify the program being bootstrapped and the bootstrap command itself to, respectively, -return scalar- the survival function at each of those times, and pick them up for the -bootstrap- calculations.
1 like
Comment

Sean Hardiman

Join Date: Sep 2016
Posts: 53

#13

20 Sep 2022, 00:30

Hi everyone,

Here are my results!

Code:

Bootstrap results                                       Number of obs = 47,211
                                                        Replications  =  1,000

      Command: Get_Survival
           S1: r(S1)
          S12: r(S12)
          S24: r(S24)
          S36: r(S36)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
          S1 |   .9806789   .0037382   262.34   0.000     .9733522    .9880055
         S12 |   .9567834   .0043384   220.54   0.000     .9482804    .9652865
         S24 |   .9361932   .0046183   202.72   0.000     .9271416    .9452448
         S36 |   .9126749   .0050748   179.85   0.000     .9027285    .9226213
------------------------------------------------------------------------------

However, what I am noticing is that I get CI's for overall mortality but not for . How can I integrate that into the program code that I've used?

Code:

. capture program drop Get_Survival                                                               // Clears the program named 'get_survival'
r; t=0.03 22:31:46

. program define Get_Survival, rclass                                                     // Establish the program as one that stores results
  1.     tempfile Life_Table                                                                                 // Set -Life_Table- as the temporary 
  2.     sts list, risktable(1 12 24 36) saving(`Life_Table')                // Run sts list to establish the risk table for 1m (30d), 12m (1y), 24m (2y), and 36m (3y) mortality
  3.     preserve                                                                                                    // Preserve the data
  4.     use `Life_Table', clear                                                                             // Use -Life_Table-, and clear any held values.
  5.     return scalar S1 = survivor[1]                                                              // Return the scalar for 1m mortality as S1 for the survivor data in the first element of the risk table.
  6.     return scalar S12 = survivor[2]                                                             // Return the scalar for 12m mortality as S12 for the survivor data in the second element of the risk table.
  7.     return scalar S24 = survivor[3]                                                             // Return the scalar for 24m mortality as S24 for the survivor data in the third element of the risk table.
  8.     return scalar S36 = survivor[4]                                                             // Return the scalar for 36m mortality as S36 for the survivor data in the fourth element of the risk table.
  9.     restore                                                                                                             // Restore the data set
 10.     exit                                                                                                                // Terminates the program

I've also tried adding -by (study_group)- to the -sts list- line but that didn't seem to do anything. I'm thinking I need to revise the scalar lines? Advice welcomed!

Comment

Sean Hardiman

Join Date: Sep 2016

Posts: 53
#14

20 Sep 2022, 09:17

Oops, looks like in my edit I forgot to finish the sentence with the most important detail - looking for CIs for my study groups!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#15

20 Sep 2022, 09:55

This is your first mention of study groups in the entire thread. I think the simplest way to get this by study group is to simply run the code separately for each one. You can write a loop over the levels of study group and embed all of the code inside that loop, adding -if- conditions to restrict the calculations to the currently iterated group.
Comment

Announcement