Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to estimate mortality and CIs from survival data (KM curves)?

    Hi everyone,

    I'm doing survival analysis in Stata 17. I'd like to estimate the rate of mortality and confidence intervals at four time points during my study - 30 days, 1 year, 2 years, and 3 years. I have -stset- the data and can generate KM curves that are both unadjusted and adjusted with inverse probability weights. I can also generate a risk table along with the figure that records. -stpm2- is working fine for estimating risk ratios at the various time intervals I need. However, I'm not sure the best way to extract (if that's the word) the unadjusted and adjusted mortality at my time points of interest. I've been able to use the -ltable- command to get unadjusted mortality, but I'm not sure if this is correct. Regardless, -ltable- doesn't appear to work with probability weighted data. I can generate a risk table with -sts graph-. However, that still seems to require me to do a bit of manual work that seems a bit prone to error. I've also been able to output a .dta set that just has the KM data in it, but I'm not sure what to do with this next now that I have it.

    I would welcome advice on how best to take the KM curves I have plotted and estimate mortality and 95% CIs at various time points. Happy to post any code that is relevant - just not sure what might be needed yet!

    Thanks!

  • #2
    Have you looked at -sts list-? I haven't used it in a while, but I think it handles everything you are seeking here.

    Comment


    • #3
      Oh my goodness, I'm not sure how I missed this, but I did. I will check this out and report back with what I found! Thanks Clyde!

      Comment


      • #4
        Hi Clyde, everyone,

        -sts list- was able to give me the mortality estimates I needed, along with confidence intervals. However, despite being able to -stset- the data using -pweight-, there is seemingly no function to be able to estimate confidence intervals when the data is weighted using -pweight-. This is documented at https://www.stata.com/manuals/ststslist.pdf on p4 in 'Remarks and examples’. According to Stata Support, who was very helpful, the equations in -sts- that would generate a confidence interval aren't appropriate for -pweight-ed data (p386 of the -st- manual)

        Does anyone have any advice for how I could estimate confidence intervals using sampling weights through -sts-? The study whose methods I have adapted for my work reported that they used bootstrapping to estimate confidence intervals, but didn't say much about how they did it.

        Suggestions welcomed!

        Comment


        • #5
          Yes, -bootstrap- is one way to do that. It will take some work. You will need to wrap your -sts list- command in a program that saves the results in -r()-. Since -sts list- itself leaves nothing behind in -r()-, you will have to pull those results by saving them to a temporary file and then extracting them. You don't show the exact -sts list- command that gave you what you wanted. To illustrate the general approach, I will assume the command was -sts list, risktable(30 365 730 1095)-. Then you will have to do something like this:

          Code:
          capture program drop get_survival
          program define get_survival, rclass
              tempfile life_table
              sts list, risktable(30 365 730 1095) saving(`life_table')
              preserve
              use `life_table', clear
              return scalar S30 = survivor[1]
              return scalar S365 = survivor[2]
              return scalar S730 = survivor[3]
              return scalar S1095 = survivor[4]
              restore
              exit
          end
          
          // DON'T FORGET TO -STSET- YOUR DATA.
          // AND BE SURE TO SET THE RANDOM NUMBER SEED TO GET REPRODUCIBLE RESULTS
          bootstrap S30 = r(S30) S365 = r(365) S730 = r(S730) S1095 = r(S1095), reps(1000): get_survival
          estat bootstrap
          I am not an expert in bootstrapping, though I have some experience with it. I think that for your purposes, the bias-corrected confidence intervals you get will be the ones to work with.

          Evidently this code is untested, and it may contain errors. But it should point you in the right direction.

          Comment


          • #6
            Thank you Clyde, this is fantastic. I'll take a run at this and update with my results!

            Comment


            • #7
              Hi Clyde, everyone,

              This worked extremely well. Modified code to match to my risk table construction and got CIs that made sense for my data. One question - it took what felt like a long time to run the 1000 simulations (~25 minutes). My computer isn't the speediest (3.8 GHz quad core Intel i5) so that's probably part of it, yet was curious if this was normal. I'm guessing it's a function of the number of simulations? Maybe it's time for an upgrade!

              Comment


              • #8
                I'm not surprised at the time required. The calculations are quick enough, but the code thrashes the hard drive with a file creation, file write, file read, and file erase at every single iteration. I/O is slow! For some reason, probably because it is a pretty old command, -sts list- leaves no results behind in -r()-, so we have to rely on the -saving()- option to gather the statistics you need.

                In any case, your CPU isn't the bottleneck here. If you're going to upgrade anything to get better speed for this kind of thing, think about getting a faster hard drive. Another thing to consider, that won't cost you a dime, is how full and fragmented your hard drive is. Cleaning that up might help.

                Comment


                • #9
                  I would offer another approach which is computationally faster and yields similar results. The time needed for the bootstrap scales with K repetitions and N observations, while the pseudovalue approach scales with M non-censoring events. It is also faster in a practical sense because the work is done with Mata behind the scenes, and no I/O is needed.

                  The method is based on pseudovalues and I believe that it should perform as well or better than a bootstrap. It is described in the following references and you can use these as starting points to later extensions of the method.

                  1. Klein JP, Andersen PK, Logan BL, Harhoff MG. Analyzing survival curves at a fixed point in time. Statistics in Medicine. 2007

                  2. Klein JP, Gerster M, Andersen PK, Tarima S and Perme MP. SAS and R Functions to Compute Pseudo-values for Censored Data Regression. Comput Methods Programs Biomed. 2008

                  Here is an example using an available dataset to show the technique, and slightly modified code from Clyde to compare to the bootstrap.

                  Code:
                  clear *
                  cls
                  
                  // DON'T FORGET TO -STSET- YOUR DATA.
                  // AND BE SURE TO SET THE RANDOM NUMBER SEED TO GET REPRODUCIBLE RESULTS
                  set seed 17
                  webuse drugtr
                  stset studytime, failure(died)
                  sts list, risktable(5 15 30)
                  
                  stpsurv, at(5 15 30) gen(psurv)
                  gen `c(obs_t)' id = _n
                  xtset id
                  forval i = 1/3 {
                    *glm psurv`i' , fam(bin) link(logit) vce(rob) nolog
                    xtgee psurv`i', fam(poisson) link(log) corr(ind) vce(rob) nolog
                    lincom _cons, eform
                  }
                  
                  capture program drop get_survival
                  program define get_survival, rclass
                      tempfile life_table
                      sts list, risktable(5 15 30) saving(`life_table')
                      preserve
                      use `life_table', clear
                      return scalar S5 = survivor[1]
                      return scalar S15 = survivor[2]
                      return scalar S30 = survivor[3]
                      restore
                      exit
                  end
                  
                  bootstrap S5=r(S5) S15=r(S15) S30=r(S30), reps(1000): get_survival
                  estat bootstrap
                  Condensed results:

                  Code:
                  Pseudo-values with GEE approach, Poisson family,
                  log link, independent correlation struction, and
                  robust variance estimation.
                                  Robust
                       Estimate  Std. Err.      95%           CI
                   S5  .8333333   .0543607    .7333182    .9469892
                  S10   .538306   .0759427    .4082668    .7097647
                  S15  .2556954   .0776754    .1409756     .463769
                  
                  
                  Bootsrap method:
                       Estimate  Std. Err.      95% Bootstrap CI
                   S5 .83333331  .05397074    .7291667      .9375  (BC)
                  S15   .538306  .07419951    .3887963   .6759515  (BC)
                  S30 .25569537  .07585423    .1265833   .4231777  (BC)

                  Comment


                  • #10
                    Hi Clyde, Leonardo,

                    Thanks for your considered comments. Clyde, it's good to know this isn't a CPU-bounded problem. My iMac is getting a bit long in the tooth and has an old fusion drive (part SSD, part HDD). It probably doesn't help much if it's moving stuff back and forth from SSD to HDD. It does have a few TB of free space open, so at least I've got that going for me. Next computer won't have an HDD, that much is for sure!

                    Leonardo, thanks for your suggestion of an alternative method and the modified code. I'm going to try implementing this and see how I do! Thanks again to both of you!

                    Comment


                    • #11
                      What if you need CIs at every point there is an event (i.e. not just 3 or 4 time points)?

                      Comment


                      • #12
                        It wouldn't be much different from #9. You would first have to identify all of the time points at which events occur. Then you would modify the program being bootstrapped and the bootstrap command itself to, respectively, -return scalar- the survival function at each of those times, and pick them up for the -bootstrap- calculations.

                        Comment


                        • #13
                          Hi everyone,

                          Here are my results!

                          Code:
                          Bootstrap results                                       Number of obs = 47,211
                                                                                  Replications  =  1,000
                          
                                Command: Get_Survival
                                     S1: r(S1)
                                    S12: r(S12)
                                    S24: r(S24)
                                    S36: r(S36)
                          
                          ------------------------------------------------------------------------------
                                       |   Observed   Bootstrap                         Normal-based
                                       | coefficient  std. err.      z    P>|z|     [95% conf. interval]
                          -------------+----------------------------------------------------------------
                                    S1 |   .9806789   .0037382   262.34   0.000     .9733522    .9880055
                                   S12 |   .9567834   .0043384   220.54   0.000     .9482804    .9652865
                                   S24 |   .9361932   .0046183   202.72   0.000     .9271416    .9452448
                                   S36 |   .9126749   .0050748   179.85   0.000     .9027285    .9226213
                          ------------------------------------------------------------------------------
                          However, what I am noticing is that I get CI's for overall mortality but not for . How can I integrate that into the program code that I've used?

                          Code:
                          . capture program drop Get_Survival                                                               // Clears the program named 'get_survival'
                          r; t=0.03 22:31:46
                          
                          . program define Get_Survival, rclass                                                     // Establish the program as one that stores results
                            1.     tempfile Life_Table                                                                                 // Set -Life_Table- as the temporary 
                            2.     sts list, risktable(1 12 24 36) saving(`Life_Table')                // Run sts list to establish the risk table for 1m (30d), 12m (1y), 24m (2y), and 36m (3y) mortality
                            3.     preserve                                                                                                    // Preserve the data
                            4.     use `Life_Table', clear                                                                             // Use -Life_Table-, and clear any held values.
                            5.     return scalar S1 = survivor[1]                                                              // Return the scalar for 1m mortality as S1 for the survivor data in the first element of the risk table.
                            6.     return scalar S12 = survivor[2]                                                             // Return the scalar for 12m mortality as S12 for the survivor data in the second element of the risk table.
                            7.     return scalar S24 = survivor[3]                                                             // Return the scalar for 24m mortality as S24 for the survivor data in the third element of the risk table.
                            8.     return scalar S36 = survivor[4]                                                             // Return the scalar for 36m mortality as S36 for the survivor data in the fourth element of the risk table.
                            9.     restore                                                                                                             // Restore the data set
                           10.     exit                                                                                                                // Terminates the program
                          I've also tried adding -by (study_group)- to the -sts list- line but that didn't seem to do anything. I'm thinking I need to revise the scalar lines? Advice welcomed!

                          Comment


                          • #14
                            Oops, looks like in my edit I forgot to finish the sentence with the most important detail - looking for CIs for my study groups!

                            Comment


                            • #15
                              This is your first mention of study groups in the entire thread. I think the simplest way to get this by study group is to simply run the code separately for each one. You can write a loop over the levels of study group and embed all of the code inside that loop, adding -if- conditions to restrict the calculations to the currently iterated group.

                              Comment

                              Working...
                              X