  • Smoothing Kaplan Meier Curves

    Hello STATA Community,

    I am a new Stata user and I am trying to produce smoothed Kaplan-Meier curves. Is this possible?

    After creating my survival dataset with all appropriate variables, I run:
    Code:
    sts graph, by(kidneycan)
    This produces Kaplan-Meier curves, but unfortunately the data center I work out of will not let me release them unless they are smoothed, because every event (i.e., every drop in the graph) needs to be experienced by a certain number of people for confidentiality reasons. Smoothed survival curves still show the pattern by group but remove the specificity of when each drop occurs. Can someone provide syntax or instructions for smoothing these curves?

    I know how to compute smooth hazard functions but I'm really only looking to produce survival curves.

    Thank you for your help.

  • #2
    Use -sts generate- to create a new variable containing the survival function, and then you can plot that against time with -lowess-, or use whatever smoother you like on it and plot the smoothed variable against time.
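
    A minimal sketch of this suggestion, assuming the data are already -stset- and that kidneycan is coded 0/1 (the coding, the variable name km, and the bandwidth are assumptions made for illustration):

    Code:
    * Kaplan-Meier survivor function, estimated separately by group
    sts generate km = s, by(kidneycan)

    * overlay lowess-smoothed curves, one per group (assumes 0/1 coding)
    twoway (lowess km _t if kidneycan==0, bwidth(0.3)) ///
           (lowess km _t if kidneycan==1, bwidth(0.3)), ///
           ytitle("Smoothed survival") xtitle("Analysis time") ///
           legend(order(1 "kidneycan = 0" 2 "kidneycan = 1"))

    The bandwidth of 0.3 is arbitrary; larger values smooth more heavily. A lowess fit is not constrained to be monotone or to stay between 0 and 1, so the result is worth checking by eye before release.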

    I understand in general terms the need for protecting confidentiality, but I have never heard of anybody going to this extreme about it. I mean it's usually impossible to read off precise values of the variables from a graph in any case. But, whatever.


    • #3
      Originally posted by Clyde Schechter
      Use -sts generate- to create a new variable containing the survival function, and then you can plot that against time with -lowess-, or use whatever smoother you like on it and plot the smoothed variable against time.

      I understand in general terms the need for protecting confidentiality, but I have never heard of anybody going to this extreme about it. I mean it's usually impossible to read off precise values of the variables from a graph in any case. But, whatever.

      Hello Dr Schechter. I am facing the same challenge: smoothing the KM curves for confidentiality reasons. I tried the -sts generate- and -lowess- approach, but Stata froze after I ran the code, probably because the sample size is large (~5 million).

      I am wondering whether I can use -ltable- to get the life table and then use -lowess- to plot survival against the upper bound of each time interval. But it appears that Stata doesn't store results from -ltable-.

      Any suggestions?

      Thank you very much.


      • #4
        At the risk of stating the obvious: use a machine with more RAM.

        You could use -sts list- with the -saving()- option and then plot the resulting data. In this example I am only saving the estimated survival at 11 selected values of time. The saved data has just 22 observations, so you shouldn't run into problems with the plotting (although possibly with the estimation).

        Code:
        sts list, by(sex) at(0(1)10) saving(foo)
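
        A rough sketch of plotting the saved file. It assumes the file contains variables named survivor and time alongside the by-variable sex; those names are an assumption and may differ across Stata versions, so check the -describe- output first:

        Code:
        * work on the small saved file without losing the data in memory
        preserve
        use foo, clear
        * confirm the variable names before plotting (assumed: sex, time, survivor)
        describe
        * one panel per group, survival estimates joined across the selected times
        twoway line survivor time, sort by(sex) ytitle("Survival") xtitle("Time")
        restore

        With only 11 time points per group the line is coarse rather than truly smooth, but it no longer shows where individual events occur.
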
        You could try plotting the predictions from a flexible parametric model (ssc install stpm2). The code below plots both the Kaplan-Meier and the fpm estimates. The fpm estimates are made for just 101 equally spaced values of time between 0 and 10 years.

        [Image: foo.png]

        Code:
        // exclude unknown stage (stage==0)
        use https://pauldickman.com/data/melanoma.dta if stage>0, clear
        
        // stset (status==1 is death due to melanoma) 
        stset surv_mm, fail(status==1) scale(12) exit(time 120)
        
        // create dummy variable for sex
        generate male=(sex==1)
        
        // Kaplan-Meier estimate of survival
        sts gen km=s, by(sex)
        
        // Fit a flexible parametric model and save the predicted survival
        // Allow non-proportional hazards for sex
        stpm2 i.male, scale(h) df(5) eform tvc(male) dftvc(2)
        
        predict fpm, survival
        
        twoway     (line km _t if male==0 , sort connect(stairstep) lpattern(dash) lwidth(medthick) lcolor(red%50)) ///
                (line km _t if male==1 , sort connect(stairstep) lpattern(dash) lwidth(medthick) lcolor(blue%50)) ///
                (line fpm _t if male==0 , sort lpattern(solid) lwidth(medthick) lcolor(red)) ///
                (line fpm _t if male==1 , sort lpattern(solid) lwidth(medthick) lcolor(blue)) ///
                , scheme(sj) ysize(8) xsize(11) name("s", replace) ///
                ytitle("Cause-specific survival") xtitle("Years since diagnosis") ///
                legend(label(1 "Female K-M") label(2 "Male K-M") label(3 "Female fpm") label(4 "Male fpm") ring(0) pos(7) col(1))
                
        // predict survival at 101 equally spaced values of time between 0 and 10 years
        range temptime 0 10 101    
        predict s_male, survival at(male 1) time(temptime) 
        predict s_female, survival at(male 0) time(temptime) 
        
        twoway     (line km _t if male==0 , sort connect(stairstep) lpattern(dash) lwidth(medthick) lcolor(red%50)) ///
                (line km _t if male==1 , sort connect(stairstep) lpattern(dash) lwidth(medthick) lcolor(blue%50)) ///
                (line s_female temptime , sort lpattern(solid) lwidth(medthick) lcolor(red)) ///
                (line s_male temptime , sort lpattern(solid) lwidth(medthick) lcolor(blue)) ///
                , scheme(sj) ysize(8) xsize(11) name("s2", replace) ///
                ytitle("Cause-specific survival") xtitle("Years since diagnosis") ///
                legend(label(1 "Female K-M") label(2 "Male K-M") label(3 "Female fpm") label(4 "Male fpm") ring(0) pos(7) col(1))
        graph export foo.png, replace


        • #5
          Very useful advice. Thank you Dr Dickman.
