  • Conditional relative survival modelling

    Hello everyone,

    Currently I am involved in a project in which I am interested in modelling conditional relative survival estimates. I am familiar with modelling relative survival in Stata, but I have never worked with conditional models.
    I tried to find information on the internet on how to model conditional relative survival in Stata, but I only found that it can be calculated by dividing the relative survival at (x+y) years after diagnosis by the relative survival at x years after diagnosis. Is it that easy? I expected the statistics to be very difficult. Does anyone have experience with modelling conditional relative survival in Stata? Is there perhaps existing Stata code for this?
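    The ratio described above follows directly from the definitions, provided relative survival is defined as observed survival divided by expected survival. A sketch of the algebra, writing S(t) for observed survival, S*(t) for expected survival, R(t) for relative survival and lambda_E(u) for the excess hazard (this notation is introduced here for illustration) and assuming the usual additive model in which the total hazard is the expected hazard plus the excess hazard:

```latex
R(t) = \frac{S(t)}{S^*(t)} = \exp\!\left(-\int_0^{t} \lambda_E(u)\,du\right)

R(x+y \mid x) = \frac{S(x+y)/S(x)}{S^*(x+y)/S^*(x)}
              = \frac{R(x+y)}{R(x)}
              = \exp\!\left(-\int_{x}^{x+y} \lambda_E(u)\,du\right)
```

    The last form shows that only the excess hazard after time x contributes to the conditional estimate, which is why restricting the risk set to those alive at x targets the same quantity.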

    Thank you in advance.

    Regards,

    Marissa van Maaren

  • #2
    If you are interested in modelling, for example, survival conditional on surviving one year following diagnosis, then just stset your data so that people do not enter (become at risk) until one year following diagnosis.
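    A minimal sketch of this delayed-entry stset, assuming the colon.dta example data (distributed with strs, see below) are in the working directory. Because surv_mm is in months, enter(time 12) and exit(time 60.5) are in months since diagnosis while scale(12) reports analysis time in years. The sts list call shows Kaplan-Meier estimates of all-cause (not relative) survival, used here only to illustrate that the risk set now starts at one year.

```stata
* Delayed entry at 12 months; the timescale is still time since diagnosis.
* Assumes colon.dta (distributed with strs) is in the working directory.
use colon, clear
stset surv_mm, failure(status==1 2) scale(12) id(id) enter(time 12) exit(time 60.5)

* Patients who died or were censored within 12 months are excluded (_st==0)
tabulate _st

* Kaplan-Meier (all-cause) survival at 5 years among patients alive at 1 year
sts list, at(1 5)
```
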

    • #3
      Thank you, Paul Dickman! That is indeed exactly what I am interested in. Is it really that simple? I heard that the statistics concerning conditional survival were very difficult... Perhaps they were referring to outcomes such as 'loss of life expectancy' and 'remaining life expectancy' (which may be more relevant outcomes for a patient than probabilities), where flexible parametric models are used. Again, thank you.

      • #4
        Conditional survival is easier than 'loss of life expectancy', but estimating 'loss of life expectancy' is not that difficult either.

        My previous statement was not exactly correct. It was also brief and could be interpreted in several ways, so here are some more details. Let's assume we are talking about cancer patients and wish to estimate the relative survival at 5 years conditional on surviving 1 year. There are two approaches one can take:

        1. origin=date_of_diagnosis, enter=(date_of_diagnosis plus 1 year), estimate survival at time=5
        2. origin=enter=(date_of_diagnosis plus 1 year), estimate survival at time=4

        That is, in approach 1 the timescale is time since diagnosis but patients do not enter the risk set until 1 year post diagnosis. In approach 2 the risk sets are the same as in approach 1, but we have redefined the timescale to be time since (dx+1 year). Some analysis approaches (e.g., life table estimates of survival using -strs-) will give identical estimates under both approaches, whereas others will give different estimates. For example, using a flexible parametric model with approach 1, the predicted survival at time 1 will be less than 1, so it will give different estimates from approach 2 (where the predicted survival at the same timepoint will be 1, since it is now the time origin). I believe Poisson regression, where you split on time, will give the same estimates for both approaches.
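        A sketch of approach 1 with a flexible parametric model, assuming colon.dta and popmort.dta are in the working directory and stpm2 (which includes rcsgen) is installed; the model mirrors the worked example further down. Because nobody is at risk during the first year, the fitted survival at time 1 is an extrapolation, so this sketch divides the 5-year prediction by the 1-year prediction to condition on it explicitly. The variable names t1, t5 and cs_a1 are hypothetical.

```stata
* Approach 1: timescale = time since diagnosis, delayed entry at 12 months.
* Assumes colon.dta and popmort.dta are in the working directory.
use colon, clear
stset surv_mm, failure(status==1 2) scale(12) id(id) enter(time 12) exit(time 60.5)

* Attained age and calendar year at exit (_t is years since diagnosis here)
gen _age = min(int(age + _t), 99)
gen _year = int(yydx + _t)
sort _year sex _age
merge m:1 _year sex _age using popmort, keep(match master)
keep if age <= 90

* Same age spline and relative survival model as the worked example;
* stpm2 uses _t0 to account for the delayed entry
rcsgen age, gen(rcsage) df(4) orthog
stpm2 rcsage1-rcsage4, scale(hazard) df(5) bhazard(rate)

* Survival at time 1 is not fixed at 1 on this timescale,
* so condition on it: R(5)/R(1), with delta-method limits
gen t1 = 1
gen t5 = 5
predictnl cs_a1 = predict(survival timevar(t5))/predict(survival timevar(t1)), ///
    ci(cs_a1_lci cs_a1_uci)
```
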

        In your question you said you were interested in modelling conditional relative survival estimates, so my suggestion would be to use approach 2 since it will always work irrespective of analytic method.
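        A sketch of approach 2 under the same assumptions (colon.dta and popmort.dta in the working directory, stpm2 installed). With origin(time 12), analysis time starts one year after diagnosis, so 5 years after diagnosis corresponds to analysis time 4. Note an adjustment that is my own addition rather than part of the worked example: since _t now counts from one year after diagnosis, one year is added back when computing attained age and calendar year for the expected-rate merge. The names t4 and cs_a2 are hypothetical.

```stata
* Approach 2: timescale = time since (diagnosis + 1 year).
* Assumes colon.dta and popmort.dta are in the working directory.
use colon, clear
* exit(time 60.5) is still in months since diagnosis (the units of surv_mm)
stset surv_mm, failure(status==1 2) scale(12) id(id) origin(time 12) exit(time 60.5)

* _t counts from 1 year after diagnosis, so add 1 year to recover
* attained age and calendar year at exit before merging expected rates
gen _age = min(int(age + 1 + _t), 99)
gen _year = int(yydx + 1 + _t)
sort _year sex _age
merge m:1 _year sex _age using popmort, keep(match master)
keep if age <= 90

rcsgen age, gen(rcsage) df(4) orthog
stpm2 rcsage1-rcsage4, scale(hazard) df(5) bhazard(rate)

* Analysis time 4 = 5 years after diagnosis, given survival to 1 year
gen t4 = 4
predict cs_a2, survival timevar(t4) ci
```
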

        For completeness, if you just want to predict conditional relative survival (with confidence interval) then this is very easy to do after fitting a standard flexible parametric model (where all patients are at risk from diagnosis).

        Code:
        gen t1 = 1
        gen t5 = 5
        predictnl condsurv = predict(survival timevar(t5))/predict(survival timevar(t1)) ///
                ,ci(condsurv_lci condsurv_uci)
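        predictnl obtains its confidence limits by the delta method. Writing beta-hat for the estimated model parameters and V-hat for their estimated covariance matrix (notation introduced here), the calculation for the 5-year conditional estimate is roughly the following, with predictnl computing the gradient numerically and using the default 95% level:

```latex
g(\hat\beta) = \frac{\hat R(5;\hat\beta)}{\hat R(1;\hat\beta)}

\widehat{\mathrm{Var}}\!\left[g(\hat\beta)\right] \approx
  \nabla g(\hat\beta)^{\top}\,\hat V\,\nabla g(\hat\beta)

g(\hat\beta) \pm z_{0.975}\sqrt{\widehat{\mathrm{Var}}\!\left[g(\hat\beta)\right]}
```

        The interval is symmetric on the scale of the ratio itself, because that is the expression passed to predictnl.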
        Here is some fully worked code (credit to Paul Lambert for the code but blame me for errors).

        stpm2 is available from SSC (ssc install stpm2)
        strs is described in the Stata Journal (net install st0376)

        The code uses sample data sets (colon and popmort) that are distributed with strs but are also available from my webpage (see below).

        http://www.pauldickman.com/rsmodel/s...on/popmort.dta
        http://www.pauldickman.com/rsmodel/s...olon/colon.dta

        We first predict both the 5-year and 1-year relative survival (unconditional) and divide them to get the conditional estimate. We then show how the (more complicated) code above gives the same point estimates and, in addition, confidence limits.

        Code:
        use colon, clear
        
        stset surv_mm, failure(status=1,2) scale(12) id(id) exit(time 60.5)
        gen _age = min(int(age + _t),99)
        gen _year = int(yydx + _t)
        
        sort _year sex _age
        merge m:1 _year sex _age using popmort, keep(match master)
        keep if age<=90
        
        /* Model age using restricted cubic spline */        
        rcsgen age, gen(rcsage) df(4) orthog
        stpm2 rcsage1-rcsage4, scale(hazard) df(5) bhazard(rate)
        
        /* Predict one-year relative survival and plot as a function of age */
        gen t1 = 1
        predict s1, survival timevar(t1) ci
        twoway     (rarea s1_lci s1_uci age, sort) ///
                (line s1 age, sort lpattern(solid)) ///
                , legend(off) ytitle("1-year relative survival") scheme(sj) ///
                ylabel(0(0.2)1,angle(h) format(%3.1f)) name(s1,replace)
        
        /* Predict five-year relative survival and plot as a function of age */
        gen t5 = 5
        predict s5, survival timevar(t5) ci
        twoway     (rarea s5_lci s5_uci age, sort) ///
                (line s5 age, sort lpattern(solid)) ///
                , legend(off) ytitle("5-year relative survival") scheme(sj) ///
                ylabel(0(0.2)1,angle(h) format(%3.1f)) name(s5,replace)
        
        /* Conditional relative survival obtained as s5/s1 */        
        gen condsurv = s5/s1        
        twoway     (line condsurv age, sort lpattern(solid)) ///
                , legend(off) ytitle("5 year conditional relative survival") scheme(sj) ///
                ylabel(0(0.2)1,angle(h) format(%3.1f))  name(condsurv,replace)
        
        /* Conditional relative survival using predictnl */
        predictnl condsurv2 = predict(survival timevar(t5))/predict(survival timevar(t1)) ///
                ,ci(condsurv2_lci condsurv2_uci)        
        twoway     (rarea condsurv2_lci condsurv2_uci age, sort) ///
                (line condsurv2 age, sort lpattern(solid)) ///
                , legend(off) ytitle("5 year conditional relative survival") scheme(sj) ///
                ylabel(0(0.2)1,angle(h) format(%3.1f))  name(condsurv2,replace)

        • #5
          Thank you for this incredibly detailed answer, Paul. It is much clearer to me now! I have one additional question: I see you specified df(4). Why did you choose 4? I've read in the Stata Journal that it must be between 1 and 10, but that a value between 1 and 5 is usually sufficient. On what information do you base this number? And does the number of knots have to be specified, or does Stata use some default number in your code? I don't know how to justify a particular number.

          Regards, Marissa van Maaren
          Last edited by Marissa van Maaren; 02 Oct 2015, 06:53.

          • #6
            In the code above I have modelled two quantities using restricted cubic splines: age and the log baseline cumulative hazard. In the commands that I have used, the df (which determines the number of knots) has to be specified explicitly. Other commands may be different. There are different views on how to select an appropriate number of knots, but I would suggest a combination of comparing the AIC/BIC for different numbers of knots and simply looking at the fitted curves. You want to choose enough knots to capture the complexity of the underlying function (which of course is unknown) without overfitting. In general, I would suggest that it's better to overfit (e.g., choose one knot too many) than to underfit. More care is needed if your inference is based on assessing the shape of the fitted curve, as opposed to, for example, when you just want to adjust for the potential confounding effect of a variable.
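            One way to put the AIC/BIC comparison into practice is to refit the model over a range of df values and tabulate the criteria with estimates stats. This sketch assumes the same data files and model as the worked example above; it varies one spline at a time while holding the other fixed, which is a simplification (a full grid is also possible), and the range 1 to 6 and the stored-estimate names are arbitrary choices for illustration.

```stata
* Compare AIC/BIC across df for the two splines in the worked example.
* Assumes colon.dta and popmort.dta are in the working directory.
use colon, clear
stset surv_mm, failure(status==1 2) scale(12) id(id) exit(time 60.5)
gen _age = min(int(age + _t), 99)
gen _year = int(yydx + _t)
sort _year sex _age
merge m:1 _year sex _age using popmort, keep(match master)
keep if age <= 90

* 1. Vary df for the baseline log cumulative hazard; age spline fixed at df(4)
rcsgen age, gen(rcsage) df(4) orthog
forvalues k = 1/6 {
    quietly stpm2 rcsage1-rcsage4, scale(hazard) df(`k') bhazard(rate)
    estimates store base`k'
}
estimates stats base1 base2 base3 base4 base5 base6

* 2. Vary df for the age spline; baseline fixed at df(5)
forvalues j = 1/6 {
    capture drop rcsage*
    rcsgen age, gen(rcsage) df(`j') orthog
    quietly stpm2 rcsage*, scale(hazard) df(5) bhazard(rate)
    estimates store age`j'
}
estimates stats age1 age2 age3 age4 age5 age6
```

            Smaller AIC/BIC values indicate a better trade-off between fit and complexity; plotting the fitted curves for neighbouring df values alongside this table helps judge whether the differences matter in practice.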


            Rutherford MJ, Crowther MJ, Lambert PC. The use of restricted cubic splines to approximate complex hazard functions in the analysis of time-to-event data: a simulation study. Journal of Statistical Computation and Simulation. 2015;85:777-793.

            • #7
              Very informative paper! All right, I think I have enough information for now. Thanks for your time!
