No announcement yet.
  • Filter
  • Time
  • Show
Clear All
new posts

  • stpm2 and 5-year net survival

    Dear Listers,

    I am using the -stpm2- command to run relative survival analysis. I am interested in estimating 1,3 and 5-year net survival.

    I know this can be done using the -strs- command although by typing something along these lines:

    strs using "pop.dta" if fail==1 , breaks(0(1)5) ///
    mergeby(_year sex _age) ///
    list(n d w cp cp_e2 cr_e1 cr_e2 cns_pp) ederer2 ///
    save(replace) notable

    Is there a way to do something equivalent within the -stpm2- command? Is it possible to get the estimated stratified by age, stage etc?

    I am using Stata 15.


  • #2
    Hi Laura,

    this is certainly possible to do... but there are quite a few choices to make before you can get to that stage.

    The main feature of -stpm2- that you need to utilise for the relative survival analysis is the bhazard() option - this will allow you to specify a variable containing the background mortality - typically after you have merged in data from a population mortality file (in a similar way as you are merging through the -strs- non-parametric approach). Details on this are given in the Stata Journal article that will be linked from the help file in the -stpm2- package - see Section 4.5 for Relative Survival models (P. C. Lambert and P. Royston. Further development of flexible parametric models for survival analysis. Stata Journal 2009;9:265-290). This will then ensure that your predicted survival functions following the model are then relative survival estimates even though you will have -stset- with an all-cause death indicator. More details on making predictions following a model are also given in the paper above.

    If you want to use a modelling approach, you will need to make some extra assumptions, the complexity of the baseline (excess) cumulative hazard using the df() option, and also the specification of your covariate effects - how to model age (in groups or continuously)? Whether all of your variables satisfy the proportional excess hazards assumption? etc. etc. This will typically not be true for stage, for instance, where the effects are likely to be larger in the short-term and then diminish. You could also truly stratify on variables and fit separate models in each of your groups, otherwise you may need to consider interaction effects.

    Depending on your choices above, the predictions will then follow on from that. Note the timevar() option of predict following -stpm2- which can be useful if you care about predictions at specific timepoints - see the postestimation help file of -stpm2-.

    P.S. I'd also be careful with the "if fail==1" in your -strs- - I don't know what that variable is or your data structure, but it looks a little odd to me, especially if that's your failure variable.


    • #3
      Hi Mark,

      Thanks a lot for your reply. I have now removed the 'if fail ==1', as you pointed out. Fail is my failure variable and removing it produces better estimates.

      I have had a look at the paper you suggest and had a look at the stpm2 post estimation in the help file; however, I find it somehow confusing.

      So far I have checked for all assumptions you suggest but not to see if there is an interaction term between sex and age. Before I do, I want to make sure I can correctly specify a simpler model and estimate relative survival.

      This is my code to explore differences in relative survival across men and women while adjusting for age (categorical).

      stset surv, id(patid) failure(fail==1) scale(365.25)
      stpm2 sex agec2 agec3 agec4 agec5 , df(4) scale(hazard) bhazard(rate) eform nolog tvc(sex agec2 agec3 agec4 agec5) dftvc(2)

      * to plot relative survival
      predict s1 , meansurv at(sex 1) timevar(_t)
      predict s2 , meansurv at(sex 2) timevar(_t)
      twoway (line s1 _t, sort) (line s2 _t, sort), ///
      ylab(0(0.2)1) ///
      ytit("Relative survival") xtit("Years since diagnosis") ///
      legend(order(1 "Males" 2 "Females"))

      However, I am still not sure how I can obtain the actual value of relative survival at 1 and 5 years as you would do in strs:

      strs using "popmort" , breaks(0(1)10) ///
      list(n d w cp cp_e2 cr_e1 cr_e2 cns_pp) ederer1 pohar ///
      mergeby(_year sex _age) by(sex) ///
      save(replace) notable

      u grouped, clear
      list cp cp_e2 cr_e2 lo_cr_e2 hi_cr_e2 if inlist(end,1,3,5)

      I'd be grateful if you could advice whether this can be done in stpm2 or whether I am better off using strs.

      Lastly, am I correct in assuming that in stpm2, relative survival is estimated in a way which is equivalent to the pohar option of strs?



      • #4
        Hi Laura,

        This looks good for a start with the -stpm2- model.

        By using the -predict, meansurv- option you are taking the marginal (average) value by effectively predicting estimates for the entire dataset and setting the value of sex to be males and females in your two calls to make the contrast. Using a timevar of _t is usually quite an inefficient thing to do as a consequence as you will be averaging your N estimates at each timepoint in your dataset, which may be repeated or may be very finely split. If you just want to make the contrast between males and females, and you then just want that at say 1 and 5 years then using a different timevar() option would do the trick:

        gen timevar_1_5=1 in 1
        replace timevar_1_5=5 in 2
        predict s1_new, meansurv (at sex 1) timevar(timevar_1_5)
        predict s2_new, meansurv (at sex 2) timevar(timevar_1_5)
        list s1_new s2_new timevar_1_5 if timevar_1_5!=.
        For the graph, you may want to use a shorter timevar again than using _t. The below will just make the prediction for 100 timepoints between 0 and 10.

        range timevar_new 0 10 100
        predict s1_new, meansurv (at sex 1) timevar(timevar_new)
        predict s2_new, meansurv (at sex 2) timevar(timevar_new)
        line s1_new s2_new timevar_new, sort
        On your last point - the short answer is no. The pohar option in -strs- is very much a non-parametric estimate, if your model is correct and you have made the appropriate assumptions with respect to capturing the shape of the baseline and incorporating time-dependent effects, then the estimates should be close to one another. But the approach is very different. The benefit of fitting the model is that you could now make many more predictions for your covariate patterns than would be possible with the non-parametric approach; but you are relying on your model being "good enough" to do that.


        • #5
          Hi Mark,

          Thanks a lot for your detailed response, it is very helpful. Could you direct me to a reference which explain in more detail, the way relative survival is calculated in stpm2 as I would like to give more information in my methods' section.