stpm2 and 5-year net survival

Laura Myles

Join Date: Jun 2018

Posts: 164
#1

stpm2 and 5-year net survival

20 Jul 2018, 10:23

Dear Listers,

I am using the -stpm2- command to run relative survival analysis. I am interested in estimating 1,3 and 5-year net survival.

I know this can be done using the -strs- command although by typing something along these lines:

strs using "pop.dta" if fail==1 , breaks(0(1)5) ///
mergeby(_year sex _age) ///
list(n d w cp cp_e2 cr_e1 cr_e2 cns_pp) ederer2 ///
save(replace) notable

Is there a way to do something equivalent within the -stpm2- command? Is it possible to get the estimated stratified by age, stage etc?

I am using Stata 15.

Thanks,
Laura
Tags: None
Mark Rutherford

Join Date: Jul 2014

Posts: 7
#2

23 Jul 2018, 05:59

Hi Laura,

this is certainly possible to do... but there are quite a few choices to make before you can get to that stage.

The main feature of -stpm2- that you need to utilise for the relative survival analysis is the bhazard() option - this will allow you to specify a variable containing the background mortality - typically after you have merged in data from a population mortality file (in a similar way as you are merging through the -strs- non-parametric approach). Details on this are given in the Stata Journal article that will be linked from the help file in the -stpm2- package - see Section 4.5 for Relative Survival models (P. C. Lambert and P. Royston. Further development of flexible parametric models for survival analysis. Stata Journal 2009;9:265-290). This will then ensure that your predicted survival functions following the model are then relative survival estimates even though you will have -stset- with an all-cause death indicator. More details on making predictions following a model are also given in the paper above.

If you want to use a modelling approach, you will need to make some extra assumptions, the complexity of the baseline (excess) cumulative hazard using the df() option, and also the specification of your covariate effects - how to model age (in groups or continuously)? Whether all of your variables satisfy the proportional excess hazards assumption? etc. etc. This will typically not be true for stage, for instance, where the effects are likely to be larger in the short-term and then diminish. You could also truly stratify on variables and fit separate models in each of your groups, otherwise you may need to consider interaction effects.

Depending on your choices above, the predictions will then follow on from that. Note the timevar() option of predict following -stpm2- which can be useful if you care about predictions at specific timepoints - see the postestimation help file of -stpm2-.

P.S. I'd also be careful with the "if fail==1" in your -strs- - I don't know what that variable is or your data structure, but it looks a little odd to me, especially if that's your failure variable.
Comment
Laura Myles

Join Date: Jun 2018

Posts: 164
#3

23 Jul 2018, 10:22

Hi Mark,

Thanks a lot for your reply. I have now removed the 'if fail ==1', as you pointed out. Fail is my failure variable and removing it produces better estimates.

I have had a look at the paper you suggest and had a look at the stpm2 post estimation in the help file; however, I find it somehow confusing.

So far I have checked for all assumptions you suggest but not to see if there is an interaction term between sex and age. Before I do, I want to make sure I can correctly specify a simpler model and estimate relative survival.

This is my code to explore differences in relative survival across men and women while adjusting for age (categorical).

stset surv, id(patid) failure(fail==1) scale(365.25)
stpm2 sex agec2 agec3 agec4 agec5 , df(4) scale(hazard) bhazard(rate) eform nolog tvc(sex agec2 agec3 agec4 agec5) dftvc(2)

* to plot relative survival
predict s1 , meansurv at(sex 1) timevar(_t)
predict s2 , meansurv at(sex 2) timevar(_t)
twoway (line s1 _t, sort) (line s2 _t, sort), ///
ylab(0(0.2)1) ///
ytit("Relative survival") xtit("Years since diagnosis") ///
legend(order(1 "Males" 2 "Females"))

However, I am still not sure how I can obtain the actual value of relative survival at 1 and 5 years as you would do in strs:

strs using "popmort" , breaks(0(1)10) ///
list(n d w cp cp_e2 cr_e1 cr_e2 cns_pp) ederer1 pohar ///
mergeby(_year sex _age) by(sex) ///
save(replace) notable

u grouped, clear
list cp cp_e2 cr_e2 lo_cr_e2 hi_cr_e2 if inlist(end,1,3,5)

I'd be grateful if you could advice whether this can be done in stpm2 or whether I am better off using strs.

Lastly, am I correct in assuming that in stpm2, relative survival is estimated in a way which is equivalent to the pohar option of strs?

Thanks,
Laura
Comment
Mark Rutherford

Join Date: Jul 2014

Posts: 7
#4

24 Jul 2018, 02:26

Hi Laura,

This looks good for a start with the -stpm2- model.

By using the -predict, meansurv- option you are taking the marginal (average) value by effectively predicting estimates for the entire dataset and setting the value of sex to be males and females in your two calls to make the contrast. Using a timevar of _t is usually quite an inefficient thing to do as a consequence as you will be averaging your N estimates at each timepoint in your dataset, which may be repeated or may be very finely split. If you just want to make the contrast between males and females, and you then just want that at say 1 and 5 years then using a different timevar() option would do the trick:

Code:

gen timevar_1_5=1 in 1 replace timevar_1_5=5 in 2 predict s1_new, meansurv (at sex 1) timevar(timevar_1_5) predict s2_new, meansurv (at sex 2) timevar(timevar_1_5) list s1_new s2_new timevar_1_5 if timevar_1_5!=.

For the graph, you may want to use a shorter timevar again than using _t. The below will just make the prediction for 100 timepoints between 0 and 10.

Code:

range timevar_new 0 10 100 predict s1_new, meansurv (at sex 1) timevar(timevar_new) predict s2_new, meansurv (at sex 2) timevar(timevar_new) line s1_new s2_new timevar_new, sort

On your last point - the short answer is no. The pohar option in -strs- is very much a non-parametric estimate, if your model is correct and you have made the appropriate assumptions with respect to capturing the shape of the baseline and incorporating time-dependent effects, then the estimates should be close to one another. But the approach is very different. The benefit of fitting the model is that you could now make many more predictions for your covariate patterns than would be possible with the non-parametric approach; but you are relying on your model being "good enough" to do that.
2 likes
Comment
Laura Myles

Join Date: Jun 2018

Posts: 164
#5

26 Jul 2018, 04:16

Hi Mark,

Thanks a lot for your detailed response, it is very helpful. Could you direct me to a reference which explain in more detail, the way relative survival is calculated in stpm2 as I would like to give more information in my methods' section.

Thanks,
Laura
Comment
Laura Myles

Join Date: Jun 2018

Posts: 164
#6

06 Aug 2019, 08:05

Hi Mark,

I have a question related to your code above:
range timevar_new 0 10 100 predict s1_new, meansurv (at sex 1) timevar(timevar_new) I have data for 20-year follow-up but for the older patients (85+), they are all dead after 5 years of diagnosis. The KM plot for overall survival rightly plots no curve after 5 years and shows number at risk is 0. However, predicting survival for this age group (predict s1_new, meansurv (at ageg 85) timevar(timevar_new) ) still produces mean survival - is this simply extrapolated beyond observation by the splines? Is the model OK? A colleague suggested censoring the data after 5 years for all participants but I am thinking this is not necessary if the results from predict s1 are just a result of the way timevar is set. Thanks!
Comment
Paul Dickman

Join Date: Apr 2014

Posts: 294
#7

06 Aug 2019, 15:27

Yes, as you surmised, you are obtaining model-based predictions of net survival up to 10 years in each age group even if you don't have individuals at risk for the entire 10 years in each age group. I would advise caution in predicting outside the range of the observed data.

There's no reason to enforce the same follow-up for all age groups. That is, there's no need to censor the younger patients at 5 years just because the eldest patients can't be followed for longer.

In addition to the usual concerns with out of sample predictions, there are issues with the interpretation of long-term net survival for the elderly (the hypothetical scenario where one cannot die of causes other than cancer is very far from reality).
Comment
Laura Myles

Join Date: Jun 2018

Posts: 164
#8

07 Aug 2019, 03:16

Hi Paul,

thanks for your reply. I have a follow-up question: the model is able to predict survival estimates outside the range of the observed data, does this affect the coefficient estimates (e.g. EMRR) from the model? I ran a model with 5-year censoring and one where all follow-up data are included and the estimated coefficients are very similar.

I am assuming I have 2 options to proceed following your reply:

1) censor older people (80+) at 5 years but include the whole follow-up for younger people This could also be justified by your last comment and could be achieved:

g end1 = dod //death
g end2= date("31/12/2017", "DMY") // end of study
g end3 = d_index2+(5*365.25) //censor at 5 years if 80+
replace end3 = . if age<80
egen exit = rowmin(end1 end2 end3)

2) if the coefficients are reliable even including all follow-up for older patients. Do not apply any censoring and include all follow-up data, only predict survival up to 5 years for all age groups (or predict for 10 years but only plot the first 5 years in the older age group).

egen exit = rowmin(end1 end2) //no censoring

stpm2 age2 age3 age4 age5 age6 age7, df(5) scale(hazard) bhazard(rate) eform

range timevar 0 5 100
predict s1 , meansurv at(age1 1) timevar(timevar)
predict s2 , meansurv at(age2 1) timevar(timevar)
etc...

Are these approaches sensible?
Comment
Paul Dickman

Join Date: Apr 2014

Posts: 294
#9

07 Aug 2019, 08:55

I suggest option 2. If you explicitly exclude observations or person-time from your analysis that would not otherwise contribute to the likelihood then your estimates will not differ.

However, if you only want to predict 5 year survival then I would only fit the model to the first 5 years (maybe 5.5 years to get better estimates at 5 years). If you fit the model to 10 years of follow-up then you are, for example, assuming proportional hazards over a 10 year follow-up. This is typically a more rigid assumption then assuming PH over a 5 years follow-up (which is still a strong assumption).
1 like
Comment
Laura Myles

Join Date: Jun 2018

Posts: 164
#10

07 Aug 2019, 09:49

Hi Paul,

Thanks for this. Would it be wrong to go for option 2 but only predict survival for 5 years to produce a graph to visually present the results.

Are there any relevant papers I should be reading? I am still unsure why the model is able to predict survival beyond observed points and would like to understand how.

I will check the PH assumption.

L
Comment

Announcement

stpm2 and 5-year net survival

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment