  • How to compare survival of a cohort with the age and gender matched population

    Dear all,

    I am studying the survival of a cohort of patients who received a specific treatment. I would like to compare their survival with that of an age- and gender-matched population from Belgium. I found mortality rates for Belgium on the WHO website, but I do not know how to proceed. How do I convert the mortality rates of the general population (which change roughly every 5 years) into a survival curve? Further, how do I do the age and gender matching? Should I take, for every patient in my cohort, a match of the same age and gender from the general population, or should I just use the mean age of the cohort and match it with the same age in the general population? And how do I then put this curve together with my cohort survival curve?
    Thank you in advance, any help is greatly appreciated.

    Regards

  • #2
    In order for anyone to answer your question, they will need more information about the data. Show us an informative sample of your patient cohort data, and an informative sample of the Belgium data. Do this by -list-ing a reasonable number of observations from each data set and copying and pasting the Stata results into a code block. (Select the underlined A button on the forum's editor, and then click on the # button to create a code block. Paste between the delimiters that appear.)



    • #3
      One probably should not use this forum for advertising. On the other hand: in section 13.8 of An Introduction to Stata for Health Researchers (Stata Press) we describe indirect standardization, which is what you are asking about.



      • #4
        Thank you very much Clyde Schechter for your help. The dataset of my cohort is pretty simple: for each patient I have a variable for the time since the treatment and an indicator for the status (censored or dead). With these I am doing my survival analysis and building my survival curve (with stset etc.).

        On the other hand, the life table of the general Belgian population looks like this:
        Code:
        AGE15-19 15-19 BTSX Both sexes 0.001
        AGE15-19 15-19 BTSX Both sexes 0.001
        AGE15-19 15-19 BTSX Both sexes 0.000
        AGE20-24 20-24 BTSX Both sexes 0.001
        AGE20-24 20-24 BTSX Both sexes 0.001
        AGE40-44 40-44 BTSX Both sexes 0.001
        AGE45-49 45-49 BTSX Both sexes 0.003
        AGE45-49 45-49 BTSX Both sexes 0.003
        AGE45-49 45-49 BTSX Both sexes 0.002
        AGE70-74 70-74 BTSX Both sexes 0.028
        AGE70-74 70-74 BTSX Both sexes 0.021
        AGE75-79 75-79 BTSX Both sexes 0.054
        AGE75-79 75-79 BTSX Both sexes 0.044
        AGE100+ 100+ BTSX Both sexes 0.510
        AGE100+ 100+ BTSX Both sexes 0.483
        AGE100+ 100+ BTSX Both sexes 0.460
        where the last column is the death rate (there are 3 values for each age group, corresponding to the death rates calculated for 1990, 2000 and 2012).
        So, I would like to add the survival curve of the matched population to the survival curve of my cohort and check if they are indeed different. Do you have any suggestions? Thanks a lot



        • #5
          I find the Belgian life table confusing: why are there three rows for each age group that are identical except for the final column? And what is the final column: is it an annual mortality probability, or something else?



          • #6
            The 3 rows are the estimates for the years 1990, 2000 and 2012 for each age group; the last column is indeed the death rate. These are the data reported on the WHO website (http://apps.who.int/gho/data/view.main.LT61950?lang=en).
            So, I know that survival is 1 - death rate, but which command should I use in Stata to get a survival curve? And how can I match it with my cohort?
            Thanks, everyone, for your help



            • #7
              The death rate in the final column: is that annual probability of death or, less likely, probability of dying before the end of the age-group? Let me assume the former. The next step is to -reshape- your data so that the three years, 1990, 2000, and 2012 are separate variables: it makes no sense to combine them into a single survival curve. And you will need to also find out what the death rates are between 25 and 40, 50 and 69, and 80 and 99: those seem to be absent. (I don't know what the upper age is for the 100+ group, 110 or 120 is typical--you should find out.) The next step will be to convert your age range into a pair of numeric variables, lower to upper age. So I'm going to assume that you have already re-arranged your data to the following layout:
              Code:
              age_group   lower_age   upper_age   m1990 m2000 m2012
              Then you can generate survival curves for 1990, 2000, and 2012 as follows:
              Code:
              foreach y of numlist 1990 2000 2012 {
                  // CALCULATE PROBABILITY OF DYING DURING AGE PERIOD
                  gen period_m`y' = 1 - (1-m`y')^(upper_age-lower_age+1)
              }
                 //  NOW CALCULATE KAPLAN-MEIER ESTIMATOR OF SURVIVAL FUNCTION
                 sort lower_age
                 gen S`y' = 1
                foreach y of numlist 1990 2000 2012 {
                  replace S`y' = S`y'[_n-1]*(1-period_m`y'[_n-1]) if _n > 1
              }
              These survival curves will be step functions due to the width of the age-groups for which you have death rates. If you had 1 year death rates by single year ages you could get something that would be a bit smoother.
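For the -reshape- step described above, here is a minimal sketch rather than a definitive recipe. It assumes the WHO extract has one row per age group and year, with a year variable added by hand (the listing posted earlier shows no such column, so `year` is an assumption) and the death rate held in a variable I have called `m` (also an illustrative name). The three age groups and their rates are the ones reported in this thread.

```stata
clear
* Long layout: one row per age group and year.
* ASSUMPTION: the variable year is added by hand; the WHO listing shows none.
input str8 age_group int year double m
"15-19" 1990 .001
"15-19" 2000 .001
"15-19" 2012 0
"45-49" 1990 .003
"45-49" 2000 .003
"45-49" 2012 .002
"75-79" 1990 .054
"75-79" 2000 .044
"75-79" 2012 .034
end

* One row per age group, with m1990, m2000 and m2012 as separate variables
reshape wide m, i(age_group) j(year)

* Split a label such as "15-19" into numeric lower and upper ages
gen dash = strpos(age_group, "-")
gen lower_age = real(substr(age_group, 1, dash - 1))
gen upper_age = real(substr(age_group, dash + 1, .))
drop dash

sort lower_age
list age_group lower_age upper_age m1990 m2000 m2012
```

Open-ended or irregular labels such as "<1" or "100+" contain no dash, so their lower and upper ages would have to be filled in by hand.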



              • #8
                Dear Clyde, I am sorry to bother you again, but I have tried the code you gave me and I am still having problems. I reshaped the data as follows:

                age_group lower_age upper_age m1990 m2000 m2012
                <1 0 1 .008 .005 .003
                1-4 1 4 0 0 0
                5-9 5 9 0 0 0
                10-14 10 14 0 0 0
                15-19 15 19 .001 .001 0
                20-24 20 24 .001 .001 0
                25-29 25 29 .001 .001 0
                30-34 30 34 .001 .001 .001
                35-39 35 39 .001 .001 .001
                40-44 40 44 .002 .002 .001
                45-49 45 49 .003 .003 .002
                50-54 50 50 .005 .005 .004
                55-59 55 59 .008 .007 .006
                60-64 60 60 .012 .01 .009
                65-69 65 69 .02 .017 .013
                70-74 70 70 .032 .028 .021
                75-79 75 79 .054 .044 .034
                80-84 80 80 .09 .078 .063
                85-89 85 89 .148 .132 .115
                90-94 90 90 .234 .217 .199
                95-99 95 99 .345 .323 .316
                >100 100 120 .516 .483

                then I use the code but it gives me back an error:

                Code:
                . foreach y of numlist 1990 2000 2012 {
                2.
                . gen period_m`y' = 1 - (1-m`y')^(upper_age-lower_age+1)
                3. }
                (1 missing value generated)

                .
                . sort lower_age

                . gen S`y' = 1

                . foreach y of numlist 1990 2000 2012 {
                2. replace S`y' = S`y'[_n-1]*(1-period_m`y'[_n-1]) if _n > 1
                3. }
                variable S1990 not found
                r(111);

                Any further insight?

                thanks a lot

                SM



                • #9
                  Sorry, my mistake. The gen S`y' = 1 statement needs to be inside, and the first command of, the second foreach loop, thus:

                  Code:
                  foreach y of numlist 1990 2000 2012 {
                      // CALCULATE PROBABILITY OF DYING DURING AGE PERIOD
                      gen period_m`y' = 1 - (1-m`y')^(upper_age-lower_age+1)
                  }
                  // NOW CALCULATE KAPLAN-MEIER ESTIMATOR OF SURVIVAL FUNCTION
                  sort lower_age

                  foreach y of numlist 1990 2000 2012 {
                      gen S`y' = 1
                      replace S`y' = S`y'[_n-1]*(1-period_m`y'[_n-1]) if _n > 1
                  }
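To see what the corrected loop produces, here is a small worked sketch for a single year, followed by a step plot. It uses three age groups with the 1990 rates reported in this thread, and it assumes each group spans five years (upper_age 69, 74 and 79, as the group labels imply). Because the first row is age 65, the resulting curve is survival conditional on being alive at 65.

```stata
clear
* Three age groups with the 1990 annual death rates reported in the thread.
* ASSUMPTION: each group covers 5 years, as its label implies.
input lower_age upper_age m1990
65 69 .02
70 74 .032
75 79 .054
end

* Probability of dying during the age period
gen period_m1990 = 1 - (1 - m1990)^(upper_age - lower_age + 1)

* Survival to the start of each age group, conditional on being alive at 65
sort lower_age
gen S1990 = 1
replace S1990 = S1990[_n-1] * (1 - period_m1990[_n-1]) if _n > 1

list lower_age upper_age m1990 period_m1990 S1990

* Draw the curve as a step function
twoway line S1990 lower_age, connect(stairstep) ///
    ytitle("Survival from age 65") xtitle("Age")
```

The second row equals 0.98^5 and the third equals 0.98^5 multiplied by 0.968^5, which is the product the replace statement builds up row by row.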



                  • #10
                    Have a look at this post for another approach (using -strs-). You can get general population mortality rates in 1-year ages for the Belgian population from mortality.org. Later in the thread I linked to you can see a description of how to transform the data from mortality.org into the format required by -strs-.



                    • #11
                      Thank you very much Clyde Schechter, it works fine now.

                      Thank you Paul Dickman, it took me some time to go through all the steps but it is a very nice approach. However when I draw the survival curve of my cohort with sts graph it looks a bit different compared to the graph I get using your code, have a look please:


                      this is the full cohort with sts graph

                      and this is the graph I got with the code you wrote:

                      strs using BelgianPopMort, br(0(0.5)20) mergeby(_year male _age) by(male agegroup) notables save(replace)

                      use grouped if male==1 & agegroup==3, clear

                      twoway (line cp end, lw(medthick)) (line cp_e2 end, lw(medthick)), yti("Survival") ylabel(0(0.2)1, format(%3.1f)) xti("Years from diagnosis") xla(0(1)20) legend(order(1 "Observed" 2 "Expected") ring(0) pos(7) col(1))




                      Why is that? And (I am sorry, but I am a beginner with Stata) how can I draw a smoother observed curve?

                      Thanks a lot

                      SM



                      • #12
                        Hi Stefano,

                        I couldn't see the graphs in your post, but here are some general comments.

                        sts graph, by default, estimates the survivor function using the Kaplan-Meier method whereas -strs- uses the actuarial (life table) method. The two estimates should be very similar. Both methods effectively divide the follow-up time into subintervals, the difference being the Kaplan-Meier method creates a new subinterval at each event time whereas for the life table approach they are pre-specified (6 months in your example). As such, you can make the approaches more similar (and the observed curve more smooth) by specifying shorter intervals.

                        Be aware that -strs- requires time to be in years, so use scale(365.24) when you stset if you have time in days. This is because the expected survival probabilities are also specified per year.

                        If you can't find the source of the difference then have a look at the table of survival estimates. sts list will give you the Kaplan-Meier estimates and removing the "notables" option for -strs- will show the life tables. Both tables will show the number at risk, number of events, etc. and you should be able to spot where the difference is.
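As a rough illustration of the scaling point, the sketch below declares follow-up time recorded in days so that analysis time runs in years; scale() is an option of stset. The four patients and the variable names id, days and died are hypothetical and only there to make the example run.

```stata
clear
* HYPOTHETICAL follow-up data: time in days, died = 1 for death, 0 for censored
input id days died
1  200 1
2  730 0
3 1100 1
4 1461 0
end

* scale(365.24) makes analysis time _t run in years rather than days
stset days, failure(died == 1) id(id) scale(365.24)

* Kaplan-Meier estimates with numbers at risk and events
sts list
```

After stset, _t holds days divided by 365.24, so patient 4 has just over 4 years of follow-up.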

                        Paul



                        • #13
                          Hi Paul,

                          thank you very much again for your help. I also realized that I was using the if qualifier, so I was restricting my analysis to a specific subset while I was interested in the full cohort.
                          I have one last question and hope to bother you no more. How can I finally test whether the 2 curves are statistically different, as with the log-rank test? I looked into the models.do file and it seems quite difficult for me… Thanks in advance

                          Stefano



                          • #14
                            An easy way to compare the two curves is to look at the relative survival ratio (RSR), which is simply the ratio of the two curves (observed/expected). This is calculated by strs (together with 95% confidence intervals) and saved in the grouped data file. strs calculates both the cumulative RSR (stored in cr_e2) and the conditional RSR (stored in r). When studying the survival of cancer patients (the typical application of relative survival) it's very common to use the 5-year RSR as a summary of survival.

                            If your patients have the same survival as the general population then RSR will be 1. If the 95% CI for the cumulative RSR does not contain 1 then this is evidence of a statistically significant difference.
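To make the ratio concrete, here is a tiny sketch that computes a cumulative RSR by hand as observed divided by expected. The survival values are hypothetical and for illustration only; the names end, cp and cp_e2 follow the variables plotted earlier in this thread, and rsr is a name I made up. In practice strs computes this for you, together with confidence intervals.

```stata
clear
* HYPOTHETICAL values, for illustration only.
* end = end of interval (years), cp = observed cumulative survival,
* cp_e2 = expected cumulative survival
input end cp cp_e2
1 .90 .96
2 .72 .90
3 .60 .84
end

* Relative survival ratio: observed / expected
gen rsr = cp / cp_e2

list end cp cp_e2 rsr
```

With these made-up numbers the ratio at 2 years is 0.72/0.90 = 0.8, meaning the cohort's survival is 80% of what would be expected in a comparable general population.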

                            You could look at the estimates and CIs in the life tables or use the following code to plot the cumulative RSR based on the data saved to grouped.dta.

                            Code:
                            twoway (rarea lo_cr_e2 hi_cr_e2 end, sort) ///
                                   (line cr_e2 end, sort), ///
                                   yti("Relative Survival") legend(off) ///
                                   ylabel(0(0.2)1, format(%3.1f)) ///
                                   xti("Years from diagnosis") xla(0(1)10)
                            For conditional RSR, substitute r, lo_r and hi_r. I find the plots of the conditional RSR very informative since they show at what point in the follow-up the differences occur. For example, you might find the RSR is initially less than 1 but then after some time it returns to 1 (the surviving patients now have the same mortality as a comparable general population).

                            For an example of this type of analysis, have a look at Hultcrantz et al, J Clin Oncol. 2012 30(24):2995-3001.

                            Note that there is a slight difference between this application and, for example, comparing the survival of patients in two treatment arms. In your application (as is standard in relative survival) we are assuming that the expected survival is fixed and known (i.e., no random error). Like most assumptions made in statistics, we know this is not perfectly true but we are willing to make the assumption.

                            The modelling is not as hard as it looks (the code in model.do compares different ways of modelling and you only need one in practice). If all you want to do is compare observed to expected for your cohort then you can do without modelling. If, however, you want to compare, for example, if RSR differs across treatments (or by sex or agegroup) then you will need modelling.



                            • #15
                              Thanks Paul, it is just awesome!

