Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to predict survival curve or risk beyond study time after Cox or parametric models?

    Dear Stata Users,

    Normally, predicting survival curves within study time after Cox or any parametric models is pretty straight forward. Yet, I'm interested in predicting survival curves or risks beyond study time. I searched extensive Stata materials and did not find any.

    Here is my data:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(time remission group id) double(logwbc sex lwbc3) float trt
     6 1 0  3 2.31 0 2 1
     6 0 0 10  3.2 0 3 1
     6 1 0  2 3.28 0 3 1
     6 1 0  1 4.06 1 3 1
     7 1 0  4 4.43 0 3 1
     9 0 0 11  2.8 0 2 1
    10 1 0  5  2.7 0 2 1
    10 0 0 12 2.96 0 2 1
    11 0 0 13  2.6 0 2 1
    13 1 0  6 2.88 0 2 1
    16 1 0  7  3.6 1 3 1
    17 0 0 14 2.16 0 1 1
    19 0 0 15 2.05 0 1 1
    20 0 0 16 2.01 1 1 1
    22 1 0  8 2.32 1 2 1
    23 1 0  9 2.57 1 2 1
    25 0 0 17 1.78 1 1 1
    32 0 0 18  2.2 1 1 1
    32 0 0 19 2.53 1 2 1
    34 0 0 20 1.47 1 1 1
    35 0 0 21 1.45 1 1 1
     1 1 1 23  2.8 1 2 0
     1 1 1 22    5 1 3 0
     2 1 1 24 4.48 1 3 0
     2 1 1 25 4.91 1 3 0
     3 1 1 26 4.01 1 3 0
     4 1 1 27 2.42 1 2 0
     4 1 1 28 4.36 1 3 0
     5 1 1 29 3.49 1 3 0
     5 1 1 30 3.97 0 3 0
     8 1 1 33 2.32 0 2 0
     8 1 1 31 3.05 0 3 0
     8 1 1 32 3.26 1 3 0
     8 1 1 34 3.52 0 3 0
    11 1 1 35 2.12 0 1 0
    11 1 1 36 3.49 0 3 0
    12 1 1 38  1.5 0 1 0
    12 1 1 37 3.06 0 3 0
    15 1 1 39  2.3 0 1 0
    17 1 1 40 2.95 0 2 0
    22 1 1 41 2.73 0 2 0
    23 1 1 42 1.97 1 1 0
    end
    The data includes the following variables:
    • time: survival time
    • remission: 0=censored; 1=cancer relapse
    • group: 0=placebo; 1=treatment
    • logwbc: log-transformed number of white blood cells
    • sex: 1=male; 0=female
    I used my codes below and not sure if it is correct or not and wondering if there is any other method to serve this prediction beyond study time.

    Code:
    stset time, failure(remission) //time in weeks
    sum _t //the maximum survival time of the study = 35 weeks
    
    stcox trt logwbc sex, nohr nolog
    predict double xbeta, xb //calculate each individual overall hazard coefficient
    predict double basesurv, basesurv //predict each individual survival curve at baseline.
    
    gen newtime=_t+35 //I want to predict risk of cancer relapse at week 70, so I generated this variable
    sum newtime ////the maximum survival time of the study is now 70 weeks
    
    sum basesurv if newtime<70
    gen risk70weeks=1 - r(min)^exp(xbeta)
    
    sum risk70weeks //risk of cancer relapse at week 70 ranges from 21.76 to 100%.
    I'm grateful and looking forward to any of your helps.
    Last edited by Huy Nguyen; 20 Dec 2022, 22:38.

  • #2
    After a semi-parametric model like Cox proportional hazards, it is not possible to make predictions outside the range of the study times. The only thing you can say is that the survival probability after the close of the study will be at most what it was at the end of the study. But other than that, it could be anything. I'm not sure what, if anything, the results of the calculations in #1 mean, but they are definitely not out of study time range predictions of the survival function.

    When a parametric model has been used, it is a different story, and the -predict- command can be used to predict survival functions or hazards out of sample. Whether it is conceptually legitimate to do that is another matter altogether: it is a general statistical principle that extrapolation is perilous and you do it at your own risk. But if you want to do it, you can.

    Comment


    • #3
      Originally posted by Clyde Schechter View Post
      it is a general statistical principle that extrapolation is perilous and you do it at your own risk.
      That is worth repeating.

      A general principle for out of sample prediction, that works for most good software (including Stata), is that you can add observations to the dataset with values of the covariates at which you wish to predict but set the outcome to missing. Because the outcome is missing, these observations are not included when estimating the model parameters but postestimation commands will usually provide predictions for those observations. I've not tried it with stcox, but you would need to create new observations with the values of _t at which you wanted to predict along with the relevant covariate values.

      It's much easier with parametric models. stpm2 (available from SSC) has a timevar option where you can specify the values at which one wishes to predict.
      See https://pclambert.net/software/stpm2/stpm2_timevar/

      Here's another approach, where we create a second data set containing the covariate patterns and values of time for which we wish to predict.
      https://pauldickman.com/software/sta...tion_new_data/

      Comment


      • #4
        Thanks Clyde Schetcher and Paul Dickman so much for your useful explanations and resources. I read your links carefully and started to do Stata coding, and now able to digest and apply stpm2. Yet, can I have just two more questions. Since restricted cubic spline (RCS) is kind of transformed, rather than subset data, interpretation is not straight forward.

        I include here the codes and results as per the link provided by Paul.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input int(pid year) float rf byte rfi float mf byte mfi float os byte age int pr byte hormon double pr_1 byte(_st _d) double _t byte _t0
          1 1992  59.10472 0  59.10472 0  59.10472 74   35 0   3.58351893845611 1 0  4.925393422444661 0
          2 1984  92.91171 0  92.91171 0  92.91171 79   36 0 3.6109179126442243 1 0  7.742642084757487 0
          3 1983  197.5195 0  197.5195 0  197.5195 44  138 0 4.9344739331306915 1 0                 10 0
          4 1985  86.20944 0  86.20944 0  86.20944 70    0 0                  0 1 0  7.184120178222656 0
          5 1983 161.47844 0 161.47844 0 161.47844 75  260 0  5.564520407322694 1 0                 10 0
          6 1983  193.4456 0  193.4456 0  193.4456 52  139 0  4.941642422609304 1 0                 10 0
          7 1993  81.83984 0  81.83984 0  81.83984 40   13 0 2.6390573296152584 1 0  6.819986343383789 0
          8 1988 136.34497 0 136.34497 0 136.34497 53    1 0  .6931471805599453 1 0                 10 0
          9 1988 128.75565 0 128.75565 0 128.75565 60  627 0 6.4425401664681985 1 0                 10 0
         10 1988  119.8193 0  119.8193 0  119.8193 52  316 0   5.75890177387728 1 0  9.984942118326822 0
         11 1989 70.078026 0 70.078026 0 70.078026 66    0 0                  0 1 0  5.839835484822591 0
         12 1988 136.21355 0 136.21355 0 136.21355 42  245 0 5.5053315359323625 1 0                 10 0
         13 1992  69.78234 0  69.78234 0  69.78234 74  317 0  5.762051382780177 1 0  5.815195083618164 0
         14 1993  58.25051 0  58.25051 0  58.25051 55    0 1                  0 1 0  4.854209582010905 0
         15 1989  88.11499 0  88.11499 0  88.11499 57    0 0                  0 1 0  7.342915852864583 0
         16 1993  85.12526 0  85.12526 0  85.12526 49  286 0  5.659482215759621 1 0  7.093771616617839 0
         17 1993  89.82341 0  89.82341 0  89.82341 61    0 1                  0 1 0  7.485284169514974 0
         18 1990  78.16016 0  78.16016 0  78.16016 52    6 0 1.9459101490553132 1 0  6.513346989949544 0
         19 1979 230.86653 0 230.86653 0 230.86653 51   14 0   2.70805020110221 1 0                 10 0
         20 1993  82.06982 0  82.06982 0  82.06982 62   22 0 3.1354942159291497 1 0  6.839151382446289 0
         21 1993 27.958933 0 27.958933 0 27.958933 57    4 1 1.6094379124341003 1 0 2.3299110730489097 0
         22 1990 122.08624 0 122.08624 0 122.08624 47  142 0  4.962844630259907 1 0                 10 0
         23 1992  84.89528 0  84.89528 0  84.89528 61  354 0  5.872117789475416 1 0  7.074606577555339 0
         24 1987  153.2649 0  153.2649 0  153.2649 52  649 0  6.476972362889683 1 0                 10 0
         25 1993  93.14169 0  93.14169 0  93.14169 52  147 0  4.997212273764115 1 0  7.761807123819987 0
         26 1986 114.39835 0 114.39835 0 114.39835 63   24 0 3.2188758248682006 1 0  9.533196131388346 0
         27 1990  109.7002 0  109.7002 0  109.7002 61    0 0                  0 1 0  9.141683578491211 0
         28 1988 134.37372 0 134.37372 0 134.37372 60  556 0  6.322565239927284 1 0                 10 0
         29 1993  94.62012 0  94.62012 0  94.62012 32  665 0  6.501289670540389 1 0  7.885010401407878 0
         30 1988 131.31827 0 131.31827 0 131.31827 38  305 0  5.723585101952381 1 0                 10 0
         31 1985   174.423 0   174.423 0   174.423 45  114 0   4.74493212836325 1 0                 10 0
         32 1988  94.35729 0  94.35729 0  94.35729 56   14 0   2.70805020110221 1 0  7.863107681274414 0
         33 1993 33.675564 0 33.675564 0 33.675564 63  414 0  6.028278520230698 1 0  2.806296984354655 0
         34 1989 129.24846 0 129.24846 0 129.24846 44  163 0  5.099866427824199 1 0                 10 0
         35 1988 125.79877 0 125.79877 0 125.79877 53   19 0  2.995732273553991 1 0                 10 0
         36 1990    84.731 0    84.731 0    84.731 54    7 0 2.0794415416798357 1 0  7.060916900634766 0
         37 1988  85.51951 0  85.51951 0  85.51951 73 1776 0  7.482681828154651 1 0  7.126625696818034 0
         38 1982 189.73306 0 189.73306 0 189.73306 45   55 0   4.02535169073515 1 0                 10 0
         39 1988 109.04312 0 109.04312 0 109.04312 66  115 0 4.7535901911063645 1 0  9.086926778157553 0
         40 1989  50.39836 0  50.39836 0  79.37577 70  391 0  5.971261839790462 1 0  4.199863115946452 0
         41 1988  44.94456 0  44.94456 0  73.10062 81    6 0 1.9459101490553132 1 0 3.7453797658284507 0
         42 1988 133.78233 0 133.78233 0 133.78233 61   11 0 2.4849066497880004 1 0                 10 0
         43 1993  86.70226 0  86.70226 0  86.70226 37    4 0 1.6094379124341003 1 0   7.22518793741862 0
         44 1991  83.48254 0  83.48254 0  83.48254 78  101 0  4.624972813284271 1 0  6.956878662109375 0
         45 1983 129.97125 0 129.97125 0 129.97125 72   15 0  2.772588722239781 1 0                 10 0
         46 1993  87.65503 0  87.65503 0  87.65503 47  605 0  6.406879986069314 1 0  7.304585774739583 0
         47 1989 124.94456 0 124.94456 0 124.94456 47  957 0   6.86484777797086 1 0                 10 0
         48 1991  72.57495 0  72.57495 0  72.57495 59   69 1  4.248495242049359 1 0   6.04791259765625 0
         49 1993  67.35113 0  67.35113 0  67.35113 64  541 0  6.295266001439646 1 0   5.61259396870931 0
         50 1983 166.50513 0 166.50513 0 166.50513 68    0 0                  0 1 0                 10 0
         51 1989 121.85626 0 121.85626 0 121.85626 55  101 0  4.624972813284271 1 0                 10 0
         52 1992  98.82546 0  98.82546 0  98.82546 29    4 0 1.6094379124341003 1 0  8.235455195109049 0
         53 1987 124.15606 0 124.15606 0 124.15606 74 1303 0 7.1731917424865985 1 0                 10 0
         54 1993  94.58727 0  94.58727 0  94.58727 49  453 0  6.118097198041348 1 0  7.882272720336914 0
         55 1991  96.06571 0  96.06571 0  96.06571 55  121 0  4.804021044733257 1 0  8.005475997924805 0
         56 1989  96.13142 0  96.13142 0  96.13142 50  465 0  6.144185634125646 1 0  8.010951360066732 0
         57 1991 106.71047 0 106.71047 0 106.71047 56   47 0  3.871201010907891 1 0  8.892539342244467 0
         58 1992  61.40452 0  61.40452 0  61.40452 69    5 1  1.791759469228055 1 0  5.117043177286784 0
         59 1988 123.07187 0 123.07187 0 123.07187 61    7 0 2.0794415416798357 1 0                 10 0
         60 1991  90.44764 0  90.44764 0  90.44764 47    0 0                  0 1 0  7.537303288777669 0
         61 1991   60.4846 0   60.4846 0   60.4846 35  153 0 5.0369526024136295 1 0  5.040383338928223 0
         62 1989 132.69815 0 132.69815 0 132.69815 42    0 0                  0 1 0                 10 0
         63 1989  125.2074 0  125.2074 0  125.2074 59  581 0  6.366470447731438 1 0                 10 0
         64 1991  79.34292 0  79.34292 0  79.34292 40    0 0                  0 1 0  6.611909866333008 0
         65 1992  61.40452 0  61.40452 0  61.40452 71   13 0 2.6390573296152584 1 0  5.117043177286784 0
         66 1988 134.17659 0 134.17659 0 134.17659 60   59 0    4.0943445622221 1 0                 10 0
         67 1984 169.65913 0 169.65913 0 169.65913 47    4 0 1.6094379124341003 1 0                 10 0
         68 1991  83.21971 0  83.21971 0  83.21971 55  241 0  5.488937726156687 1 0  6.934975941975911 0
         69 1988  83.71252 0  83.71252 0  91.92608 68    1 0  .6931471805599453 1 0  6.976043701171875 0
         70 1990  114.7269 0  114.7269 0  114.7269 42  230 0  5.442417710521793 1 0  9.560574849446615 0
         71 1983  124.9774 0  124.9774 0  124.9774 67  244 0  5.501258210544727 1 0                 10 0
         72 1992  59.56468 0  59.56468 0  59.56468 44   14 0   2.70805020110221 1 0  4.963723500569661 0
         73 1986 119.16222 0 119.16222 0 119.16222 62   18 0 2.9444389791664403 1 0  9.930184682210287 0
         74 1988  120.6078 0  120.6078 0  120.6078 41   28 0  3.367295829986474 1 0                 10 0
         75 1988  53.32238 0  53.32238 0  53.32238 44    8 0 2.1972245773362196 1 0  4.443531672159831 0
         76 1989 127.01437 0 127.01437 0 127.01437 46   62 0  4.143134726391533 1 0                 10 0
         77 1985 141.93019 0 141.93019 0 141.93019 66  247 0 5.5134287461649825 1 0                 10 0
         78 1984        80 0        80 0 130.75975 76    0 0                  0 1 0  6.666666666666667 0
         79 1986 130.69405 0 130.69405 0 130.69405 60   41 1 3.7376696182833684 1 0                 10 0
         80 1991  59.72895 0  59.72895 0  59.72895 38  151 0 5.0238805208462765 1 0  4.977412859598796 0
         81 1993  95.80287 0  95.80287 0  95.80287 61   28 0  3.367295829986474 1 0  7.983572642008464 0
         82 1990 72.443535 0 72.443535 0 72.443535 73    0 0                  0 1 0 6.0369612375895185 0
         83 1984 129.11705 0 129.11705 0 129.11705 70   70 0 4.2626798770413155 1 0                 10 0
         84 1988 131.41684 0 131.41684 0 131.41684 44  168 0 5.1298987149230735 1 0                 10 0
         85 1993  55.65503 0  55.65503 0  55.65503 78  142 1  4.962844630259907 1 0  4.637919108072917 0
         86 1988  116.3039 0  116.3039 0  116.3039 44   93 0  4.543294782270004 1 0  9.691991806030273 0
         87 1991  98.39835 0  98.39835 0  98.39835 29    0 0                  0 1 0  8.199862798055014 0
         88 1985  158.7187 0  158.7187 0  158.7187 35   70 0 4.2626798770413155 1 0                 10 0
         89 1987 114.62833 0 114.62833 0 114.62833 46   93 0  4.543294782270004 1 0  9.552361170450846 0
         90 1988 107.86037 0 107.86037 0 107.86037 44    7 0 2.0794415416798357 1 0  8.988363901774088 0
         91 1993  85.81519 0  85.81519 0  85.81519 41   84 0  4.442651256490317 1 0  7.151266098022461 0
         92 1988  109.6345 0  109.6345 0  109.6345 28  251 0  5.529429087511423 1 0  9.136208216349283 0
         93 1989 121.62628 0 121.62628 0 121.62628 30    1 0  .6931471805599453 1 0                 10 0
         94 1985  170.6119 0  170.6119 0  170.6119 62  278 0  5.631211781821365 1 0                 10 0
         95 1988   123.269 0   123.269 0   123.269 55    8 0 2.1972245773362196 1 0                 10 0
         96 1990 114.03696 0 114.03696 0 114.03696 51  227 0  5.429345628954441 1 0  9.503080368041992 0
         97 1989  115.4497 0  115.4497 0  115.4497 55  170 0   5.14166355650266 1 0  9.620807647705078 0
         98 1993  57.56057 0  57.56057 0  57.56057 72    4 1 1.6094379124341003 1 0  4.796714464823405 0
         99 1988  99.81109 0  99.81109 0  99.81109 58   29 1 3.4011973816621555 1 0  8.317590713500977 0
        100 1981  183.6879 0  183.6879 0  183.6879 44    0 0                  0 1 0                 10 0
        end
        label values mfi noyes
        label def noyes 0 "no", modify
        label values hormon adjhormo
        label def adjhormo 0 "no", modify
        label def adjhormo 1 "yes", modify
        Code:
        stset rf, f(rfi==1) scale(12) exit(time 60)
        
        sum _t //min=0.099, max=5.0
        
        rcsgen age, df(3) gen(agercs) center(60)
        
        stpm2 hormon agercs* pr_1, scale(hazard) df(4) eform
        The results are attached:
        Click image for larger version

Name:	Results_stpm2.png
Views:	1
Size:	36.8 KB
ID:	1695112



        My first question is How to interpret the results of RCS in parametric models (for example, what does it mean by agercs1-4 and _rcs1-4, should we interprent agercs1-4 and ignore _rcs1-4 or both or else?). I searched a lot but could not find a good material to provide how to interpret.

        AND

        I used the following codes to predict survivals:

        Code:
        range tt 0 5 100 //(2,882 missing values generated)
        
        predict s0, survival timevar(tt) zeros ci //predict baseline survivals
        
        predict s2, survival timevar(_t) //predict survivals
        My second question is How to estimate cumulative incidence function or risk? In non- and semi-parametric methods, we use a complementary approach to serve this: F(1) = 1 - S(t). Can we apply this way to estimate cumulative risk in parametric and flexible parametric survival models (since I could not find any paper guiding this, even the paper on stpm2 by Paul Lambert and Royston at
        HTML Code:
         https://journals.sagepub.com/doi/pdf/10.1177/1536867X0900900206
        I'm looking forward to receiving your continued help.

        Wishing You All a Happy, Successful and Prosperous New Year!
        Last edited by Huy Nguyen; 27 Dec 2022, 20:12.

        Comment


        • #5
          Thanks for the kind words Huy and thanks for providing a data example and full code. I don't have access to Stata at the moment so can't provide code, but I'll provide short answers now and can provide more details and/or code later if needed.

          The short answer to your first question is that you should not try and interpret the parameters for the spline basis vectors. You can interpret the results of hormon and pr_1 in the usual way. You can estimate and plot the hazard as a function of age (for specified values of other covariates) or choose a reference value for age and plot the HR as a function of age. See "help stpm2 postestimation".

          Regarding your second question, 1-S(t) is usually called the failure function in Stata terminology. It is not the same as the cumulative incidence function; search "survival analysis competing risks" for details.

          You can easily estimate the failure function (with 95% confidence intervals) using stpm2 postestimation commands:
          Code:
          predict F, failure ci
          To get the CIF, you can use stpm2cif (available from SSC).

          Comment


          • #6
            Thanks Paul for your continued useful advice.

            Comment

            Working...
            X