
  • Kaplan Meier - why are only failures shown?

    Dear STATA forum
    I am struggling a lot with my STATA data; the main problem is that the Kaplan-Meier curve is only showing the failures (see the attached file).

    I will try my best to give further details on my data, since I am a novice both to STATA and to writing in this forum. Thank you all for your patience.
    These 349 patients had an operation called TURB (Transurethral Resection of the Bladder, for those who are interested) 4 months ago. Because of the high recurrence rate in this disease, ALL patients are routinely booked for a cystoscopy 4 months after TURB.
    I have randomized the patients 1:1 into a control arm and an intervention arm (PatientProtocol_n),
    so the date of randomization is m4_visitDate.

    I should clarify the surveillance program in a bit more detail.
    Unfortunately, the follow-up time is not the same for each patient,
    so I have a variable dummy_m8_visitDate (1=yes, 0=no) indicating whether the patient had an m8 visit.

    Each patient can have 0-3 tumors during follow-up.

    ( ALL ) m4_visitDate & m4_PathologyFindings: 1=tumor, 2=normal, 3=inflammation, 4=other, and . = no tumor found
    ( ONLY ) if m4_visitDate==1 is another surveillance visit needed, at:
    m8_visitDate & m8_PathologyFindings: 1=tumor, 2=normal, 3=inflammation, 4=other, and . = no tumor found
    ( ALL ) patients come for their
    m12_visitDate & m12_PathologyFindings (1, 2, 3, 4 and . as above)

    I hope this is reasonably clear. My code and output follow, in the CODE delimiters.


    *count patients in the two groups ( 176 control, 173 intervention)
    Code:
    tab PatientProtocol_n
    * see if all patients in this dataset have at least two visits (m4_visitDate and m12_visitDate)
    Code:
    br PatientProtocol_n m4_visitDate m12_visitDate
    * generate observation time in years from the first visit (m4_visitDate) to the last visit (m12_visitDate)
    Code:
    generate futime_yrs = ( m12_visitDate - m4_visitDate )/365.25
    label variable futime_yrs "m12 - m4 follow up time"
    * generate observation time in years from the patient's latest TURB to the last visit (m12_visitDate - Patient_DateOfLatestTURB)
    * for patients with no m8 visit and no pathology found at the m4 visit
    Code:
    * note: the -if- qualifier goes after the expression, not before the = sign
    generate futime_yrs_notumorm4_nom8visit = ( m12_visitDate - Patient_DateOfLatestTURB )/365.25 if dummy_visit_m8==0
    label variable futime_yrs_notumorm4_nom8visit "m12 - Patient_DateOfLatestTURB"
    * generate observation time in years from the patient's latest TURB to the m8 visit, for patients with an m8 visit (dummy_visit_m8==1) and no tumor at m4
    Code:
    generate futime_yrs_notumorm4_m8visit = ( m8_visitDate - Patient_DateOfLatestTURB )/365.25 if dummy_visit_m8==1
    label variable  futime_yrs_notumorm4_m8visit "m8_visitDate - Patient_DateOfLatestTURB if no tumor m4_visit (highrisk)"
    * Summarize and details
    Code:
    summarize futime_yrs, detail
    Code:
    /*                 m12 - m4 follow up time
    -------------------------------------------------------------
          Percentiles      Smallest
     1%     .4900753       .1916496
     5%     .5859001       .3997262
    10%     .6132786       .4572211       Obs                 349
    25%     .6516085       .4900753       Sum of Wgt.         349
    
    50%     .6872005                      Mean           .7302687
                            Largest       Std. Dev.       .136229
    75%     .7638603       1.147159
    90%     .9418207       1.182752       Variance       .0185583
    95%     1.026694       1.204654       Skewness       1.195845
    99%     1.147159       1.336071       Kurtosis       5.698118*/
    
    summarize futime_yrs_notumorm4_nom8visit
    
    /*    Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
    fut~om8visit |        156    1.012671    .0900095   .5338809   1.275838
    */
    summarize futime_yrs_notumorm4_nom8visit, detail
    
     /*              m12 - Patient_DateOfLatestTURB
    -------------------------------------------------------------
          Percentiles      Smallest
     1%     .6543463       .5338809
     5%      .881588       .6543463
    10%     .9336071       .7310061       Obs                 156
    25%     .9801506       .8049281       Sum of Wgt.         156
    
    50%     1.008898                      Mean           1.012671
                            Largest       Std. Dev.      .0900095
    75%      1.05681       1.199179
    90%     1.111567       1.218344       Variance       .0081017
    95%     1.149897       1.240246       Skewness      -1.196008
    99%     1.240246       1.275838       Kurtosis       9.365792
    */
    summarize futime_yrs_notumorm4_m8visit , detail
    /*
            m8_visitDate - Patient_DateOfLatestTURB if no
                      tumor m4_visit (highrisk)
    -------------------------------------------------------------
          Percentiles      Smallest
     1%     .4900753       .3175907
     5%     .6132786       .4900753
    10%     .6351814       .5174538       Obs                 193
    25%     .6735113       .5612594       Sum of Wgt.         193
    
    50%     .7118412                      Mean           .7229345
                            Largest       Std. Dev.      .0980745
    75%     .7611225       1.010267
    90%     .8104038       1.062286       Variance       .0096186
    95%     .8788501       1.106092       Skewness       1.866754
    99%     1.106092       1.385352       Kurtosis       15.67453
    */

    * check pathology findings at each visit (missing . = no pathology found)
    Code:
    tab PatientProtocol_n m4_PathologyFindings, miss
    /*
    Patient_Prot |                  m4_PathologyFindings
            ocol |     tumor     normal  inflammat      other          . |     Total
    -------------+-------------------------------------------------------+----------
        control  |        68         14         13          3         78 |       176 
    intervention |        62         11         12          0         88 |       173 
    -------------+-------------------------------------------------------+----------
           Total |       130         25         25          3        166 |       349 
    */
    tab PatientProtocol_n m8_PathologyFindings if dummy_visit_m8==0, missing
    /*
                 | m8_Patholo
    Patient_Prot | gyFindings
            ocol |         . |     Total
    -------------+-----------+----------
        control  |        75 |        75 
    intervention |        81 |        81 
    -------------+-----------+----------
           Total |       156 |       156 
    */
    tab PatientProtocol_n m8_PathologyFindings if dummy_visit_m8==1, missing
    /*
    Patient_Prot |                  m8_PathologyFindings
            ocol |     tumor     normal  inflammat      other          . |     Total
    -------------+-------------------------------------------------------+----------
        control  |        37          5          8          1         50 |       101 
    intervention |        30          3          3          2         54 |        92 
    -------------+-------------------------------------------------------+----------
           Total |        67          8         11          3        104 |       193 
    */
    
    *** NOTE: 193 + 156 = 349 patients!
    Code:
    tab PatientProtocol_n m12_PathologyFindings, missing
    
    /*Patient_Prot |                 m12_PathologyFindings
            ocol |     tumor     normal  inflammat      other          . |     Total
    -------------+-------------------------------------------------------+----------
        control  |        58          5          6          1        106 |       176 
    intervention |        38          3          9          1        122 |       173 
    -------------+-------------------------------------------------------+----------
           Total |        96          8         15          2        228 |       349 
    */
    * generate a new variable flagging whether there is a tumor at any visit: 1 = yes, 0 = no
    Code:
    generate pathology_all = ( m4_PathologyFindings==1 | m8_PathologyFindings==1 | m12_PathologyFindings==1)
    label variable pathology_all "if there is a tumor at m4_PathologyFindings | m8_PathologyFindings | m12_PathologyFindings"
    /* stset the data*/

    Code:
    stset futime_yrs, id(Patient_ID_n) failure(pathology_all==1)
    /*
                    id:  Patient_ID_n
         failure event:  pathology_all == 1
    obs. time interval:  (_t0, futime_yrs]
     exit on or before:  failure
    
    ------------------------------------------------------------------------------
            349  total observations
              0  exclusions
    ------------------------------------------------------------------------------
            349  observations remaining, representing
            349  subjects
            183  failures in single-failure-per-subject data
        254.864  total analysis time at risk and under observation
                                                    at risk from t =         0
                                         earliest observed entry t =         0
                                              last observed exit t =  1.336071 
    */
    * Kaplan-Meier curve with confidence intervals and risk table
    Code:
    sts graph, by ( PatientProtocol_n ) ci risktable
    /*
             failure _d:  pathology_all == 1
       analysis time _t:  futime_yrs
                     id:  Patient_ID_n
    */
    Then see the attached file.
    As you can see, only the failures are shown on the curve. The first long plateau on the curve is also bothering me.

    I am very grateful for this opportunity, and I am sorry if this is way too much information.

    Thanks,
    Ditte

    PS: I am hoping to make some other Kaplan-Meier curves for the different visits and PathologyFindings.

  • #2
    Oh no, I've just read that Stata is not written in capitals, and I apologize for the odd headline, which is not really a sentence or a question as it is supposed to be.
    I can't change it now because it is too late.

    Hopefully, no one gets too annoyed to answer my thread.
    Thanks,
    Ditte



    • #3
      Unless I am misinterpreting the KM graph, I don't see an issue with it. Could you further explain what you think the issue is?

      It appears that everyone had the event or was censored before the 1.5 years shown on the graph. For more insight into the failures and censoring, you could run the following code:
      Code:
      sts list , by(PatientProtocol_n) enter
      Furthermore, to visualize the censoring that also contributes to the KM function, you can add censoring indicators to the plot:
      Code:
      sts graph, by(PatientProtocol_n) censored(number) risktable
      Last edited by Matt Warkentin; 13 Jan 2018, 17:03.



      • #4
        Ditte,

        Matt Warkentin gave good advice on how you can see which patients were censored (as opposed to developing a tumor). Correct me if I'm wrong, but I think you are concerned that your K-M curve goes all the way down to 0% of patients surviving. That is not necessarily wrong. Your KM curve goes down when someone has a tumor OR when they are otherwise censored (e.g. died, lost to follow-up, abducted by aliens). The assumption is that if someone were abducted, you know that they got the event either on or after the date of abduction. Your Cox model treats this censoring as non-informative, and it focuses on the actual deaths or cancer recurrences. In your case, I know that you only have recorded cancer recurrences for about half the study population, but the other half were censored, or Stata thinks they were censored. If this is not what you meant, you need to tell us.

        You said,
        Also the first long plateau on the curve is bothering me.
        This is an artifact of your data entry. Please correct me if I'm wrong, but it looks like this is what you did:

        1) Patients get an initial cancer resection, then come in for a 4-month follow-up visit. At this visit, they are randomized into study arms.

        2) Some patients have a follow-up visit at about 8 months.

        3) All patients got a follow-up at approximately 12 months.

        4) At all follow-ups, they have a pathology assessment, and you assess if there is any (new?) cancer detected.

        5) futime_yrs = the month 12 visit date - month 4 visit date.

        First, the plateau is because the earliest you could have detected new cancer is the month 8 visit. It is the first time you checked after m4. It is not that the patients could not have developed a recurrence, it is that there is no way you would know until the 8-month visit.

        Second, there is a different problem with step #5. You -stset- your code with the variable futime_yrs as the survival time. You have two different variables for follow-up time for the people with and without an 8-month visit - futime_yrs_notumorm4_m8visit and futime_yrs_notumorm4_nom8visit. But unless there's code you haven't shown us, you didn't alter the main variable to account for those two other variables.
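        As a sketch of the single analysis-time variable this points to (not Ditte's actual code), assume the origin is randomization (m4_visitDate), the event is the first tumor (code 1) found at m8 or m12, and recurrence-free patients are censored at their m12 visit. The four patients and their dates are made up, and recur, end_date and futime_first are hypothetical variable names. Whether a tumor found at m4 itself should count is a design question the thread does not settle.

        Code:
        ```stata
        * Hypothetical toy data: dates as "YYYY-MM-DD" strings, "" = no m8 visit
        clear
        input Patient_ID_n PatientProtocol_n str10 d4 str10 d8 m8_PathologyFindings str10 d12 m12_PathologyFindings
        1 1 "2017-01-10" "2017-05-12" 1 "2017-09-15" .
        2 1 "2017-01-20" ""           . "2017-09-25" 1
        3 2 "2017-02-01" "2017-06-01" 2 "2017-10-02" .
        4 2 "2017-02-15" ""           . "2017-10-10" .
        end

        * Convert the strings to Stata daily dates (missing when there was no visit)
        generate m4_visitDate  = daily(d4,  "YMD")
        generate m8_visitDate  = daily(d8,  "YMD")
        generate m12_visitDate = daily(d12, "YMD")
        format %td m4_visitDate m8_visitDate m12_visitDate

        * Event: tumor (code 1) at m8 or m12; a missing code is not equal to 1
        generate byte recur = (m8_PathologyFindings == 1 | m12_PathologyFindings == 1)

        * Follow-up ends at the first visit with a tumor, otherwise at the m12 visit
        generate end_date = cond(m8_PathologyFindings == 1, m8_visitDate, m12_visitDate)
        format %td end_date

        * Analysis time in years since randomization
        generate futime_first = (end_date - m4_visitDate)/365.25
        stset futime_first, id(Patient_ID_n) failure(recur == 1)
        list Patient_ID_n PatientProtocol_n end_date futime_first _d, noobs
        ```

        Under these assumptions, patient 1's follow-up stops at the m8 visit where the tumor was found, instead of running on to the m12 visit as it does with futime_yrs.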

        In my earlier post, I alluded to the first issue above being a problem. Because you can only detect a recurrence at the next follow-up visit after it occurs, you have interval-censored data. Your protocol artificially protects anyone who did not have an 8-month visit: if they had a recurrence, the earliest you could have detected it was at the 12-month visit. Also, your variable names for the survival times above appear to imply that patients had no tumor at the 4-month visit, but you clearly show that some patients had tumors at month 4. If your 8- and 12-month findings describe any new tumors, then this may call for something more like a frailty model.
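        For the interval-censoring point, one option (assuming Stata 15 or later, where -stintreg- is available) is a parametric model fitted to the interval in which each recurrence must have happened: t_lower is the last tumor-free visit after randomization (missing if there was none, i.e. left-censored), and t_upper is the first visit with a tumor (missing if there was none, i.e. right-censored). The sketch simulates 300 hypothetical patients with made-up recurrence rates and visit times; arm, t_true, v8, v12, has_m8 and the bound variables are illustrative names, and the Weibull distribution is an arbitrary choice, not a recommendation.

        Code:
        ```stata
        * Simulated, hypothetical data (not the trial data)
        clear
        set seed 12345
        set obs 300
        generate byte arm = runiform() < 0.5
        * true, unobserved time to recurrence in years (exponential, made-up means)
        generate double t_true = rexponential(cond(arm == 1, 1/0.6, 1/0.9))
        * visit times in years after randomization (roughly 4 and 8 months later)
        generate double v8  = 0.33 + 0.05*runiform()
        generate double v12 = 0.67 + 0.05*runiform()
        * only some patients get the m8 visit
        generate byte has_m8 = runiform() < 0.55

        * Interval bounds: (last tumor-free visit, first visit with tumor]
        generate double t_lower = .
        generate double t_upper = .
        * recurrence already present at the m8 visit: left-censored at v8
        replace t_upper = v8  if has_m8 & t_true <= v8
        * tumor-free at m8, tumor at m12: interval-censored
        replace t_lower = v8  if has_m8 & t_true > v8 & t_true <= v12
        * tumor found at m12 (with or without an m8 visit)
        replace t_upper = v12 if t_true <= v12 & missing(t_upper)
        * still tumor-free at m12: right-censored
        replace t_lower = v12 if t_true > v12

        stintreg i.arm, interval(t_lower t_upper) distribution(weibull)
        ```

        With real data, the bounds would be built from the visit dates and pathology codes rather than simulated.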



        • #5
          Thank you, Matt and Weiwen, for your useful comments. Of course, if all patients had a failure during the follow-up time, there would be no problem with the KM curve. But as Weiwen wrote,
          I know that you only have recorded cancer recurrences for about half the study population, but the other half were censored - or Stata thinks they were censored.
          This is exactly right. Hopefully, none of my patients are abducted by aliens, at least not during the trial period.

          So my question is still: how can I get the patients WITHOUT failures to be included in the KM curve? They exist, as you can see in the next CODE delimiter (166 patients), but just not on the curve; see the attached KM curve in my first post.

          Code:
          . tab PatientProtocol_n pathology_all, miss
          
                       |  if there is a tumor
                       |          at
                       | m4_PathologyFindings
                       |           |
                       | m8_PathologyFindings
          Patient_Prot |     | m12_Patholo
                  ocol |         0          1 |     Total
          -------------+----------------------+----------
              control  |        77         99 |       176 
          intervention |        89         84 |       173 
          -------------+----------------------+----------
                 Total |       166        183 |       349
          Weiwen, this is correct:
          Please correct me if I'm wrong, but it looks like this is what you did:

          1) Patients get an initial cancer resection, then come in for a 4-month follow-up visit. At this visit, they are randomized into study arms.

          2) Some patients have a follow-up visit at about 8 months.

          3) All patients got a follow-up at approximately 12 months.

          4) At all follow-ups, they have a pathology assessment, and you assess if there is any (new?) cancer detected.

          5) futime_yrs = the month 12 visit date - month 4 visit date.
          You are also right about this:
          First, the plateau is because the earliest you could have detected new cancer is the month 8 visit. It is the first time you checked after m4. It is not that the patients could not have developed a recurrence, it is that there is no way you would know until the 8-month visit.
          I will need to read more about interval censoring; it sounds like something I will have to consider. The same goes for the frailty model.

          Second, there is a different problem with step #5. You - stset - your code with the variable futime_yrs as the survival time. You have two different variables for follow-up time for the people with and without an 8-month visit - futime_yrs_notumorm4_m8visit and futime_yrs_notumorm4_nom8visit. But unless there's code you haven't shown us, you didn't alter the main variable to account for those two other variables.
          Correct. I did not show how I changed the variables, because the problem of the no-failure patients not being shown in the KM curve was the same whichever variables I used.

          I will have a look at interval censoring and frailty models.

          Thank you both very much for taking the time to help me. It is highly appreciated.
          I look forward to hearing your thoughts on the KM curve.

          Ditte



          • #6
            Or did I misunderstand your answer to the question regarding the censored/abducted patients?
            Code:
            . 
            . sts graph, by ( PatientProtocol_n ) censored (number) risktable
            
                     failure _d:  pathology_all == 1
               analysis time _t:  futime_yrs
                             id:  Patient_ID_n
            How can I get the 166 patients without tumor recurrence (failure==0) onto the same KM curve as the ones with failures (pathology_all==1)?
            Code:
            . tab PatientProtocol_n pathology_all, miss
            
                         |  if there is a tumor
                         |          at
                         | m4_PathologyFindings
                         |           |
                         | m8_PathologyFindings
            Patient_Prot |     | m12_Patholo
                    ocol |         0          1 |     Total
            -------------+----------------------+----------
                control  |        77         99 |       176 
            intervention |        89         84 |       173 
            -------------+----------------------+----------
                   Total |       166        183 |       349



            • #7
              First off, I made a blunder earlier: I said that the KM curve goes down when someone is censored. This is wrong. The KM curve goes down when someone has a failure event. It does not go down when someone is censored - but the risk set does become smaller. My apologies.

              Assuming your graph is showing losses due to censoring (using the -censored(number)- option for -sts graph-), you had many people censored in both arms from about 0.5 to 0.75 years. The risk table shows that only 10 people in the control arm and 14 in the intervention arm were at risk at 1 year. That combines losses due to censoring and to recurrence. The people who did not have a recurrence are in the graph; specifically, they contribute to the risk set for as long as they are not censored. Both curves happen to go all the way down to zero or nearly zero because all patients in them died or were censored. You do not need every patient to die to get down to 0% surviving, just every patient still in the risk set. For example, here is one of Stata's example datasets.

              Code:
              webuse drug2b
              sts graph, by(drug) risktable censored(n)
              The KM curve for the treatment arm goes down every time someone dies. It does not go down at a censoring event. Let's alter the data so that everyone in the treatment arm was censored, i.e. didn't die:

              Code:
              replace died = 0 if drug == 1
              streset /*This resets the -stset- data*/
              sts graph, by(drug) risktable censored(n)
              As I said above, people who are censored stop contributing to the risk set. In this case, the graph stays flat. Let's make the last person in the treatment group die instead of being censored:

              Code:
              replace died = 1 if drug == 1 & studytime == 39
              streset
              sts graph, by(drug) risktable censored(n)
              See how the KM curve stays flat, despite all the censoring (that's because nobody actually died, they just got censored), and then at the very end the one death, representing 100% of the risk set, causes the KM curve to dive off a cliff? Again, the people who got censored are 'in' the KM curve, but they only contribute data for as long as they are under observation. Your curves go down to zero because your risk sets got small enough that the few deaths at the end drove them down to zero or nearly so. If you had decided earlier that the study terminated at 12 months after the initial resection, no matter what (i.e. administrative censoring), then the curves would not go down to zero, because you didn't observe the groups long enough for that to happen.
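              The same effect in miniature, on five made-up patients: the Kaplan-Meier estimate S(t) is the product, over failure times t_j <= t, of (1 - d_j/n_j), where d_j is the number failing at t_j and n_j the number still at risk just before t_j. A censoring never contributes a factor below 1; it only shrinks n_j for later failure times. In this hypothetical data only two of the five patients fail, yet S ends at 0 because the last patient still at risk fails. The variable names futime, failed and km are illustrative.

              Code:
              ```stata
              * Hypothetical toy data: 5 patients, failures at t=3 and t=5 only
              clear
              input id futime failed
              1 1 0
              2 2 0
              3 3 1
              4 4 0
              5 5 1
              end
              stset futime, id(id) failure(failed == 1)

              * Kaplan-Meier estimate evaluated at each patient's own time
              sts generate km = s
              list id futime failed km, noobs
              * t=1,2: censored, no failures yet  -> S = 1
              * t=3:   3 at risk, 1 fails         -> S = 1 - 1/3 = 2/3
              * t=4:   censored                   -> S stays 2/3
              * t=5:   1 at risk, 1 fails         -> S = 2/3 * (1 - 1/1) = 0
              ```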

              Last, this may be worth doing to help you visualize what's going on behind the scenes. I believe this code will work on your data, given the variable names you gave:

              Code:
              sort PatientProtocol_n _t
              Note the space between _t and your treatment variable. _t is the study (analysis) time. Before it, there will be a variable _d, generated by Stata, which is 1 for a failure and 0 if censored. If you scroll through the data in the Data Browser, you can see that for the treatment group, your last event was a censoring event and not a death. Your second-last event was a death. For the control group, your second-last event was a censoring event, and then the last few events were deaths (I can't tell how many, but it looks like 2). This is just to prove that anybody not censored and not failed is still contributing data to your KM curve.



              • #8
                Thanks again, Weiwen for your quick response.
                I ran the -webuse drug2b- example and can understand what you mean regarding the risk set, and that the ones without tumor recurrence are actually in the KM curve.

                But,
                If you had decided earlier that the study terminated at 12 months after the initial resection, no matter what (i.e. administrative censoring), then the curves would not go down to zero, because you didn't observe the groups long enough for that to happen.
                Forgive me if I haven't explained this properly, but this is exactly what I meant. The study or follow-up time ends at approx. 12 months after the initial resection.

                I did the:
                Code:
                . sort PatientProtocol_n _t
                
                . replace pathology_all = 1 if PatientProtocol_n == 1 & futime_yrs == 349
                (0 real changes made)
                but nothing happened.
                The last PatientProtocol_n==1 (control) is no. 176 in the dataset and has pathology_all==1
                The last PatientProtocol_n==2 (intervention) is no 349 in the dataset and has pathology_all==0

                So just to clarify, and I do apologize for my lack of understanding: is there no way I can change the KM curve so that the censored patients (pathology_all==0) do not contribute to the risk set?



                • #9
                  Sorry for coming late to the party.

                  Forgive me if I haven't explained this properly, but this is exactly what I meant. The study or follow-up time ends at approx. 12 months after the initial resection.
                  The graph you are getting suggests to me that you have missing values for the survival time variable (the one mentioned in the -stset- command) for those patients who did not have a recurrence before the end of the study period at 12 months of follow-up. Observations with missing survival time are not treated as censored by Stata's survival analysis commands: they are treated as not in universe.

                  To get the graphs to show the proportion of patients still recurrence-free at the end of surveillance, their survival time must be set to the duration that they were actually followed up, and the -failure()- option variable set to a censoring value. I think if you do that, you will see the kind of graphs you were expecting.
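                  Clyde's point can be checked directly: a record whose time variable is missing is flagged _st == 0 by -stset- and is left out of -sts graph- entirely, rather than counted as censored. This sketch on three made-up patients shows the check and the fix he describes (set the time to the follow-up actually observed and mark the patient as censored); last_visit_yrs and excluded_first are hypothetical variable names.

                  Code:
                  ```stata
                  * Hypothetical toy data: patient 2 is recurrence-free but has no time
                  clear
                  input id futime_yrs pathology_all last_visit_yrs
                  1 0.70 1 0.70
                  2 .    0 0.95
                  3 1.00 0 1.00
                  end
                  stset futime_yrs, failure(pathology_all == 1)
                  * record which patients stset left out of the analysis
                  generate byte excluded_first = (_st == 0)
                  count if excluded_first

                  * Fix: recurrence-free patients are censored at their last observed visit
                  replace futime_yrs = last_visit_yrs if missing(futime_yrs) & pathology_all == 0
                  stset futime_yrs, failure(pathology_all == 1)
                  count if _st == 0
                  ```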



                  • #10
                    Hi and welcome, Clyde. Better late than never; your advice is highly appreciated too.
                    Unfortunately,
                    Code:
                    tabulate futime_yrs, miss
                    did not show any missing values.

                    This was the -stset-command, you were referring to, right?
                    Code:
                    stset futime_yrs, id(Patient_ID_n) failure(pathology_all==1)
                    /*
                                    id:  Patient_ID_n
                         failure event:  pathology_all == 1
                    obs. time interval:  (_t0, futime_yrs]
                     exit on or before:  failure
                    
                    ------------------------------------------------------------------------------
                            349  total observations
                              0  exclusions
                    ------------------------------------------------------------------------------
                            349  observations remaining, representing
                            349  subjects
                            183  failures in single-failure-per-subject data
                        254.864  total analysis time at risk and under observation
                                                                    at risk from t =         0
                                                         earliest observed entry t =         0
                                                              last observed exit t =  1.336071
                    */
                    The advice is still very useful, since I would have missing values in both of the following variables:
                    Code:
                    tab futime_yrs_notumorm4_nom8visit, miss
                    and
                    Code:
                    tab futime_yrs_notumorm4_m8visit, miss

                    Thanks again.



                    • #11
                      Originally posted by Ditte Drejer
                      Thanks again, Weiwen for your quick response.
                      I ran the -webuse drug2b- example and can understand what you mean regarding the risk set, and that the ones without tumor recurrence are actually in the KM curve.

                      But,

                      Forgive me if I haven't explained this properly, but this is exactly what I meant. The study or follow-up time ends at approx. 12 months after the initial resection.

                      I did the:
                      Code:
                      . sort PatientProtocol_n _t
                      
                      . replace pathology_all = 1 if PatientProtocol_n == 1 & futime_yrs == 349
                      (0 real changes made)
                      but nothing happened.
                      The last PatientProtocol_n==1 (control) is no. 176 in the dataset and has pathology_all==1
                      The last PatientProtocol_n==2 (intervention) is no 349 in the dataset and has pathology_all==0

                      So just to clarify, and I do apologize for my lack of understanding: is there no way I can change the KM curve so that the censored patients (pathology_all==0) do not contribute to the risk set?
                      Hmm.

                      If you had decided that the study ended at 12 months after the resection, then I believe you should save the current dataset, then change all individuals with study times > 1 year to be censored, then change their study time to 1 year exactly. Then re-save the data under a different name. That said, you do have a few people with information past 1 year. Why not keep them if you were still following them? For scheduling reasons, it can't be that everyone came in at exactly 8 and exactly 12 months after the initial resection, or else there would be no variation in the follow-up times at all.
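                      Stata can also impose this kind of administrative censoring without editing and re-saving the data, through the -exit(time ...)- option of -stset-: anyone still under observation after 1 year is treated as censored at exactly 1 year, while the stored variables stay unchanged. This is a swapped-in alternative to the manual replace-and-save route, sketched on four made-up patients; the values are hypothetical.

                      Code:
                      ```stata
                      * Hypothetical toy data: two patients are followed past 1 year
                      clear
                      input Patient_ID_n futime_yrs pathology_all
                      1 0.60 1
                      2 0.90 0
                      3 1.10 1
                      4 1.30 0
                      end
                      * Administrative censoring at 1 year; futime_yrs itself is not modified
                      stset futime_yrs, id(Patient_ID_n) failure(pathology_all == 1) exit(time 1)
                      list Patient_ID_n futime_yrs pathology_all _t _d, noobs
                      ```

                      Here patient 3's recurrence at 1.10 years falls after the cut-off, so it is counted as censored at 1 year in the analysis.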

                      I may have misread the blue and red lines. Looks like blue is control. You said the last control patient had a recurrence. You see how the blue line ticks down to zero at the very end, right when the last patient failed? That's expected. The red line is treatment, and you see how the red line terminates just barely above zero? Like you said, the last patient was censored. Your graph is as I expect. Also, I didn't mean for you to change the data in your own study; I meant to demonstrate how people who are censored or who fail contribute to the risk set.

                      Last, you cannot do survival analysis without counting the censored individuals in the risk set. It is a fundamental tenet of survival analysis. Perhaps something is getting lost in translation, but I am a bit confused about what you are trying to do. We can see that about half the patients in the study have detected recurrences. Your KM curves go all the way down to zero or nearly zero survivors. I think I demonstrated that the KM curves can go all the way down to zero even if only half the patients fail. What matters is what proportion of the risk set fails, and your risk set is shrinking due to censoring. This is why, when I modified the Stata dataset, I could produce a KM curve that goes from 100% surviving to 0 survivors despite having only one failure event. Your KM curves look plausible given how you described the study. Removing the people who were censored will not change this. It would also be a terrible source of bias.

                      You already risk bias from the non-informative censoring that most survival models assume. For example, say that some of the people you lost to follow-up had died. The regular Cox model treats all censoring as non-informative, as if everyone who gets censored is censored for the same reason. If the treatment had no effect on mortality, this is OK, but you don't know that. Imagine that the treatment reduces mortality: then you are unfairly penalizing yourself. Or the reverse. Most properly, you handle this using a competing-risks model. In your case, your patient population may be young and healthy enough that we can assume the competing risk of mortality isn't going to bias the study too much, but this is something to sort out off the forum.



                      • #12
                        Thank you, Weiwen - and thank you all for responding to my questions and giving me both feedback and useful advice.
                        I have really enjoyed your input.
                        Most likely, you will hear from me again asking for more advice on Stata issues.
                        Best wishes,
                        Ditte

