  • Kaplan Meier Survival

    Hey,

    I am doing a project where I need to assess the survival of renal cancer patients on/off a particular drug type. I am aware I need to do a Kaplan-Meier analysis and have arranged my data into three columns: censored (alive)/uncensored (dead), days alive since diagnosis, and group (on/off the drug). I have Stata and have NO idea how to make the Kaplan-Meier curve. I have never used this software before and would dearly appreciate any help. I can upload the file if need be.

    thanks!

    simon

  • #2
    See the [ST] manual, in particular the entries for stset and sts graph.


    • #3
      We definitely need to see your Stata data set to see how the data are set out. Don't upload it, though. Install the -dataex- command (-ssc install dataex-) if you don't already have it. And use that to post an example of your data.
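
      For instance, a minimal sketch of that step, assuming the three variables are named time, censor and group (placeholder names, not taken from your data):

      Code:
      * install dataex from SSC if it is not already available
      ssc install dataex
      * list the first 20 observations of the three variables in postable form
      dataex time censor group in 1/20

      Paste what dataex prints between code delimiters so that others can recreate the example data exactly.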


      • #4
        Simon,

        Christophe and Clyde give good advice. I will add the following general template:

        Code:
        stset time, failure(censor==0)
        sts graph, by(group)
        where "time" is the name of your time variable, "censor" is the name of your censoring variable (assumed to be 1 if censored, 0 if uncensored), and "group" is the name of your treatment variable. Note that if your censoring variable is reverse coded (1 for uncensored/failed, 0 for censored/survived), you can use failure(censor), as the option then treats any nonzero, nonmissing value as a failure.

        Other commands that you can use after stset include sts list (for a table of survival probabilities) and sts test (to test for differences between groups).
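
        As a self-contained illustration of these commands, here is a sketch that uses the cancer example dataset shipped with Stata (loaded with sysuse) rather than your own data. In that dataset studytime is the time variable, died is 1 for a death and 0 for a censored patient, and drug is the treatment group:

        Code:
        sysuse cancer, clear
        * died is coded 1 = failed, 0 = censored, so failure(died) is enough
        stset studytime, failure(died)
        * Kaplan-Meier curves, one per treatment group
        sts graph, by(drug)
        * table of survival probabilities by group
        sts list, by(drug)
        * log-rank test for equality of survivor functions across groups
        sts test drug

        With your own data, swap in your variable names and adjust failure() to match how your censoring variable is coded.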

        Regards,
        Joe


        • #5
          Simon:
          welcome to the list.
          As an aside to the previous helpful advice, I would recommend the following textbook: http://www.stata.com/bookstore/survi...-introduction/
          Kind regards,
          Carlo
          (Stata 18.0 SE)


          • #6
            Hi, and thanks for both the question and the answers.
            I have much the same problem. I am using Stata 14.2 and have set my failure variable to a Pathology yes/no variable (coded 0/1).
            I have 348 patients and only 130 failures.
            When I type -sts graph, by(group) ci risktable- I can follow the survival curve to zero, and the censored patients (the ones without Pathology==1) are missing from the survival curve.

            I have been over the Stata PDF documentation and -help stset- again and again, and have watched the Stata YouTube videos several times.

            Can anyone tell me what I am doing wrong?
            I would be so grateful, since I really don't know what I can do from here.


            • #7
              Ditte, as others requested, we are trying to help you, but we cannot help you without a better description of what you are working with.

              In a post on the Mata forum (I redirected you here), you did post:

              Code:
              stset obs_time_m12_m4 if m12_FinishedinProject ==1, failure( PathologyFindings_m4_tumor )scale(365.25)
              
              failure event: PathologyFindings_m4_tumor != 0 & PathologyFindings_m4_tumor < .
              obs. time interval: (0, obs_time_m12_m4]
              exit on or before: failure
              t for analysis: time/365.25
              if exp: m12_FinishedinProject ==1
              
              
              699 total observations
              351 ignored at outset because of -if <exp>-
              
              348 observations remaining, representing
              130 failures in single-record/single-failure data
              254.294 total analysis time at risk and under observation
              at risk from t = 0
              earliest observed entry t = 0
              last observed exit t = 1.336071
              
              
              then; 
              sts graph, by (PatientProtocol_n) ci risktable
              Unfortunately, this doesn't tell us enough without knowing how you -stset- the data, or without an example of the data. Don't forget to remove all identifying information if you post an example of data.
              Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

              When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.


              • #8
                Ditte:
                as an aside to Weiven's helpful comment, please note that in -sts graph- you can highlight censored observations with hash marks via the -Plot censoring, entries, etc...- option.
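
                A sketch of the command-line form, assuming the data are already -stset- and that the censored() option of -sts graph- is what the dialog setting maps to (the grouping variable is the one from the earlier post):

                Code:
                * one hash mark at each time at which observations are censored
                sts graph, by(PatientProtocol_n) censored(single) risktable

                If no hash marks appear at all, that in itself suggests the censored patients never made it into the -stset- sample.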
                Kind regards,
                Carlo
                (Stata 18.0 SE)


                • #9
                  OK, thanks.
                  I will try my best to give further details on my data, since I am a novice both to Stata and to writing in this forum. Thank you all for your patience.

                  My time variable is created from the date of the latest operation (Patient_DateOfLatestTURB) until the patient's last visit (m12_visitDate); both are in mm/dd/yyyy format.
                  The reason for "351 missing values generated" is that not all of the patients have m12_FinishedinProject==1 (1 = yes, 0 = no).
                  Date of randomization into control group and intervention group = m4_visitDate

                  . generate time_obs_all = (m12_visitDate - Patient_DateOfLatestTURB)/365.25 if m12_FinishedinProject==1
                  (351 missing values generated)

                  The patients come for a check-up at the following times: 4 months after their latest operation (called TURB) = m4_PathologyFindings
                  8 months after = m8_PathologyFindings
                  & 12 months after = m12_PathologyFindings
                  The findings at each visit, look like this:
                  1 tumor
                  2 normal
                  3 inflammation
                  4 other

                  I can show you:
                  . tab PatientProtocol_n m4_PathologyFindings if m12_FinishedinProject==1


                  Patient_Prot |            m4_PathologyFindings
                          ocol |     tumor     normal  inflammat      other |     Total
                  -------------+--------------------------------------------+----------
                       control |        68         13         13          3 |        97
                  intervention |        62         11         12          0 |        85
                  -------------+--------------------------------------------+----------
                         Total |       130         24         25          3 |       182

                  . tab PatientProtocol_n m8_PathologyFindings if m12_FinishedinProject==1

                  Patient_Prot |            m8_PathologyFindings
                          ocol |     tumor     normal  inflammat      other |     Total
                  -------------+--------------------------------------------+----------
                       control |        37          5          8          1 |        51
                  intervention |        30          3          3          2 |        38
                  -------------+--------------------------------------------+----------
                         Total |        67          8         11          3 |        89

                  . tab PatientProtocol_n m12_PathologyFindings if m12_FinishedinProject==1

                  Patient_Prot |            m12_PathologyFindings
                          ocol |     tumor     normal  inflammat      other |     Total
                  -------------+--------------------------------------------+----------
                       control |        58          5          6          1 |        70
                  intervention |        38          3          9          1 |        51
                  -------------+--------------------------------------------+----------
                         Total |        96          8         15          2 |       121


                  I would like to stset the data so I can see time to first recurrence, on a Kaplan Meier if possible.
                  . generate pathology_all = ( m4_PathologyFindings==1 | m8_PathologyFindings==1 | m12_PathologyFindings==1)

                  After that I -stset- my data:


                  . stset time_obs_all if m12_FinishedinProject ==1, id(Patient_ID_n) failure(pathology_all==1)

                  id: Patient_ID_n
                  failure event: pathology_all == 1
                  obs. time interval: (time_obs_all[_n-1], time_obs_all]
                  exit on or before: failure
                  if exp: m12_FinishedinProject ==1

                  ------------------------------------------------------------------------------
                  699 total observations
                  351 ignored at outset because of -if <exp>-
                  ------------------------------------------------------------------------------
                  348 observations remaining, representing
                  348 subjects
                  183 failures in single-failure-per-subject data
                  375.65 total analysis time at risk and under observation
                  at risk from t = 0
                  earliest observed entry t = 0
                  last observed exit t = 1.645448

                  . sts graph, by ( PatientProtocol_n ) ci risktable

                  failure _d: pathology_all == 1
                  analysis time _t: time_obs_all
                  id: Patient_ID_n

                  THEN
                  the graph is only presenting the ones with failure!
                  I wish I could show you, but I don't know how to attach the graph. The "upload attachment" button is not working for me.

                  I hope this is helpful.

                  I will be so grateful if someone can help me figure this out.
                  Please bear with me if the above is not enough, or too much.

                  Thanks,
                  Ditte






                  • #10
                    Ditte,

                    Earlier I was incorrect to say you didn't show your -stset- code. I got confused because you posted your reply on this thread ... which you are entitled to do, but it would probably be better to start a new one for clarity. Also, it helps us to read your code if you use the code delimiters, which enclose your code in a nice box like in my earlier post. Use the # button in the formatting toolbar.

                    I can't tell exactly why your -sts graph- command is showing only failures. But there are some things about your code that look like possible errors.

                    I think you're saying that patients get 3 visits at 4, 8, and 12 months after operation. You calculated the observation time, time_obs_all, as the time between the last operation and the 12-month visit date. You say that you have 348 subjects with valid data.

                    You then show cross-tabs of pathology findings at each visit. The denominators for each visit are 182, 89, and 121 patients. That's a total of 392, which implies that not many patients got more than one follow up, and that everyone's follow up time is different. Also, your -stset- output says there are 183 failures. From your crosstabs, I see 130, 67, and 96 patients had tumors found at each of the pathology visits respectively. That totals 293. That is a lot more than -stset- says. This could be true if patients were coming to multiple follow up visits, I guess. But it is a very confusing scheme of follow up. Please correct me if I have misunderstood your output.

                    Moreover, there's no indication that if someone had a tumor at, say, 4 months, you recoded their time variable to 4 months (or more precisely, the 4-month visit date minus the operation date). You would need to recode the observation time based on the earliest date where a tumor was detected for this to work. Can you give us a summary of observation time in code delimiters?

                    Code:
                    summarize time_obs_all if m12_FinishedinProject ==1, detail
                    Last, you say you coded survival time based on the date of the 12-month visit. But your tables above appear to indicate that only 1/3 of your sample got a 12-month pathology report. Does not every visit get a pathology report?

                    Your code to indicate if there was tumor pathology on any one visit is correct as far as it goes. I can't see any errors in your -stset- code, but it's been a while since I did survival analysis. If the upload attachment button on the forum isn't working, then there are free image hosting sites like www.imgur.com that will take graphs. I haven't seen anyone on the forum use these, but they are not prohibited by the FAQ.

                    Let's just focus on maybe seeing your graph and getting it to run. But, I think that the study has a lot of issues. For example, say one person's 4-month pathology report was normal, but their 8-month report had a tumor. Most properly, you know they developed a tumor between 4 and 8 months, but you don't know when. When you code the observation time based on the 8 month visit, Stata will treat their survival time as 8 months exactly. I believe this is interval censoring, and this may be more appropriate for discrete time survival.
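
                    A minimal sketch of the recoding described above, taking the observation time from the earliest visit at which a tumor was found. It assumes the visit dates are numeric Stata daily dates and that an m8_visitDate variable exists alongside m4_visitDate and m12_visitDate (an assumption; only the latter two appear in your post). The new variable names are made up for the illustration:

                    Code:
                    * date of the earliest visit at which a tumor (code 1) was found;
                    * later visits are assigned first so that earlier ones overwrite them
                    generate first_tumor = .
                    replace first_tumor = m12_visitDate if m12_PathologyFindings == 1
                    replace first_tumor = m8_visitDate  if m8_PathologyFindings  == 1
                    replace first_tumor = m4_visitDate  if m4_PathologyFindings  == 1
                    * last visit actually attended, used as the censoring date
                    egen last_visit = rowmax(m4_visitDate m8_visitDate m12_visitDate)
                    * event indicator and end of follow-up
                    generate byte recur = !missing(first_tumor)
                    generate end_date = cond(recur, first_tumor, last_visit)
                    * years from the latest TURB to first recurrence or censoring
                    generate time_first = (end_date - Patient_DateOfLatestTURB) / 365.25
                    stset time_first, failure(recur) id(Patient_ID_n)
                    sts graph, by(PatientProtocol_n) censored(single) risktable

                    This still dates each recurrence at the visit where it was detected, so the interval-censoring caveat above applies unchanged.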


                    • #11
                      As you suggest, I will post a new one.


                      • #12
                        Hello,

                        I am having the same issue, thus my post in this thread.

                        I am working on a longitudinal dataset, with one or more observations/measures per patient. The first observation for each patient is their primary surgery (when they start being at risk); i.e., prim==1. Their subsequent observations are revisions of the primary surgery; i.e., revis==1. For some patients, I don't have the first observation (the primary surgery) as it occurred before my study period. Only about 15% of the patients for whom I have data for the primary surgery had at least one revision during the study period (10 years). The remaining are censored. I would like to calculate Kaplan-Meier curves for time to revision, and cumulative revision rates, by type of valve used in the primary surgery. However, when I use sts graph and stptime, censored patients are excluded. Can you advise please?

                        I am copying below an example of my dataset (including only relevant variables), a description of the variables, and the commands I use.

                        Many thanks!

                        Rocio


                        Example:

                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input long id double opdati byte(opss prim revis codvalps)
                        18803 1.4871666e+12 1 1 0 1
                        19011 1696419420000 1 1 0 1
                        19419 1389796260000 1 1 0 2
                        19419 1477151940000 2 0 1 2
                        19419 1.4812161e+12 3 0 1 2
                        19419 1.5119676e+12 4 0 1 2
                        19420 1389534660000 1 1 0 1
                        19656 1.3908312e+12 1 1 0 1
                        19656 1.3966389e+12 2 0 1 1
                        19656 1.4660064e+12 3 0 1 1
                        19658 1.3909062e+12 1 1 0 1
                        19658 1392048960000 2 0 1 1
                        19659 1.3909131e+12 1 1 0 1
                        19660 1.3907577e+12 1 1 0 1
                        19668 1.3895712e+12 1 1 0 2
                        19669 1.3891014e+12 1 1 0 2
                        19670 1.3904352e+12 1 1 0 1
                        19782 1.3914234e+12 1 1 0 1
                        19788 1.3901598e+12 1 1 0 1
                        19790 1.3891794e+12 1 1 0 1
                        19808 1.3900131e+12 1 1 0 1
                        19810 1.3891239e+12 1 1 0 1
                        19811 1389956460000 1 1 0 1
                        end
                        format %tcDDmonCCYY_HH:MM opdati
                        label values revis yesnola
                        label values prim yesnola
                        label def yesnola 0 "No", modify
                        label def yesnola 1 "Yes", modify
                        label values codvalps codvalla
                        label def codvalla 1 "Hakim Precision", modify
                        label def codvalla 2 "Hakim Programmable", modify


                        Variables description:

                        variable name   type    format     label       variable label
                        -------------------------------------------------------------------------------------
                        id              long    %8.0g                  Patient id
                        opdati          double  %tc..                * Date and starting time of operation
                        prim            byte    %9.0g      yesnola     Primary shunt
                        revis           byte    %9.0g      yesnola     Revision operation (excl.prim.EVDs)
                        codvalps        byte    %19.0g     codvalla  * Codman valve subtype on primary shunt
                        -------------------------------------------------------------------------------------



                        Commands:

                        stset opdati, id(id) en(opss==1) origin(prim==1) failure(revis==1) scale(2635200000) exit(time .)
                        sts graph, failure by(codvalps)
                        stptime, by(codvalps)






                        • #13
                          Originally posted by Rocio Fernandez Mendez
                          ...
                          I am working on a longitudinal dataset, with one or more observations/measures per patient. The first observation for each patient is their primary surgery (when they start being at risk); i.e., prim==1. Their subsequent observations are revisions of the primary surgery; i.e., revis==1. ...
                          Dealing with the above first. You have a multiple-failure dataset, i.e. patients can fail repeatedly (have repeated revisions of the primary surgery). You specified the time patients first entered the study. You did not specify the time that patients exit the study. For example, your first patient (ID 18803) entered the study on 15 Feb 2007. She never had any revisions. That would be fine, except Stata does not know when she became right censored. As such, Stata is assuming she exited the study on the same day she entered. Hence,

                          Code:
                          stset opdati, id(id) origin(prim == 1) failure(revis==1) scale(2635200000) exit(time .)
                          
                                          id:  id
                               failure event:  revis == 1
                          obs. time interval:  (opdati[_n-1], opdati]
                           exit on or before:  time .
                              t for analysis:  (time-origin)/2.64e+09
                                      origin:  prim==1
                          
                          ------------------------------------------------------------------------------
                                   23  total observations
                                   17  observations end on or before enter()
                          ------------------------------------------------------------------------------
                                    6  observations remaining, representing
                                    3  subjects
                                    6  failures in multiple-failure-per-subject data
                               75.185  total analysis time at risk and under observation
                                                                          at risk from t =         0
                                                               earliest observed entry t =         0
                                                                    last observed exit t =  46.27702
                          She is one of the 17 observations reported above as "observations end on or before enter()". In the data you entered, your study only includes those people who had at least one revision. That's why your graph looks odd. You need to specify a censoring time. And that time depends on your study design, so we can't help you there. For example, you may need to include the last record you have of that patient, complete with the time of record. Or I think you can just set the exit date as the last day of the study, if it's fixed for all people, or just make it 10 years from entry date (going off your description below).
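
                          One way to supply that censoring time, sketched under the assumption that follow-up ends for everyone on 31 Dec 2013 (an assumed date; substitute whatever your design implies), is to add one extra record per patient at that date with revis set to 0. The variable names islast and censrec are made up for the illustration:

                          Code:
                          * flag each patient's last recorded operation
                          bysort id (opdati): generate byte islast = (_n == _N)
                          * duplicate that record; censrec is 1 for the new copy
                          expand 2 if islast, generate(censrec)
                          * turn the copy into a censoring record at the assumed end of follow-up
                          replace opdati = tc(31dec2013 23:59:59) if censrec
                          replace revis  = 0 if censrec
                          replace prim   = 0 if censrec
                          * default exit: each patient leaves at the first revision or at the censoring record
                          stset opdati, id(id) origin(prim==1) failure(revis==1) scale(2635200000)
                          sts graph, failure by(codvalps)

                          Dropping exit(time .) here means each patient contributes time only up to the first revision, which is what a time-to-first-revision Kaplan-Meier curve needs; keep exit(time .) if you want the repeated revisions retained.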

                          Originally posted by Rocio Fernandez Mendez
                          For some patients, I don't have the first observation (the primary surgery) as it occurred before my study period. Only about 15% of the patients for whom I have data for the primary surgery had at least one revision during the study period (10 years). The remaining are censored. I would like to calculate Kaplan-Meier curves for time to revision, and cumulative revision rates, by type of valve used in the primary surgery. However, when I use sts graph and stptime, censored patients are excluded. Can you advise please?
                          ...
                          Your example data don't appear to include any patients whose primary surgery is missing (the first sentence quoted above). In principle, that could be a case of left truncation, where subjects enter the study already having been at risk for an unknown period of time. I believe (but I am not certain) that Stata will naturally handle the left truncation bit via the entry time mechanism, with the caveat that if a subject enters the study on the same day they have a revision, then they get entirely excluded from the risk set, which I don't think is good.


                          • #14
                            Many thanks for your answer.

                            I had tried, and I just tried again, putting other "exit" conditions but they didn't work either. Specifically, I have just tried the following two options, and I still have the same issue: patients who did not need any revision are discarded by Stata...

                            Option 1:

                            Code:
                            stset opdati, id(id) en(opss==1) origin(prim==1) failure(revis)  scale(2635200000) exit(time tc(31,12,2013,23:59:59))
                            
                                            id:  id
                                 failure event:  revis != 0 & revis < .
                            obs. time interval:  (opdati[_n-1], opdati]
                             enter on or after:  opss==1
                             exit on or before:  time tc(31,12,2013,23:59:59)
                                t for analysis:  (time-origin)/2.64e+09
                                        origin:  prim==1
                            
                            ------------------------------------------------------------------------------
                                  41033  total observations
                                  10688  ignored because never entered
                                  20950  observations end on or before enter()
                            ------------------------------------------------------------------------------
                                   9395  observations remaining, representing
                                   5047  subjects
                                   9395  failures in multiple-failure-per-subject data
                              68413.205  total analysis time at risk and under observation
                                                                            at risk from t =         0
                                                                 earliest observed entry t =         0
                                                                      last observed exit t =  114.5495

                            Option 2:

                            Code:
                            gen end = tc(31,12,2013,23:59:59)
                            format end %tcDDmonCCYY_HH:MM
                            stset opdati, id(id) en(opss==1) origin(prim==1) failure(revis)  scale(2635200000) exit(end)
                            
                            
                                            id:  id
                                 failure event:  revis != 0 & revis < .
                            obs. time interval:  (opdati[_n-1], opdati]
                             enter on or after:  opss==1
                             exit on or before:  time end
                                t for analysis:  (time-origin)/2.64e+09
                                        origin:  prim==1
                            
                            ------------------------------------------------------------------------------
                                  41033  total observations
                                  10688  ignored because never entered
                                  20950  observations end on or before enter()
                            ------------------------------------------------------------------------------
                                   9395  observations remaining, representing
                                   5047  subjects
                                   9395  failures in multiple-failure-per-subject data
                              68413.205  total analysis time at risk and under observation
                                                                            at risk from t =         0
                                                                 earliest observed entry t =         0
                                                                      last observed exit t =  114.5495

                            Any advice? Many thanks!



