
  • Kaplan Meier Survival


    I am doing a project where I need to assess the survival of renal cancer patients on or off a particular drug type. I am aware I need to do a Kaplan-Meier analysis and have arranged my data into three columns: censored (alive)/uncensored (dead), days alive since diagnosis, and group (on/off the drug). I have Stata but have NO idea how to make the Kaplan-Meier curve. I have never used this software before and would dearly appreciate any help. I can upload the file if need be.



  • #2
    See the [ST] manual, in particular stset and sts graph.


    • #3
      We definitely need to see your Stata data set to see how the data are set out. Don't upload it, though. Install the -dataex- command (-ssc install dataex-) if you don't already have it. And use that to post an example of your data.
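As a rough sketch of what that looks like in practice, the lines below install -dataex- if needed and list a small extract. The dataset here is Stata's shipped cancer example, used only as a stand-in; the variable names studytime, died and drug belong to that example, not to the original poster's file.

```stata
* Install dataex only if it is not already available
capture which dataex
if _rc {
    ssc install dataex
}

* Stata's shipped example data, standing in for your own dataset (assumption)
sysuse cancer, clear

* Produce a copy-and-paste listing of the first 10 observations
dataex studytime died drug in 1/10
```

The output of -dataex- can be pasted straight into a post, and others can then recreate the example data exactly.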


      • #4

        Christophe and Clyde give good advice. I will add the following general template:

        stset time, failure(censor==0)
        sts graph, by(group)
        where "time" is the name of your time variable, "censor" is the name of your censoring variable (assumed to be 1 if censored, 0 if uncensored), and "group" is the name of your treatment variable. Note that if your censoring variable is reverse coded (1 for uncensored/failed, 0 for censored/survived), you can use failure(censor), as the option assumes failure when the variable is equal to 1.

        Other commands that you can use after stset include sts list (for a table of survival probabilities) and sts test (to test for differences between groups).
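To make the template concrete, here is a minimal sketch on made-up data laid out like the three columns in the first post. The variable names days, dead and group and all of the values are hypothetical, and the failure indicator is assumed to be coded 1 for death and 0 for censored.

```stata
clear
* Made-up data: days since diagnosis, death indicator, treatment group
input days dead group
 95 1 0
210 1 0
340 0 0
400 1 0
150 1 1
365 0 1
520 1 1
700 0 1
end
label define grp 0 "off drug" 1 "on drug"
label values group grp

* dead is 1 for death and 0 for censored, so dead==1 is the failure event
stset days, failure(dead==1)

sts graph, by(group)      // Kaplan-Meier curves, one per group
sts list, by(group)       // table of survivor function estimates
sts test group            // log-rank test of equality across groups
```

If the indicator were coded the other way round (1 = censored), only the failure() option would change, as described above.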



        • #5
          Welcome to the list.
          As an aside to the previous helpful advice, I would recommend the following textbook:
          Kind regards,
          (Stata 15.1 SE)


          • #6
            Hi, and thanks for both the question and the answers.
            I have much the same problem. I am using Stata 14.2 and have set my failure variable to Pathology yes/no (coded 0/1).
            I have 348 patients and only 130 failures.
            When I type -sts graph, by(group) ci risktable- I can follow the survival curve down to zero, and the censored patients (the ones without Pathology==1) are missing from the survival curve.

            I have been over the Stata PDF documentation and -help stset- again and again, and have watched the Stata YouTube videos several times.

            Can anyone tell me what I am doing wrong?
            I would be so grateful since I really don't know what I can do from here.


            • #7
              Ditte, as others requested, we are trying to help you, but we cannot help you without a better description of what you are working with.

              In a post on the Mata forum (I redirected you here), you did post:

              stset obs_time_m12_m4 if m12_FinishedinProject ==1, failure( PathologyFindings_m4_tumor )scale(365.25)
              failure event: PathologyFindings_m4_tumor != 0 & PathologyFindings_m4_tumor < .
              obs. time interval: (0, obs_time_m12_m4]
              exit on or before: failure
              t for analysis: time/365.25
              if exp: m12_FinishedinProject ==1
              699 total observations
              351 ignored at outset because of -if <exp>-
              348 observations remaining, representing
              130 failures in single-record/single-failure data
              254.294 total analysis time at risk and under observation
              at risk from t = 0
              earliest observed entry t = 0
              last observed exit t = 1.336071
              sts graph, by (PatientProtocol_n) ci risktable
              Unfortunately, this doesn't tell us enough without knowing how you -stset- the data, or without an example of the data. Don't forget to remove all identifying information if you post an example of data.


              • #8
                As an aside to Weiven's helpful comment, please note that in -sts graph- you can highlight censored observations with hash marks via the -Plot censoring, entries, etc...- option.
                Kind regards,
                (Stata 15.1 SE)
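For those working from the command line rather than the dialog box, a minimal sketch of the same idea is below. It assumes Stata's shipped cancer example data (variables studytime, died and drug) in place of the poster's own, and uses the censored() option of -sts graph-.

```stata
* Stata's shipped example data, standing in for your own dataset (assumption)
sysuse cancer, clear
stset studytime, failure(died)

* Draw one tick mark on the curve at each time where observations are censored
sts graph, by(drug) censored(single)
```

If censored patients are in the analysis at all, their ticks should appear along the curves; if no ticks show up, that points back to how the data were -stset-.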


                • #9
                  OK, thanks.
                  I will do my best to give further details on my data; I am a novice both with Stata and at posting in this forum. Thank you all for your patience.

                  My time variable is created from the date of the latest operation (Patient_DateOfLatestTURB) until the patient's last visit (m12_visitDate); the dates are in mm/dd/yyyy format.
                  The reason for "351 missing values generated" is that not all of the patients have m12_FinishedinProject==1 (1 = yes, 0 = no).
                  The date of randomization into the control and intervention groups is m4_visitDate.

                  . generate time_obs_all = (m12_visitDate - Patient_DateOfLatestTURB)/365.25 if m12_FinishedinProject==1
                  (351 missing values generated)

                  The patients come for check-ups at the following times after their latest operation (called TURB):
                  4 months after = m4_PathologyFindings
                  8 months after = m8_PathologyFindings
                  12 months after = m12_PathologyFindings
                  The findings at each visit, look like this:
                  1 tumor
                  2 normal
                  3 inflammation
                  4 other

                  I can show you:
                  . tab PatientProtocol_n m4_PathologyFindings if m12_FinishedinProject==1

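                  Patient_Prot |            m4_PathologyFindings
                          ocol |     tumor     normal  inflammat      other |     Total
                  -------------+--------------------------------------------+----------
                       control |        68         13         13          3 |        97
                  intervention |        62         11         12          0 |        85
                  -------------+--------------------------------------------+----------
                         Total |       130         24         25          3 |       182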

                  . tab PatientProtocol_n m8_PathologyFindings if m12_FinishedinProject==1

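                  Patient_Prot |            m8_PathologyFindings
                          ocol |     tumor     normal  inflammat      other |     Total
                  -------------+--------------------------------------------+----------
                       control |        37          5          8          1 |        51
                  intervention |        30          3          3          2 |        38
                  -------------+--------------------------------------------+----------
                         Total |        67          8         11          3 |        89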

                  . tab PatientProtocol_n m12_PathologyFindings if m12_FinishedinProject==1

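                  Patient_Prot |            m12_PathologyFindings
                          ocol |     tumor     normal  inflammat      other |     Total
                  -------------+--------------------------------------------+----------
                       control |        58          5          6          1 |        70
                  intervention |        38          3          9          1 |        51
                  -------------+--------------------------------------------+----------
                         Total |        96          8         15          2 |       121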

                  I would like to -stset- the data so I can see time to first recurrence, on a Kaplan-Meier curve if possible.
                  . generate pathology_all = ( m4_PathologyFindings==1 | m8_PathologyFindings==1 | m12_PathologyFindings==1)

                  After that I -stset- my data:

                  . stset time_obs_all if m12_FinishedinProject ==1, id(Patient_ID_n) failure(pathology_all==1)

                  id: Patient_ID_n
                  failure event: pathology_all == 1
                  obs. time interval: (time_obs_all[_n-1], time_obs_all]
                  exit on or before: failure
                  if exp: m12_FinishedinProject ==1

                  699 total observations
                  351 ignored at outset because of -if <exp>-
                  348 observations remaining, representing
                  348 subjects
                  183 failures in single-failure-per-subject data
                  375.65 total analysis time at risk and under observation
                  at risk from t = 0
                  earliest observed entry t = 0
                  last observed exit t = 1.645448

                  . sts graph, by ( PatientProtocol_n ) ci risktable

                  failure _d: pathology_all == 1
                  analysis time _t: time_obs_all
                  id: Patient_ID_n

                  The graph is only presenting the ones with failure!
                  I wish I could show you, but I don't know how to attach the graph. The "upload attachment" button is not working for me.

                  I hope this is helpful.

                  I will be so grateful if someone can help me figure this out.
                  Please bear with me if the above is not enough, or too much.



                  • #10

                    Earlier I was incorrect to say you didn't show your -stset- code. I got confused because you posted your reply on this thread ... which you are entitled to do, but it would probably be better to start a new one for clarity. Also, it helps us to read your code if you use the code delimiters, which enclose your code in a nice box like in my earlier post. Use the # button in the formatting toolbar.

                    I can't tell exactly why your -sts graph- command is showing only failures. But there are some things about your code that look like possible errors.

                    I think you're saying that patients get 3 visits at 4, 8, and 12 months after operation. You calculated the observation time, time_obs_all, as the time between the last operation and the 12-month visit date. You say that you have 348 subjects with valid data.

                    You then show cross-tabs of pathology findings at each visit. The denominators for each visit are 182, 89, and 121 patients. That's a total of 392, which implies that not many patients got more than one follow up, and that everyone's follow up time is different. Also, your -stset- output says there are 183 failures. From your crosstabs, I see 130, 67, and 96 patients had tumors found at each of the pathology visits respectively. That totals 293. That is a lot more than -stset- says. This could be true if patients were coming to multiple follow up visits, I guess. But it is a very confusing scheme of follow up. Please correct me if I have misunderstood your output.

                    Moreover, there's no indication that if someone had a tumor at, say, 4 months, you recoded their time variable to 4 months (or more precisely, the 4-month visit date minus the operation date). You would need to recode the observation time based on the earliest date where a tumor was detected for this to work. Can you give us a summary of observation time in code delimiters?

                    summarize time_obs_all if m12_FinishedinProject ==1, detail
                    Last, you say you coded survival time based on the date of the 12-month visit. But your tables above appear to indicate that only 1/3 of your sample got a 12-month pathology report. Does not every visit get a pathology report?

                    Your code to indicate if there was tumor pathology on any one visit is correct as far as it goes. I can't see any errors in your -stset- code, but it's been a while since I did survival analysis. If the upload attachment button on the forum isn't working, then there are free image hosting sites that will take graphs. I haven't seen anyone on the forum use these, but they are not prohibited by the FAQ.

                    Let's just focus on maybe seeing your graph and getting it to run. But, I think that the study has a lot of issues. For example, say one person's 4-month pathology report was normal, but their 8-month report had a tumor. Most properly, you know they developed a tumor between 4 and 8 months, but you don't know when. When you code the observation time based on the 8 month visit, Stata will treat their survival time as 8 months exactly. I believe this is interval censoring, and this may be more appropriate for discrete time survival.
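The recoding of observation time described above could be sketched roughly as follows. This is only an illustration on made-up data: the variable names (id, op_date, m4_date, m8_date, m12_date, m4_path, m8_path, m12_path) are hypothetical stand-ins for the real ones, dates are given as plain day counts, and pathology code 1 is taken to mean tumor, as in the coding list in #9.

```stata
clear
* Made-up data: operation date, three visit dates (in days), three pathology codes
input id op_date m4_date m8_date m12_date m4_path m8_path m12_path
1 0 120 240 365 2 1 1
2 0 118 245 360 2 2 2
3 0 125   .   . 1 . .
4 0 121 242   . 3 2 .
end

* Days from operation to the earliest visit at which a tumor (code 1) was found
generate first_tumor = .
foreach v in m4 m8 m12 {
    replace first_tumor = `v'_date - op_date if `v'_path == 1 & missing(first_tumor)
}

* Patients never found to have a tumor are censored at their last attended visit
egen last_visit = rowmax(m4_date m8_date m12_date)
generate failed = !missing(first_tumor)
generate time = cond(failed, first_tumor, last_visit - op_date)

* Analysis time in years; failed==1 marks a recurrence
stset time, failure(failed) id(id) scale(365.25)
```

This still treats the recurrence as happening exactly on the visit date, so the interval-censoring caveat above applies. Stata 15 and later also have -stintreg- for interval-censored data, which is not shown here.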


                    • #11
                      As you suggest, I will post a new one.