  • KM curves with 1 million observations: curves difficult to tell apart

    Hi there, I am currently using a large database with over 1 million observations.

    I am trying to plot KM curves for different surgeries. I cannot share the exact data, as it is stored on a remote platform with no internet access! <sigh> So, to understand what's going on with the data and to make sure I'm using the right code, I created a smaller sample dataset. I am sure some people will complain about this, but there is nothing I can do except complain to the government, who won't give us internet on our remote system, so sorry about this!


    Here's the sample data

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(type revision yearofsurgery yearofrevision timetorevision_years) byte(_st _d) double _t byte _t0
    1 1 14610 16438  5.008219 1 0                 3 0
    1 0 15310     .         . 0 .                 . .
    0 0 16468     .         . 0 .                 . .
    1 1 17867 18263 1.0849315 1 1 1.084931492805481 0
    1 0 17932     .         . 0 .                 . .
    1 1 18298 19422  3.079452 1 0                 3 0
    0 1 19029 20794  4.835617 1 0                 3 0
    0 0 16109     .         . 0 .                 . .
    0 1 15745 16111 1.0027397 1 1 1.002739667892456 0
    0 1 18303 20498  6.013699 1 0                 3 0
    end
    format %td yearofsurgery
    format %td yearofrevision
    label values type surgery
    label def surgery 0 "Sling", modify
    label def surgery 1 "Pessary", modify

    I've attempted to create two KM curves using the code below. The variables are:
    yearofsurgery = date the surgery took place
    yearofrevision = date the revision of the surgery took place
    type = type of surgery performed

    Code:
    //Generate time variable from operation to revision 
    gen timetorevision_years = (yearofrevision - yearofsurgery)/365
    
    stset timetorevision_years, failure(revision==1) exit(time 3)
    sts graph, by(type)  //Creates two KM curves for the different type of procedures 
    stdescribe

    My sample data produce a nice KM plot that looks just right, as seen below.

    [Image: Screenshot 2023-07-24 at 13.26.56.png, KM curves from the sample data]

    When I ran the same code on my remote dataset, I got these curves (the lines are really close together).

    [Image: Screenshot 2023-07-24 at 13.30.48.png, KM curves from the remote dataset]

    Question 1: How can I separate out the KM curves so that the difference between them is visible?

    In fact, when I draw a cumulative hazard plot instead, the curves do separate out.


    [Image: Screenshot 2023-07-24 at 13.30.59.png, cumulative hazard plot from the remote dataset]



    Question 2:
    With the sample data provided above, when I run
    Code:
    stsum, by(type)
    I get missing values for the 50% and 75% columns. Why does this happen?




  • #2
    You can try to adjust the vertical axis:

    Code:
    sts graph, ylabel(0.8 1)
    Sometimes graphs are not useful. You can also present a table with the survival function per group.
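
    The table suggested above could be produced along these lines. This is only a sketch: it assumes the data have already been -stset- as in post #1, and the time points 1, 2 and 3 years are illustrative choices, not values taken from the thread.

    ```stata
    * Survivor function per surgery type at selected times (years since surgery)
    * NOTE: the time points in at() are assumed for illustration
    sts list, by(type) at(1 2 3)

    * Same estimates side by side, one column per group
    sts list, by(type) at(1 2 3) compare
    ```

    With a million observations the estimates per group can differ in the second or third decimal place, which a table shows more clearly than a plot scaled from 0 to 1.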



    • #3
      Changing the y-axis scale might help, e.g. ylabel(0.9(0.1)1.1).

      You may have to do it in the Graph Editor.
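
      As a rough sketch of the axis adjustment, assuming the data are -stset- as in post #1 and that survival stays above 0.9 (the lower bound 0.9 and the 0.02 step are assumptions, not values from the thread):

      ```stata
      * Zoom the y axis on the KM curves; 0.9 is an assumed lower bound
      sts graph, by(type) ylabel(0.9(0.02)1, format(%4.2f))

      * Alternative: plot 1 - KM (cumulative failure), which starts at 0
      sts graph, by(type) failure
      ```

      If a curve drops below the lowest label, Stata extends the axis to cover it, so the lower bound may need adjusting to the actual data.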



      • #4
        Your timetorevision_years variable does not account for leap years. The function -datediff_frac()- will take care of this.
        Code:
        . gen diff = datediff_frac(yearofsurgery, yearofrevision, "year")
        (4 missing values generated)
        
        . list year* time diff in 1, ab(20)
        
             +-----------------------------------------------------------------+
             | yearofsurgery   yearofrevision   timetorevision_years      diff |
             |-----------------------------------------------------------------|
          1. |     01jan2000        02jan2005               5.008219   5.00274 |
             +-----------------------------------------------------------------+
        
        . disp 1/365
        .00273973
        
        . disp 3/365
        .00821918
        The fractional parts are 3/365 and 1/365 respectively. The two-day difference is due to the years 2000 and 2004 being leap years.
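
        Applied to the code in post #1, this might look like the sketch below. The variable name time_years is hypothetical, and -datediff_frac()- needs a recent Stata release (17 or later, as far as I know).

        ```stata
        * Leap-year-aware time from surgery to revision, in years
        * NOTE: time_years is a hypothetical variable name
        generate double time_years = datediff_frac(yearofsurgery, yearofrevision, "year")

        * Same stset call as in post #1, using the new variable
        stset time_years, failure(revision==1) exit(time 3)
        ```

        Observations with a missing yearofrevision still get a missing time and are excluded by -stset-, as the _st = 0 rows in the sample data show.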



        • #5
          Thanks for all this. I just realised I had a coding error and was using the wrong variable.

          With regard to my Question 2, does anyone know why I am getting missing values?

          Code:
          stsum, by(type)
          I get missing values for the 50% and 75% columns. Why does this happen?
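
          A likely explanation, offered as a sketch rather than a definitive answer: -stsum- reports the times at which 25%, 50% and 75% of subjects have failed, so a column is missing when the estimated survivor function never falls to 0.5 (or 0.25) within follow-up. In the sample data, each group has three observations with _st = 1 and only one failure before the exit time of 3 years, so survival stays at about 0.67 and only the 25% column can be filled in. The commands below assume the data are -stset- as in post #1.

          ```stata
          * List the KM survivor function per group to see how far it drops
          sts list, by(type)

          * Restricted mean survival time per group, a summary that
          * remains available when the median is not reached
          stci, by(type) rmean
          ```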


