  • Preparing data for competing risk analysis

    Dear Stata users,


    I am working on survival data and trying to create new_event and time-to-event variables for a competing risks analysis. As I understand it, the time variable must contain the time of occurrence of whichever comes first: event1, event2 (death), or the end of observation due to withdrawal or end of study.

    I would really appreciate it if anyone could help me create these variables.


    The data look like this:
    [CODE]
    clear
    input float(id numeric_visit) int days_from_baseline byte event1 float(time_toevent1 event2) double time_toevent2
    1  0    0 0  7.00274 0                  7
    1  4  115 0  7.00274 0                  7
    1 12  349 0  7.00274 0                  7
    1 24  731 0  7.00274 0                  7
    1 36 1101 0  7.00274 0                  7
    1 48 1452 0  7.00274 0                  7
    1 60 1823 0  7.00274 0                  7
    1 72 2187 0  7.00274 0                  7
    1 84 2558 0  7.00274 0                  7
    2  0    0 0      2.8 1 3.2165206976570104
    2  4  128 0      2.8 1 3.2165206976570104
    2 12  338 0      2.8 1 3.2165206976570104
    2 16  476 0      2.8 1 3.2165206976570104
    2 24  735 0      2.8 1 3.2165206976570104
    2 28  872 0      2.8 1 3.2165206976570104
    3  0    0 1 1.880822 1  2.221917808219178
    3  4  122 1 1.880822 1  2.221917808219178
    3 12  373 1 1.880822 1  2.221917808219178
    3 16  492 1 1.880822 1  2.221917808219178
    3 20  616 1 1.880822 1  2.221917808219178
    4  0    0 0 4.813699 0  4.809319559847294
    4  4  126 0 4.813699 0  4.809319559847294
    4 12  356 0 4.813699 0  4.809319559847294
    4 24  720 0 4.813699 0  4.809319559847294
    4 36 1086 0 4.813699 0  4.809319559847294
    4 48 1432 0 4.813699 0  4.809319559847294
    end
    [/CODE]


    Many thanks.
    Oyun
    Last edited by Buyadaa Oyunchimeg; 18 Dec 2018, 22:36.

  • #2
    I'm not sure I understand your data as your description is rather incomplete. But let me tell you what I think you have here:

    There are two events in competition, event 1 and event 2. Although each id has values of variables event1 and event2 and time_toevent1 and time_toevent2 recorded in many observations, the values of those are always the same for any single id. So despite the appearance of longitudinal data, what you have is repetitions of the same single observation per id. The variable event1 is coded 1 if event1 happens to this id at time = time_toevent1. If event1 is coded 0, it means that event1 is censored for id at time_toevent1. Analogous considerations for event2 apply.

    If the above is correct, then what is needed is to reduce this to a single observation for each event for each id, with a single variable that indicates time to that observation's event (or censorship for that event) and which event (if either occurred).

    Code:
    clear
    input float(id numeric_visit) int days_from_baseline byte event1 float(time_toevent1 event2) double time_toevent2
    1  0    0 0  7.00274 0                  7
    1  4  115 0  7.00274 0                  7
    1 12  349 0  7.00274 0                  7
    1 24  731 0  7.00274 0                  7
    1 36 1101 0  7.00274 0                  7
    1 48 1452 0  7.00274 0                  7
    1 60 1823 0  7.00274 0                  7
    1 72 2187 0  7.00274 0                  7
    1 84 2558 0  7.00274 0                  7
    2  0    0 0      2.8 1 3.2165206976570104
    2  4  128 0      2.8 1 3.2165206976570104
    2 12  338 0      2.8 1 3.2165206976570104
    2 16  476 0      2.8 1 3.2165206976570104
    2 24  735 0      2.8 1 3.2165206976570104
    2 28  872 0      2.8 1 3.2165206976570104
    3  0    0 1 1.880822 1  2.221917808219178
    3  4  122 1 1.880822 1  2.221917808219178
    3 12  373 1 1.880822 1  2.221917808219178
    3 16  492 1 1.880822 1  2.221917808219178
    3 20  616 1 1.880822 1  2.221917808219178
    4  0    0 0 4.813699 0  4.809319559847294
    4  4  126 0 4.813699 0  4.809319559847294
    4 12  356 0 4.813699 0  4.809319559847294
    4 24  720 0 4.813699 0  4.809319559847294
    4 36 1086 0 4.813699 0  4.809319559847294
    4 48 1432 0 4.813699 0  4.809319559847294
    end
    
    //    VERIFY TIME TO EVENT1 AND TIME TO EVENT2
    //    AND EVENT1 AND EVENT2
    //    ARE CONSISTENT WITHIN ID
    forvalues i = 1/2 {
        by id (time_toevent`i'), sort: assert time_toevent`i'[1] == time_toevent`i'[_N]
        by id (event`i'), sort: assert event`i'[1] == event`i'[_N]
    }
    
    collapse (first) event* time_toevent*, by(id)
    reshape long event time_toevent, i(id) j(which_event)
    replace which_event = 0 if event == 0
    sort id time_toevent
    drop event
    
    stset time_toevent, failure(which_event = 1) id(id)

    This should be what you want, if my assumptions are correct. When you run your competing risks regression, specify the -compete()- option as -compete(which_event = 2)-. Observations with which_event = 0 then represent censored observations.
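
    As a rough sketch of that last step (an illustration, not part of the original reply): assuming the data in memory have already been -stset- as above, and assuming a covariate, here given the hypothetical name x, was carried through the -collapse- step (the code above does not keep any covariates), the check and the regression call might look like this. With only the four example ids the model is not expected to produce meaningful estimates; the sketch is meant for the full data set.

    ```stata
    * Assumes the data are already -stset- as in the code above.
    * x is a HYPOTHETICAL covariate that is not in the example data; it would
    * have to be kept when collapsing, for example:
    *     collapse (first) event* time_toevent* x, by(id)

    * Inspect how -stset- interpreted the records for each id
    list id which_event time_toevent _t0 _t _d _st, sepby(id)

    * Competing risks regression: event 1 is the failure of interest
    * (declared in -stset-), event 2 is the competing event
    stcrreg x, compete(which_event == 2)
    ```

    The -list- line is only a sanity check: _t0, _t, _d and _st are the variables that -stset- creates, so it shows which records count as failures and which are treated as censored before any model is fit.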


    • #3
      Thank you so much, Prof. Schechter.

      Your assumptions are correct and this is exactly what I wanted.
