Preparing data for competing risk analysis

Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#1

18 Dec 2018, 22:08

Dear Stata users,

I am working with survival data and trying to create the event variable (new_event) and the time-to-event variable needed for a competing risks analysis. As I understand it, the time variable must contain the time of occurrence of whichever comes first: event1, event2 (death), or the end of observation due to withdrawal or the end of the study.

I would really appreciate it if anyone could help me create these variables.

The data look as below:

Code:

clear
input float(id numeric_visit) int days_from_baseline byte event1 float(time_toevent1 event2) double time_toevent2
1  0    0 0  7.00274 0                  7
1  4  115 0  7.00274 0                  7
1 12  349 0  7.00274 0                  7
1 24  731 0  7.00274 0                  7
1 36 1101 0  7.00274 0                  7
1 48 1452 0  7.00274 0                  7
1 60 1823 0  7.00274 0                  7
1 72 2187 0  7.00274 0                  7
1 84 2558 0  7.00274 0                  7
2  0    0 0      2.8 1 3.2165206976570104
2  4  128 0      2.8 1 3.2165206976570104
2 12  338 0      2.8 1 3.2165206976570104
2 16  476 0      2.8 1 3.2165206976570104
2 24  735 0      2.8 1 3.2165206976570104
2 28  872 0      2.8 1 3.2165206976570104
3  0    0 1 1.880822 1  2.221917808219178
3  4  122 1 1.880822 1  2.221917808219178
3 12  373 1 1.880822 1  2.221917808219178
3 16  492 1 1.880822 1  2.221917808219178
3 20  616 1 1.880822 1  2.221917808219178
4  0    0 0 4.813699 0  4.809319559847294
4  4  126 0 4.813699 0  4.809319559847294
4 12  356 0 4.813699 0  4.809319559847294
4 24  720 0 4.813699 0  4.809319559847294
4 36 1086 0 4.813699 0  4.809319559847294
4 48 1432 0 4.813699 0  4.809319559847294
end

Many thanks.
Oyun

Last edited by Buyadaa Oyunchimeg; 18 Dec 2018, 22:36.

Clyde Schechter

Join Date: Apr 2014
Posts: 30108

18 Dec 2018, 23:15

I'm not sure I understand your data as your description is rather incomplete. But let me tell you what I think you have here:

There are two events in competition, event 1 and event 2. Although each id has values of variables event1 and event2 and time_toevent1 and time_toevent2 recorded in many observations, the values of those are always the same for any single id. So despite the appearance of longitudinal data, what you have is repetitions of the same single observation per id. The variable event1 is coded 1 if event1 happens to this id at time = time_toevent1. If event1 is coded 0, it means that event1 is censored for id at time_toevent1. Analogous considerations for event2 apply.

If the above is correct, then what is needed is to reduce this to a single observation for each event for each id, with a single variable that indicates time to that observation's event (or censorship for that event) and which event (if either occurred).

Code:

clear
input float(id numeric_visit) int days_from_baseline byte event1 float(time_toevent1 event2) double time_toevent2
1  0    0 0  7.00274 0                  7
1  4  115 0  7.00274 0                  7
1 12  349 0  7.00274 0                  7
1 24  731 0  7.00274 0                  7
1 36 1101 0  7.00274 0                  7
1 48 1452 0  7.00274 0                  7
1 60 1823 0  7.00274 0                  7
1 72 2187 0  7.00274 0                  7
1 84 2558 0  7.00274 0                  7
2  0    0 0      2.8 1 3.2165206976570104
2  4  128 0      2.8 1 3.2165206976570104
2 12  338 0      2.8 1 3.2165206976570104
2 16  476 0      2.8 1 3.2165206976570104
2 24  735 0      2.8 1 3.2165206976570104
2 28  872 0      2.8 1 3.2165206976570104
3  0    0 1 1.880822 1  2.221917808219178
3  4  122 1 1.880822 1  2.221917808219178
3 12  373 1 1.880822 1  2.221917808219178
3 16  492 1 1.880822 1  2.221917808219178
3 20  616 1 1.880822 1  2.221917808219178
4  0    0 0 4.813699 0  4.809319559847294
4  4  126 0 4.813699 0  4.809319559847294
4 12  356 0 4.813699 0  4.809319559847294
4 24  720 0 4.813699 0  4.809319559847294
4 36 1086 0 4.813699 0  4.809319559847294
4 48 1432 0 4.813699 0  4.809319559847294
end

//    VERIFY TIME TO EVENT1 AND TIME TO EVENT2
//    AND EVENT1 AND EVENT2
//    ARE CONSISTENT WITHIN ID
forvalues i = 1/2 {
    by id (time_toevent`i'), sort: assert time_toevent`i'[1] == time_toevent`i'[_N]
    by id (event`i'), sort: assert event`i'[1] == event`i'[_N]
}

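//    KEEP ONE OBSERVATION PER ID, THEN ONE PER EVENT PER ID
collapse (first) event* time_toevent*, by(id)
reshape long event time_toevent, i(id) j(which_event)
//    WHICH_EVENT: 1 = EVENT 1, 2 = EVENT 2, 0 = CENSORED
replace which_event = 0 if event == 0
sort id time_toevent
drop event

//    EVENT 1 IS THE FAILURE OF INTEREST
stset time_toevent, failure(which_event == 1) id(id)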

This should be what you want, if my assumptions are correct. When you run your competing risks regression, specify the -compete()- option as -compete(which_event == 2)-. (Observations with which_event == 0 then represent censored observations.)
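For comparison, the layout described in the original question (one record per id, holding the time of whichever comes first) could be sketched as follows. This is only an illustration under stated assumptions, not the method used above: the variable names time and status are made up here, an id whose event 1 is censored but whose event 2 is observed is assumed to count as event 2 at time_toevent2, and an id with neither event is assumed to be censored at the later of its two recorded times. The four rows are the per-id values from the example data.

```stata
* One row per id, values copied from the example data in this thread
clear
input float id byte event1 float time_toevent1 byte event2 double time_toevent2
1 0  7.00274 0 7
2 0      2.8 1 3.2165206976570104
3 1 1.880822 1 2.221917808219178
4 0 4.813699 0 4.809319559847294
end

* status (hypothetical name): 0 = censored, 1 = event 1 first, 2 = event 2 first
generate byte status = 0
replace status = 1 if event1 == 1 & (event2 == 0 | time_toevent1 <= time_toevent2)
replace status = 2 if event2 == 1 & status != 1

* time (hypothetical name): time of the first event; for censored ids,
* the later of the two recorded times (an assumption, see above)
generate double time = cond(status == 1, time_toevent1, ///
    cond(status == 2, time_toevent2, max(time_toevent1, time_toevent2)))

list id time status, noobs

* Event 1 is the failure of interest; status == 2 would be the competing event
stset time, failure(status == 1) id(id)
```

With the data in this shape, the competing risks regression would take -compete(status == 2)-. Whether censoring at the later time is appropriate depends on how follow-up was recorded in the study.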


Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#3

18 Dec 2018, 23:26

Thank you so much, Prof. Schechter.

Your assumptions are correct and this is exactly what I wanted.