Survival analysis and problems with multiple rows for each patient

Stine Rognan

04 Jun 2021, 02:15

Hello everyone,

I'm working on a survival analysis with a Cox model, using a dataset collected from an RCT. My Stata version is 16.1.
In the two dataex examples below, the variable V2 is the patient ID. For each value of V2 there is only one row with data on status and months (mnd, the time to status), and this is the last row for each patient. Status and months are missing in the rest of the rows, but those rows contain other valuable information, e.g. diagnosis variables and drug variables (potential risk factors). The number of other rows varies from patient to patient.
My question is: how can I move the information about diagnoses and drugs from the other rows with the same patient ID to the one row for each patient that contains values for mnd and status? E.g. something like this:

V2    mnd                  status   hjsvikt   diabetes   M01A_   C09_
551   .75564681724846      1        0         0          1       0
(552 - 559: examples not included)
560   .3613963039014374    1        1         0          1       1

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double(V2 mnd status) long(hjsvikt diabetes) float(M01A_ C09_)
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 .75564681724846 1 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 1 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
551 . . 0 0 0 0
end
label values status Status_ut_2017
label def Status_ut_2017 1 "Reinnlagt", modify

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double(V2 mnd status) long(hjsvikt diabetes) float(M01A_ C09_)
560 . . 1 0 0 0
560 . . 0 0 0 1
560 . . 0 0 0 0
560 . . 0 0 0 0
560 . . 0 0 0 0
560 . . 0 0 0 0
560 . . 0 0 0 0
560 . . 0 0 0 0
560 . . 0 0 0 0
560 . . 0 0 0 0
560 . . 0 0 0 0
560 . . 0 0 0 0
560 . . 0 0 0 0
560 . . 0 0 1 0
560 . . 0 0 1 0
560 .3613963039014374 1 0 0 0 0
end
label values status Status_ut_2017
label def Status_ut_2017 1 "Reinnlagt", modify

Other information: I think I need to do this because otherwise Stata omits the information in the rows that have no survival data (status, time to status). I have tried to fill in the survival data on all rows with the xfill command, e.g. xfill months, i(V2) and xfill status, i(V2), but Stata then treats every row as a separate case instead of recognizing that the rows belong to the same patient ID. The resulting Kaplan-Meier plots show 8000 cases, not the 400 there should be.
Once all the information I need for further analysis is collected in the last row of interest for each patient, I hope I can use drop if status == . in Stata, and that the dataset will then be ready for survival analysis and estimation of the Cox model.
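One possible way to carry out the plan above, as an untested sketch rather than a confirmed solution. It assumes that hjsvikt, diabetes, M01A_ and C09_ are 0/1 indicators, that "recorded on at least one row" is the summary wanted for each patient, and that status == 1 ("Reinnlagt") is the event of interest. The survival row is identified by a non-missing status instead of by its position, so it does not have to be the last row. The input is a shortened excerpt of the two dataex examples, and the helper variable names ever_* are made up for illustration.

```stata
* Shortened excerpt of the two dataex examples (3 rows for 551, 4 for 560)
clear
input double(V2 mnd status) long(hjsvikt diabetes) float(M01A_ C09_)
551 . . 0 0 0 0
551 .75564681724846 1 0 0 0 0
551 . . 0 0 1 0
560 . . 1 0 0 0
560 . . 0 0 0 1
560 . . 0 0 1 0
560 .3613963039014374 1 0 0 0 0
end

* Within each patient, replace every indicator by its maximum over all rows,
* so that a 1 on any row is copied to every row of that patient.
* (Assumption: the variables are coded 0/1; egen max() ignores missing values.)
foreach v of varlist hjsvikt diabetes M01A_ C09_ {
    bysort V2: egen byte ever_`v' = max(`v')
    replace `v' = ever_`v'
    drop ever_`v'
}

* Keep only the row that carries the survival data
drop if missing(status)

* Check: stops with an error if any patient still has more than one row
isid V2

* Declare the survival data (assumption: status 1 = event)
stset mnd, failure(status==1) id(V2)

list V2 mnd status hjsvikt diabetes M01A_ C09_
```

With the full dataset, the varlist in the foreach loop would be extended to all diagnosis and drug variables. If some of them are not simple "ever recorded" indicators (for example doses or dates), a maximum is probably not the right summary and they would need their own rule.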