survival analysis with overlapping records probable error

Umama Afr

Join Date: Feb 2017

Posts: 22
#1


02 Mar 2017, 13:01

Dear Stata team,
Hi again,
I am trying to run a survival analysis with multiple records per subject. At each clinic visit a specimen is collected to measure the outcome, which takes many days to be detected. Sometimes a participant comes to the next visit, and a second specimen is collected, before the result of the first specimen is known.
This overlap between visit 2 and the outcome of visit 1 should not matter: the specimens should be treated as independent, so it is irrelevant whether the result of visit 1 arrives before the sampling at visit 2. What matters is the time each specimen takes to be detected.
Is there a way to get rid of this overlap? Does it affect the analysis?

I have these variables:
date of sampling
date of detection
detection result (0 for positive and 1 for negative)

I used the following:
stset date_of_detection, id(id) failure(detection_result==1) time0(date_of_sampling)

Any help on this?

thank you very much

best regards
Umama
Andrew Lover

Join Date: Apr 2014

Posts: 182
#2

02 Mar 2017, 17:02

Hard to say exactly from that info, but my first thought would be to ignore all the visit dates, and just focus on time-to-first positive test.

I'd think carefully about whether you are truly interested in the times to each failure. In my experience, multiple failures greatly complicate the analysis without always adding much or having a ready interpretation (though this obviously depends heavily on your specific disease states).

__________________________________________________ __
Assistant Professor, Department of Biostatistics and Epidemiology
School of Public Health and Health Sciences
University of Massachusetts- Amherst
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#3

03 Mar 2017, 03:39

If I understood correctly (and I'm not sure), the date of the outcome should be the date it was identified, not the date the results reached the office. If so, there won't be any overlap.

On the other hand, if the event is recurrent, you may consider a different approach, such as a stratified Cox model with a counting process (CP), marginal, or gap-time formulation.

Best regards,

Marcos

Umama Afr

Join Date: Feb 2017

Posts: 22
#4

04 Mar 2017, 17:45

Hi Marcos,
To illustrate, I drew up this table:

id  Date start of treatment  Date of specimen collection  Date of culture detection  Detection result
1   1 Jan 13                 4 Jan 13                     12 Jan 13                  1
1                            7 Jan 13                     13 Jan 13                  2
1                            14 Jan 13                    19 Jan 13                  1
2   2 Feb 13                 10 Feb 13                    15 Feb 13                  1
2                            14 Feb 13                    19 Feb 13                  1
2                            18 Feb 13                    25 Feb 13                  2
2                            1 Mar 13                     10 Mar 13                  2
3
3
4
4

So the overlap can occur between the detection date of one visit and the sampling date of the following visit (for example, in subject 1 the first specimen is detected on 12 Jan, after the second specimen was collected on 7 Jan).
I want to analyze the duration from sampling until detection, but when I run the stset command Stata reports "probable error" for overlapping records, and I don't know whether I should ignore it or do something about it.

also, thank you for the suggestion above, I will look more about it, if there is a command for CP would you kindly suggest how can I get it?

thank you very much
your help is highly appreciated
best regards
Umama
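
A minimal sketch of one possible way around the overlap warning, assuming each specimen (not each patient) is the unit of analysis and the clock starts at sampling. The dates are the two complete subjects from the table above; the variable names (s_sample, s_detect, sample_date, detect_date, result) are hypothetical. Because no id() is given, stset treats every record as its own subject, so intervals belonging to the same patient are not compared with one another:

```stata
clear
input id str9 s_sample str9 s_detect result
1 "4jan2013"  "12jan2013" 1
1 "7jan2013"  "13jan2013" 2
1 "14jan2013" "19jan2013" 1
2 "10feb2013" "15feb2013" 1
2 "14feb2013" "19feb2013" 1
2 "18feb2013" "25feb2013" 2
2 "1mar2013"  "10mar2013" 2
end

* Convert the string dates to Stata daily dates
gen sample_date = date(s_sample, "DMY")
gen detect_date = date(s_detect, "DMY")
format sample_date detect_date %td

* One record per specimen; analysis time is days since sampling.
* No id() here, so records of the same patient are not checked for overlap.
stset detect_date, origin(time sample_date) failure(result == 1)

* Inspect the analysis-time variables that stset created
list id sample_date detect_date _t0 _t _d, sepby(id)
```

Specimens from the same patient are unlikely to be independent, so a later model could allow for that, for example with the vce(cluster id) option of stcox. Whether this design answers the research question is a separate matter from silencing the warning.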


Umama Afr

Join Date: Feb 2017

Posts: 22
#5

04 Mar 2017, 17:48

Hi Andrew,
Thank you for the comment. I was thinking that I might ignore the time of sampling, but the problem is that there are gaps between the dates of sampling. Is there any way I could get around that?

many thanks
best regards
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#6

05 Mar 2017, 05:16

if there is a command for CP would you kindly suggest how can I get it?

I fear you should preferably take a close look at the CP models, not just apply a command.

That said, in a few words, you need an "interval" variable, that is, the starting time and the stopping time.
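
As a rough sketch of what that interval layout can look like in Stata (toy data; the names start, stop and event are made up for illustration), each record covers one at-risk interval (start, stop], and exit(time .) keeps a subject under observation after a first event:

```stata
clear
input id start stop event
1  0  5 1
1  5  9 1
1  9 14 0
2  0  7 0
2  7 12 1
end

* Counting-process setup: time0() is the interval start, the main
* variable is the interval end, and exit(time .) allows repeated events
stset stop, id(id) time0(start) failure(event == 1) exit(time .)

* Every record should be retained (_st == 1) with its own (_t0, _t] span
list id _t0 _t _d _st, sepby(id)
```

In this layout the intervals within one id must not overlap, which is what stset checks when id() is specified.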

Also, should the PH assumption be violated, you may need a stratified CP approach.

All in all, I strongly recommend you search further in the literature.

These are "complex" types of survival analysis, and the commands and their interpretation, let alone the selection of the most appropriate model, deserve attention and care.

Best regards,

Marcos
Andrew Lover

Join Date: Apr 2014

Posts: 182
#7

05 Mar 2017, 16:41

Echoing Marcos's advice, the data setup for multiple failures takes some very careful thought. There is a FAQ here:

http://www.stata.com/support/faqs/st...ure-time-data/

and some other helpful links here:

http://www1.udel.edu/ASA/Therneau_slides_for_packet.pdf

https://stat.ethz.ch/education/semes...ntation_10.pdf


Umama Afr

Join Date: Feb 2017

Posts: 22
#8

06 Mar 2017, 11:03

Thank you for the informative sites.
Given that my data are longitudinal and the measurement (time to detection) is repeated over visits, I should consider CP or a stratified Cox model. From the sites I understood that I may use a conditional model for ordered events.
For each subject I have a variable for entering the time interval and a variable for exiting it. You mentioned that I need an interval variable: is my understanding correct that it should be a variable that codes each stratum? In my case it could be the visit number (each visit represents the sampling time, i.e. the entry).
So I am guessing that the table should look something like this:

id  Visit#  Date start of treatment  Date of specimen collection  Date of culture detection  Detection result  X
1   1       1 Jan 13                 4 Jan 13                     12 Jan 13                  1                 1
1   2                                7 Jan 13                     13 Jan 13                  2                 1
1   3                                14 Jan 13                    19 Jan 13                  1                 1
2   1       2 Feb 13                 10 Feb 13                    15 Feb 13                  1                 2
2   2                                14 Feb 13                    19 Feb 13                  1                 2
2   3                                18 Feb 13                    25 Feb 13                  2                 2
2   4                                1 Mar 13                     10 Mar 13                  2                 2
3
3
4
4

But first I should check the PH assumption for the exposure of interest, using log-log plots or the estat phtest test, and if it is violated I need to stratify.
But do I stratify on the exposure of interest or on Visit#?

from the site that Andrew provided
http://www.stata.com/support/faqs/st...ure-time-data/

This modeling approach sounds reasonable: 3.2.3 The conditional risk set model (time from entry)

But I need to consider that there are gaps between the visits, so the end time of one visit is not necessarily the entry time for the next visit.

Also, I still don't know how to deal with the overlapping records problem.

thank you very much


Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#9

06 Mar 2017, 12:23

Hello Umama,

I recommend that you present the command, the output and the problem together. For example, for the PH assumption, I kindly suggest posting the command, sharing the results, and then discussing the alternatives should there be a violation. In short, you may "extend" the model by using the tvc() option, or you may use the "culprit" variable as a stratum.
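
To make the two alternatives concrete, here is a hedged sketch that uses the cancer dataset shipped with Stata rather than the poster's data; the choice of age and drug as covariates is only for illustration:

```stata
sysuse cancer, clear
stset studytime, failure(died)

* Baseline Cox model, then the Schoenfeld-residual test of PH
stcox age i.drug
estat phtest, detail

* Alternative 1: let the effect of a covariate vary with analysis time
stcox i.drug, tvc(age) texp(_t)

* Alternative 2: use the "culprit" variable as a stratum instead
stcox age, strata(drug)
```

With strata(), the stratifying variable no longer gets a coefficient, so it is not a natural choice when the offending variable is the exposure of interest.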

Please read the FAQ, mainly the section on sharing commands and output.

You may use CODE delimiters or install dataex from SSC.

Thanks.

Best regards,

Marcos
Umama Afr

Join Date: Feb 2017

Posts: 22
#10

06 Mar 2017, 13:19

That is a great idea, Marcos, thank you for the suggestion.
I will check the site you suggested and post the commands and the output on this page.
many thanks
best regards
Umama
Umama Afr

Join Date: Feb 2017

Posts: 22
#11

22 Mar 2017, 09:11

Hi Marcos,
I tried the tvc() option in the Cox model that I built, but I am not sure what would be a good post-estimation diagnostic to check the validity of the model.

my outcome is the time to culture negative
my time is the time from treatment
i used the following

stset time, id(id) failure(negative==1)
stcox X1 X2, efron vce(robust) tvc(X3)

X3 is the time-varying covariate, coded as 1 up to 4 months of treatment and 2 from then until the end of follow-up. X1 is the main exposure; I have it as a continuous variable, but I am also going to try it as a binary category.

Before I used tvc() I was relying on the AIC and the residual plots.
For the residuals, this is what I used:

predict mg, mgale
predict dr, deviance
predict xb, xb
scatter mg xb
scatter dr xb

But after I added tvc() to include the time-varying covariate, Stata did not accept these commands.

plenty of thanks!
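
On the refusal of predict: as far as I know, the residual options of predict (mgale, deviance and so on) are not available after stcox is fitted with tvc(). A commonly suggested alternative is to split the records with stsplit and build the time-dependent term by hand, after which predict works as usual. A minimal sketch on the cancer dataset shipped with Stata, with a hypothetical cut point at 10 months and hypothetical variable names (id, period, late, age_late):

```stata
sysuse cancer, clear
gen id = _n
stset studytime, failure(died) id(id)

* Split each subject's follow-up at analysis time 10;
* period holds the start of each episode (0 or 10)
stsplit period, at(10)

* Indicator for the later period and its interaction with age
gen byte late = period == 10
gen age_late = age * late

* Ordinary stcox without tvc(): age_late is the time-dependent term
stcox age i.drug age_late, efron

* The residual predictions from the post are accepted again
predict mg, mgale
predict dr, deviance
predict xb, xb
```

The scatter commands from the post can then be run unchanged on mg, dr and xb.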