Advice regarding estimating competing risk

Jonas Kristensen

Join Date: Jan 2019

Posts: 41
#1

Advice regarding estimating competing risk

29 Jan 2019, 05:56

If i desire to calculate the cumulative incidence of both outcome (fracture=1/no fracture=0) and death (dead=1/alive=0) in a cohort, using competing risk, does the "status" variable that i am using have to have outcome, death and missing combined (outcome = 1 dead = 2 no outcome/no death=3). I have been trying with the command below, but it wont seem to work:

Code:

gen outcome = had_fracture gen endpoint = min(first_fracture_date, end_follow_date) stset endpoint, failure(outcome = 1) scale(365.25) origin(start_follow_date) stcompet= , compet(dead=1)

It is when using "stcompet= , compet(dead=1)", that i am suspecting that i need to combine outcome and dead into one variabel for it to be correct

And then to depict the (competing) cumulated incidence of fractures and death:

Code:

twoway /// (line cummort _t if status==1 , c(J)) /// (line cummort _t if dead==1 , c(J))

I cannot use dataex, because i am working with sensitive data, sorry. Please tell me if more information is needed for better comprehension, thank you
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#2

29 Jan 2019, 10:46

To use -stcompet- you must create a single variable with levels for all of the outcomes under consideration. You specify one of them as your failure outcome in the -stset- command, and you specify the others as a numerical list (Stata numlist) in the -compet()- option.

Also I don't believe the syntax -stcompet = ,...- is allowed. I have very little experience with this command (I usually use -stcrreg-, which implements the Fine & Gray approach), but I believe you have to specify something like -stcompet cummort = ci, ...- And then you should be able to do the graphs the way you describe.
Comment

Jonas Kristensen

Join Date: Jan 2019
Posts: 41

29 Jan 2019, 11:43

Thank you for helping me again Mr. Schechter. I really appreciate it. You are right

In regards to the help you gave me with the command below. I am trying to estimate the incidence-rate of fractures:

Code:

sort person_id stroke_date
by person_id (stroke_date): gen start_follow_date = stroke_date[1]
by person_id (admission_date), sort: egen first_fracture_date = min(cond(has_fracture_now, admission_date, .))
by person_id (admission_date): gen end_follow_date = min(admission_date[_N], death_date)
by person_id: egen had_fracture = max(has_fracture_now) 
 
gen dead = 1
replace dead = 0 if missing(death_date)
by person_id: egen died = max(dead) 
 
by person_id: keep if _n == 1

 //  FOR FRACTURE OUTCOME ONLY
gen outcome = had_fracture
gen endpoint = min(first_fracture_date, end_follow_date)
 
// INCIDENCE-RATE OF FRACTURES
stset endpoint, failure(outcome = 1) scale(36525) origin(start_follow_date) id(person_id)
stptime

As you can see in the attached picture, the analysis of the incidence-rate has "36.682 observations end on or before enter()"
Is that because some patients has had a fracture before their stroke date, which means that they are therefore not included when stata does the analysis?

And if that is the case, how can i assure that if they actually also have a fracture after their stroke, that that is the facture that is included.

Attached Files

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30103

29 Jan 2019, 12:17

Yes, this means that they had a fracture before their first stroke. If your goal is to estimate the incidence of fracture after stroke, though, it would seem appropriate to exclude them, as Stata does. They would not be incident cases.

On another interpretation, if you want to simply disregard fractures that occur before the first stroke and consider the incidence rate of new fractures that do occur after the stroke, but your data is contaminated with those old, pre-stroke fractures, then you could do this:

Code:

sort person_id stroke_date
by person_id (stroke_date): gen start_follow_date = stroke_date[1]
by person_id (admission_date), sort: egen first_post_stroke_fx_date = ///
    min(cond(has_fracture_now & admission_date > start_follow_date, admission_date, .))
by person_id (admission_date): gen end_follow_date = min(admission_date[_N], death_date)
by person_id: egen had_fracture = max(has_fracture_now) 
 
gen dead = 1
replace dead = 0 if missing(death_date)
by person_id: egen died = max(dead) 
 
by person_id: keep if _n == 1

 //  FOR FRACTURE OUTCOME ONLY
gen outcome = had_fracture
gen endpoint = min(first_post_stroke_fx_date, end_follow_date)
 
// INCIDENCE-RATE OF FRACTURES
stset endpoint, failure(outcome = 1) scale(36525) origin(start_follow_date) id(person_id)
stptime

Changes to previous code are in italics. This code will give you the date of the first post-stroke fracture (if any), rather than the first stroke ever.

Comment

Jonas Kristensen

Join Date: Jan 2019

Posts: 41
#5

29 Jan 2019, 13:36

This is exactly what i would like to do: "On another interpretation, if you want to simply disregard fractures that occur before the first stroke and consider the incidence rate of new fractures that do occur after the stroke, but your data is contaminated with those old, pre-stroke fractures, then you could do this:"

I used the above command, but there is still 19192 observations end on or before enter(). I have attached a comparison photo, with before and after the new code.

Attached Files
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#6

29 Jan 2019, 15:35

Could your data have some people in it whose final admission date precedes, or is the same as their first stroke date? These would also be people whose data ends on or before their enter date.
Comment
Jonas Kristensen

Join Date: Jan 2019

Posts: 41
#7

01 Feb 2019, 05:04

Yes of course. That is definitely the case. If i then have end of follow up at last admission date, would it then makes sense to include these patients if they did not have fracture or died, because then they would theoretically have zero risk time, because they have not registered admission AFTER their stroke.

And how can one go about this, if this is the case?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#8

01 Feb 2019, 14:54

Well, Stata is already excluding those people for you: it's telling you about that in that warning message about observations entering on or before enter(). So if you just leave it the way you have it, you're automatically getting this.

If it bothers you to see that message, you can change the -stset- command as follows:

Code:

stset endpoint if start_follow_date < endpoint, failure(outcome = 1) scale(36525) origin(start_follow_date) id(person_id)

The warning from -stset- will go away. But the results will be unchanged.
Comment

Jonas Kristensen

Join Date: Jan 2019
Posts: 41

04 Feb 2019, 09:01

Thank you so much! I have now settled on following patients until 2017.

In terms of estimating the incidence-rate of specific fractures (diagnosis codes), I have used the code below:

Code:

sort person_id stroke_date
by person_id (stroke_date): gen start_follow_date = stroke_date[1]
by person_id (admission_date), sort: egen first_post_stroke_fx_date = ///
min(cond(has_fracture_now & admission_date > start_follow_date, admission_date, .))
by person_id: egen diagnosiscode = min(cond(admissiondate == first_post_stroke_fx_date, diag, .))
by person_id (admission_date): gen end_follow_date = min(td(31dec2017), dødsdato)
by person_id: egen had_fracture = max(has_fracture_now)

gen dead = 1
replace dead = 0 if missing(death_date)
by person_id: egen died = max(dead)

by person_id: keep if _n == 1

// FOR FRACTURE OUTCOME ONLY
gen outcome = had_fracture
gen endpoint = min(first_post_stroke_fx_date, end_follow_date)

// INCIDENCE-RATE OF FRACTURES
stset endpoint, failure(outcome = 1) scale(365250) origin(start_follow_date) id(person_id)
stptime

// INCIDENCE-RATE OF DIAGNOSIS SPECIFIC FRACTURE
stset endpoint, failure(diagnosiscode = 5) scale(365250) origin(start_follow_date) id(person_id)
stptime

// INCIDENCE-RATE OF CATEGORICAL FRACTURE-CODE
stset endpoint, failure(diagnosiscode = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 //
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45) scale(365250) origin(start_follow_date) id(person_id)
stptime

When adding all of the diagnosis-specific fractures together, it amounts to 16.256 failures (fractures), but when estimating the incidence rate of fractures in general, it amounts to 34.290 failures (fractures), which indicates that something probably goes wrong when i am generating the diagnosiscode-variable. I just can't seem to figure out what it is. I have attatched a small excerpt of the results, to give you an idea of what i am seeing.

Attached Files

Last edited by Jonas Kristensen; 04 Feb 2019, 09:06.

Comment

Jonas Kristensen

Join Date: Jan 2019

Posts: 41
#10

04 Feb 2019, 10:50

As you can see in the photo below, it apperently does not included all of the diagnosis codes, when generating the variable "diagnosiscode" (3 i for example missing) Is there another way to estimate these type-specific incidence rates to get around this issue?
Attached Files
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#11

04 Feb 2019, 12:45

I do not understand what the Table 4 you are showing is and how the numbers in it were calculated, so I can't begin to think about why your results are different from that.
Comment
Jonas Kristensen

Join Date: Jan 2019

Posts: 41
#12

05 Feb 2019, 05:52

Okay
Would there be another or better way, command-wise, to estimate the categorical-diagnosis code for fractures than using the below commands:

Code:

by person_id: egen diagnosiscode = min(cond(admissiondate == first_post_stroke_fx_date, diag, .) stset endpoint, failure(diagnosiscode = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 /// 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45) scale(365250) origin(start_follow_date) id(person_id) stptime

The above is for example diagnosis code DS02, which is the categorical/upper-hierarchical diagnosis code for Cranial and Face Fractures. It includes DS020-DS29A, which has the codes 1-45, which i use as failure above.
I think the approach above is correct, but there are simply some of the codes that it does not include in the calculations, which therefore results in to few failures/fractures. A possible solution could be creating af varibale, as you showed me with:

Code:

by person_id: egen had_fracture = max(has_fracture_now

Where every duplicate person_id gets the first diagnosis code that was registered. Because i think it is when removing duplicates that many of the diagnosiscodes are lost, and therefore there are not as many failures as for "had_fracture"

Last edited by Jonas Kristensen; 05 Feb 2019, 06:39.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#13

05 Feb 2019, 09:22

I don't see why the code you are using would fail to pick up some fracture codes. Are you sure that all of those fracture codes actually occur in the data as the first fracture after a stroke?
Comment
Jonas Kristensen

Join Date: Jan 2019

Posts: 41
#14

05 Feb 2019, 09:44

Yes i agree. Thank you for answering!
The wierd thing is that after i have used the entire list of codes that you showed me, and i have set up time points and removed duplicates, i have 116.519 patients.
If i then tabulate "had fracture", there is 34.502 fractures (1's) and i get 34.290 failures when i run "stset endpoint, failure(outcome=1) scale(365.25) origin(start_follow_date)".
AND THEN if i tabulate "first_poststroke_fx_date" i get 16.273 who has a date (who had a fracture), and if i tabulate "diagnosiskode" i also get 16.273 who has a diagcode, which also adds up to the amount of failures when estimating the incidence rate for diagnosis specific fractures, which was 16.256 failures (see #9).

Doesn't this all mean that the problem actually lies with the variable "had_fracture", and not with diagnosis code? And if it does, could it be solved by generating the outcome variable on the basis of wether or not the patient has a first_poststroke_fx_date instead of using "had_fracture"?

The thing i have a hard time understanding, is how there can be 34.290 failures when i run the above stset command, because much more patients must have died before fracture than 34.502-34.290=212

Last edited by Jonas Kristensen; 05 Feb 2019, 09:53.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30103
#15

05 Feb 2019, 09:54

Well, the code does not show where the variable has_fracture_now comes from. If I remember from the earlier thread, which was removed from Statalist, that variable was not created by the code but was already existing (or assumed to exist) in the data set. So the question is whether or not has_fracture_now agrees with having a diagcode within that list of codes. Perhaps it does not. Perhaps try -tab diagcode has_fracture_now- to see if every fracture-related diagnosis code has all of its observations in the 1 column of has_fracture_now in that output. Maybe that has_fracture_now variable is missing some of the codes.
Comment

Announcement