Advice regarding estimating competing risk

Clyde Schechter replied

05 Feb 2019, 15:41
Yes.
Jonas Kristensen replied

05 Feb 2019, 14:38
Thank you again for your support, Mr. Schechter! I really appreciate your guidance.

Would it make sense to determine whether somebody died before the stroke (because of an error in the register) with the following commands:

Code:

gen test = end_follow_date - start_follow_date if _st == 0
sum test
sum test if test == 0

I get the following results from these tests

Attached Files
Clyde Schechter replied

05 Feb 2019, 12:43
Is it possible that you could explain "gen outcome = !missing(first_post_stroke_fx_date)" to me? Does it mean that first_post_stroke_fx_date is the outcome/failure variable, excluding all that are missing from first_post_stroke_fx_date?

No, the date is not the outcome variable. Rather the outcome variable is whether or not there is such a date. It is just a simple way of identifying those fractures that occur after a stroke and before any other fracture--the way first_post_stroke_fx_date was calculated, it is set to missing whenever no such fracture is found. It would also be missing in cases where there is such a fracture but for some reason the date variable for that record is missing--I don't know if there are any such cases in your registry data. If there are, they might also account for some of the small discrepancies you note.
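A minimal sketch of what that expression does, on made-up data. The three people and their dates are invented for illustration; only first_post_stroke_fx_date and outcome are names from this thread.

```stata
* Toy data (hypothetical): person 2 has no post-stroke fracture date
clear
input person_id str9 fx_date_str
1 "12mar2015"
2 ""
3 "01jan2016"
end

* Convert the strings to Stata daily dates; an empty string becomes missing
gen first_post_stroke_fx_date = daily(fx_date_str, "DMY")
format first_post_stroke_fx_date %td

* !missing() evaluates to 1 where a date is present and 0 where it is missing
gen outcome = !missing(first_post_stroke_fx_date)
list person_id first_post_stroke_fx_date outcome
```

Nobody is excluded by this command: every observation gets an outcome value, 1 if the date exists and 0 if it does not.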

Could this small number of missing fractures be explained somehow by death, death dates, or the endpoint coming before, because none of these patients are missing the outcome variable or first_post_stroke_fx_date?

Perhaps. I really can't say without access to the data.

I don't understand where the 16,273 number is coming from. Your description of it is not clear to me. But let's say that there is some variable x which identifies these 16,273 observations. That is, let's say x is 1 in exactly these 16,273 observations and 0 in all other observations. If you run:

Code:

browse if _d != x

you will see all of the observations that -stset- counts as a failure, but not your method x, as well as any other observations that perhaps your method x counts as a failure but -stset- does not. You can then try to figure out what it is about those observations that accounts for the discrepancy.
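As a rough sketch of that check on invented data: the four people, the variables start_day and end_day, and the values of x are all hypothetical. Person 3 is given an end time equal to the start time, which is one way an observation can be flagged by x yet not counted as a failure by -stset-. The sketch uses -list- rather than -browse- so that it also runs non-interactively.

```stata
* Toy data (hypothetical): x marks the observations counted by "method x"
clear
input person_id start_day end_day x
1  0 100 1
2  0 200 0
3 50  50 1
4  0 150 0
end

gen byte outcome = x
stset end_day, failure(outcome == 1) origin(time start_day) id(person_id)

* Cross-tabulate what -stset- counts as a failure against x
tabulate _d x, missing

* Show the discrepant observations, with _st to see whether -stset- kept them
list person_id start_day end_day x _st _d if _d != x
```

Looking at _st alongside _d helps separate observations that -stset- excluded altogether from those it kept but treated as censored.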
Jonas Kristensen replied

05 Feb 2019, 10:59
Yes! I really think that this is it. Thank you so much. When I use the outcome you advised, "gen outcome = !missing(first_post_stroke_fx_date)",
I get 16,268 failures, which agrees very well with the 16,273 1's from first_poststroke_fx-date and the 16,273 diag-codes for fractures.

If this is it, I may have my final results, which makes me very happy. Is it possible that you could explain "gen outcome = !missing(first_post_stroke_fx_date)" to me? Does it mean that first_post_stroke_fx_date is the outcome/failure variable, excluding all that are missing from first_post_stroke_fx_date?

And is there any smart way to be sure of this or to double-check it? I get 16,268 failures, where five fractures are missing (16,273 - 16,268). And when estimating incidence rates separately by using:

Code:

// INCIDENCE-RATE OF CATEGORICAL FRACTURE-CODE
stset endpoint, failure(diagnosiscode = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ///
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45) scale(365250) origin(start_follow_date) id(person_id)
stptime

and all of the other categorical fracture codes, I get 16,256 failures in total, where seventeen fractures are missing (16,273 - 16,256).
Could this small number of missing fractures be explained somehow by death, death dates, or the endpoint coming before, because none of these patients are missing the outcome variable or first_post_stroke_fx_date?
Attached Files
Last edited by Jonas Kristensen; 05 Feb 2019, 11:41.
Clyde Schechter replied

05 Feb 2019, 10:44
Well, yes, many of the fractures are lost when using -by person_id: keep if _n == 1-, but you are only losing fractures that occurred before the first stroke or were second or later post-stroke fractures. Reviewing the code (copied from #9):

Code:

sort person_id stroke_date
by person_id (stroke_date): gen start_follow_date = stroke_date[1]
by person_id (admission_date), sort: egen first_post_stroke_fx_date = ///
min(cond(has_fracture_now & admission_date > start_follow_date, admission_date, .))
by person_id: egen diagnosiscode = min(cond(admissiondate == first_post_stroke_fx_date, diag, .))
by person_id (admission_date): gen end_follow_date = min(td(31dec2017), dødsdato)
by person_id: egen had_fracture = max(has_fracture_now)

gen dead = 1
replace dead = 0 if missing(death_date)
by person_id: egen died = max(dead)

by person_id: keep if _n == 1

The third command identifies the date of the first post-stroke fracture, and the subsequent one identifies the diagnosis code of that fracture. These variables will be the same in all observations for a given person_id, so when we keep only the first of these, none of the information we need is lost. The same is true of the variable had_fracture.

But I notice that the variable had_fracture indicates whether or not a person ever had a fracture (which means it could be a fracture before stroke, or a second or later fracture after stroke). So it seems that we erred when we later used had_fracture as the outcome in our analysis, because that counts all fractures ever, not just first fractures following a stroke. The outcome should really be -gen outcome = !missing(first_post_stroke_fx_date)-.

So the problem is not that you are "losing" fractures, the problem is that the -had_fracture- variable is counting fractures that should not be included in your analysis because they occur at the wrong time.
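To make the difference concrete, here is a small sketch on invented data. The three people and their dates are hypothetical, and the string variables stroke_str and adm_str exist only to build the toy dates; the -by- and -egen- commands follow the code quoted above. Person 1 has a fracture only before the stroke, person 2 has one after it, and person 3 has none.

```stata
* Toy data (hypothetical): one row per admission
clear
input person_id str9 stroke_str str9 adm_str has_fracture_now
1 "01jan2010" "01jun2009" 1
1 "01jan2010" "01jan2010" 0
2 "01mar2011" "01mar2011" 0
2 "01mar2011" "15aug2012" 1
3 "01may2012" "01may2012" 0
end
gen stroke_date = daily(stroke_str, "DMY")
gen admission_date = daily(adm_str, "DMY")
format stroke_date admission_date %td

by person_id (stroke_date), sort: gen start_follow_date = stroke_date[1]
by person_id (admission_date), sort: egen first_post_stroke_fx_date = ///
    min(cond(has_fracture_now & admission_date > start_follow_date, admission_date, .))
by person_id: egen had_fracture = max(has_fracture_now)
by person_id: keep if _n == 1

* had_fracture counts any fracture ever; outcome counts only post-stroke ones
gen outcome = !missing(first_post_stroke_fx_date)
list person_id had_fracture outcome
```

Person 1 ends up with had_fracture equal to 1 but outcome equal to 0, which is exactly the kind of observation that inflates the failure count when had_fracture is used as the outcome.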
Jonas Kristensen replied

05 Feb 2019, 10:18
Sorry, I have corrected the command above. It was a typo.

Code:

gen has_fracture_now = 1
replace has_fracture_now = 0 if missing(diag)
label define has_fracture_now 0 "No fracture" 1 "Fracture"
label values has_fracture_now has_fracture_now

This is the way I did it. Wouldn't that be correct?

I believe the reason why has_fracture_now is so low, compared to the 16,273 diag-codes, is the many duplicates. The diag codes are stored in a separate variable, which takes the first diag code coming after the stroke date, using:

Code:

by person_id: egen diagnosiscode = min(cond(admissiondate == first_post_stroke_fx_date, diag, .))

Whereas many of the has_fracture_now are lost when using:

Code:

by person_id: keep if _n == 1
Last edited by Jonas Kristensen; 05 Feb 2019, 10:24.
Clyde Schechter replied

05 Feb 2019, 10:15
The command -gen has_fracture_now- is a syntax error; this cannot be the code you used. It is missing the =exp part, and that is the key part to understanding that variable!

Now the fact that your has_fracture_now variable has only 4018 observations with value 1, whereas there are 16,273 observations that have a fracture code, tells me that whatever you did to create has_fracture_now is wrong. The problem is not with the way it was later aggregated up to had_fracture; the variable has_fracture_now is wrong to start with.
Jonas Kristensen replied

05 Feb 2019, 09:58
I created has_fracture_now based on whether or not the patient has a diagnosis code to begin with, using the following commands:

Code:

gen has_fracture_now = 1
replace has_fracture_now = 0 if missing(diag)
label define has_fracture_now 0 "No fracture" 1 "Fracture"
label values has_fracture_now has_fracture_now

This is just before all the commands in #9

With the 116,519 patients remaining, "has_fracture_now" has 4018 1's, and the initial "diag", from which it was created, also has 4018 1's.

The diagnosiscode variable, used to estimate specific fracture types, has 16,273 1's (fracture codes), which matches "first_poststroke_fx_date" exactly, which also has 16,273 1's (dates).

That's why I think maybe something goes wrong when using

Code:

by person_id: egen had_fracture = max(has_fracture_now)
Last edited by Jonas Kristensen; 05 Feb 2019, 10:17.
Clyde Schechter replied

05 Feb 2019, 09:54
Well, the code does not show where the variable has_fracture_now comes from. If I remember from the earlier thread, which was removed from Statalist, that variable was not created by the code but was already existing (or assumed to exist) in the data set. So the question is whether or not has_fracture_now agrees with having a diagcode within that list of codes. Perhaps it does not. Perhaps try -tab diagcode has_fracture_now- to see if every fracture-related diagnosis code has all of its observations in the 1 column of has_fracture_now in that output. Maybe that has_fracture_now variable is missing some of the codes.
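A sketch of that cross-check on invented data. The four rows are hypothetical, and the code variable is called diag here on the assumption that it is the same variable referred to as diagcode above.

```stata
* Toy data (hypothetical): person 3 has a fracture code but is flagged 0
clear
input person_id diag has_fracture_now
1  5 1
2  . 0
3 12 0
4 12 1
end

* Every fracture code should have all of its observations in the 1 column
tabulate diag has_fracture_now, missing

* List the observations where a code is present but the flag is not 1
list person_id diag has_fracture_now if !missing(diag) & has_fracture_now != 1
```

If the -list- command returns nothing in the real data, the flag agrees with the codes and the discrepancy must arise elsewhere.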
Jonas Kristensen replied

05 Feb 2019, 09:44
Yes, I agree. Thank you for answering!
The weird thing is that after I have used the entire list of commands that you showed me, set up the time points, and removed duplicates, I have 116,519 patients.
If I then tabulate "had fracture", there are 34,502 fractures (1's), and I get 34,290 failures when I run "stset endpoint, failure(outcome=1) scale(365.25) origin(start_follow_date)".
And then, if I tabulate "first_poststroke_fx_date", I get 16,273 who have a date (who had a fracture), and if I tabulate "diagnosiskode", I also get 16,273 who have a diagcode, which is close to the number of failures when estimating the incidence rate for diagnosis-specific fractures, which was 16,256 failures (see #9).

Doesn't this all mean that the problem actually lies with the variable "had_fracture", and not with the diagnosis code? And if it does, could it be solved by generating the outcome variable on the basis of whether or not the patient has a first_poststroke_fx_date instead of using "had_fracture"?

The thing I have a hard time understanding is how there can be 34,290 failures when I run the above stset command, because many more patients than 34,502 - 34,290 = 212 must have died before a fracture.

Last edited by Jonas Kristensen; 05 Feb 2019, 09:53.
Clyde Schechter replied

05 Feb 2019, 09:22
I don't see why the code you are using would fail to pick up some fracture codes. Are you sure that all of those fracture codes actually occur in the data as the first fracture after a stroke?
Jonas Kristensen replied

05 Feb 2019, 05:52
Okay
Would there be another or better way, command-wise, to estimate the incidence rate for a categorical fracture diagnosis code than using the commands below?

Code:

by person_id: egen diagnosiscode = min(cond(admissiondate == first_post_stroke_fx_date, diag, .))
stset endpoint, failure(diagnosiscode = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ///
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45) scale(365250) origin(start_follow_date) id(person_id)
stptime

The above is, for example, for diagnosis code DS02, which is the categorical/upper-hierarchical diagnosis code for Cranial and Face Fractures. It includes DS020-DS29A, which have the codes 1-45 that I use as failure above.
I think the approach above is correct, but there are simply some codes that it does not include in the calculations, which therefore results in too few failures/fractures. A possible solution could be creating a variable, as you showed me with:

Code:

by person_id: egen had_fracture = max(has_fracture_now)

where every duplicate person_id gets the first diagnosis code that was registered. I think it is when removing duplicates that many of the diagnosis codes are lost, and therefore there are not as many failures as for "had_fracture".
Last edited by Jonas Kristensen; 05 Feb 2019, 06:39.
Clyde Schechter replied

04 Feb 2019, 12:45
I do not understand what the Table 4 you are showing is and how the numbers in it were calculated, so I can't begin to think about why your results are different from that.
Jonas Kristensen replied

04 Feb 2019, 10:50
As you can see in the photo below, it apparently does not include all of the diagnosis codes when generating the variable "diagnosiscode" (3, for example, is missing). Is there another way to estimate these type-specific incidence rates to get around this issue?
Attached Files

Jonas Kristensen replied

04 Feb 2019, 09:01

Thank you so much! I have now settled on following patients until 2017.

In terms of estimating the incidence-rate of specific fractures (diagnosis codes), I have used the code below:

Code:

sort person_id stroke_date
by person_id (stroke_date): gen start_follow_date = stroke_date[1]
by person_id (admission_date), sort: egen first_post_stroke_fx_date = ///
min(cond(has_fracture_now & admission_date > start_follow_date, admission_date, .))
by person_id: egen diagnosiscode = min(cond(admissiondate == first_post_stroke_fx_date, diag, .))
by person_id (admission_date): gen end_follow_date = min(td(31dec2017), dødsdato)
by person_id: egen had_fracture = max(has_fracture_now)

gen dead = 1
replace dead = 0 if missing(death_date)
by person_id: egen died = max(dead)

by person_id: keep if _n == 1

// FOR FRACTURE OUTCOME ONLY
gen outcome = had_fracture
gen endpoint = min(first_post_stroke_fx_date, end_follow_date)

// INCIDENCE-RATE OF FRACTURES
stset endpoint, failure(outcome = 1) scale(365250) origin(start_follow_date) id(person_id)
stptime

// INCIDENCE-RATE OF DIAGNOSIS SPECIFIC FRACTURE
stset endpoint, failure(diagnosiscode = 5) scale(365250) origin(start_follow_date) id(person_id)
stptime

// INCIDENCE-RATE OF CATEGORICAL FRACTURE-CODE
stset endpoint, failure(diagnosiscode = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ///
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45) scale(365250) origin(start_follow_date) id(person_id)
stptime

When adding all of the diagnosis-specific fractures together, it amounts to 16,256 failures (fractures), but when estimating the incidence rate of fractures in general, it amounts to 34,290 failures (fractures), which indicates that something probably goes wrong when I am generating the diagnosiscode variable. I just can't seem to figure out what it is. I have attached a small excerpt of the results to give you an idea of what I am seeing.

Attached Files

Last edited by Jonas Kristensen; 04 Feb 2019, 09:06.
