Advice regarding estimating competing risk

Jonas Kristensen

Join Date: Jan 2019

Posts: 41
#16

05 Feb 2019, 09:58

I created has_fracture_now based on whether or not the patient has a diagnosis code to begin with, using the following commands:

Code:

gen has_fracture_now = 1
replace has_fracture_now = 0 if missing(diag)
label define has_fracture_now 0 "No fracture" 1 "Fracture"
label values has_fracture_now has_fracture_now

This comes just before all the commands in #9.

With the 116.519 patients remaining, "has_fracture_now" has 4018 1's, and the original "diag" variable, from which it was created, also has 4018 1's.
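Because `!missing(diag)` already evaluates to 1 or 0, the `gen`/`replace` pair can be written as a single line, and the count can then be checked directly against `diag`. This is a minimal sketch on hypothetical toy data (the values of `person_id` and `diag` are invented); note that both counts are counts of observations, not of patients.

```stata
* Hypothetical toy data: diag is missing when no fracture code was recorded
clear
input long person_id diag
1 .
1 12
2 .
3 7
3 .
end

* Equivalent to: gen = 1, then replace = 0 if missing(diag)
gen byte has_fracture_now = !missing(diag)
label define has_fracture_now 0 "No fracture" 1 "Fracture"
label values has_fracture_now has_fracture_now

* Both counts are per observation and should agree
count if has_fracture_now == 1
local n_flag = r(N)
count if !missing(diag)
display "flagged: `n_flag'   non-missing diag: " r(N)
```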

The diagnosiscode variable, used to estimate specific fracture types, has 16.273 1's (fracture codes), which matches "first_poststroke_fx_date" exactly, as it also has 16.273 1's (dates).

That's why I think something may go wrong when using:

Code:

by person_id: egen had_fracture = max(has_fracture_now)

Last edited by Jonas Kristensen; 05 Feb 2019, 10:17.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#17

05 Feb 2019, 10:15

The command -gen has_fracture_now- is a syntax error; this cannot be the code you used. It is missing the =exp part, and that is the key part to understanding that variable!

Now the fact that your has_fracture_now variable has only 4018 observations with value 1, whereas there are 16,273 observations that have a fracture code, tells me that whatever you did to create has_fracture_now is wrong. The problem is not with the way it was later aggregated up to had_fracture; the variable has_fracture_now is wrong to start with.
Jonas Kristensen

Join Date: Jan 2019

Posts: 41
#18

05 Feb 2019, 10:18

Sorry, I have corrected the command above. It was a typo.

Code:

gen has_fracture_now = 1
replace has_fracture_now = 0 if missing(diag)
label define has_fracture_now 0 "No fracture" 1 "Fracture"
label values has_fracture_now has_fracture_now

This is the way I did it. Wouldn't that be correct?

I believe has_fracture_now is so low compared to the 16,273 diagnosis codes because of the many duplicates. The diagnosis codes are stored in a separate variable, which takes, for each patient, the first diagnosis code coming after the stroke date, using:

Code:

by person_id: egen diagnosiscode = min(cond(admission_date == first_post_stroke_fx_date, diag, .))

Whereas many of the has_fracture_now observations are lost when using:

Code:

by person_id: keep if _n == 1

Last edited by Jonas Kristensen; 05 Feb 2019, 10:24.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#19

05 Feb 2019, 10:44

Well, yes, many of the fractures are lost when using -by person_id: keep if _n == 1-, but you are only losing fractures that occurred before the first stroke or were second or later post-stroke fractures. Reviewing the code (copied from #9):

Code:

sort person_id stroke_date
by person_id (stroke_date): gen start_follow_date = stroke_date[1]
by person_id (admission_date), sort: egen first_post_stroke_fx_date = ///
    min(cond(has_fracture_now & admission_date > start_follow_date, admission_date, .))
by person_id: egen diagnosiscode = min(cond(admission_date == first_post_stroke_fx_date, diag, .))
by person_id (admission_date): gen end_follow_date = min(td(31dec2017), dødsdato)
by person_id: egen had_fracture = max(has_fracture_now)
gen dead = 1
replace dead = 0 if missing(death_date)
by person_id: egen died = max(dead)
by person_id: keep if _n == 1

The third command identifies the date of the first post-stroke fracture, and the subsequent one identifies the diagnosis code of that fracture. These variables will be the same in all observations for a given person_id, so when we keep only the first of these, none of the information we need is lost. The same is true of the variable had_fracture.
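The claim that keeping one observation per person loses nothing can be checked directly: before `keep`, assert that the person-level variables are constant within `person_id`. A minimal sketch on hypothetical toy data (dates and codes invented; person 1 has one pre-stroke and two post-stroke fractures), reusing the commands from #9:

```stata
* Hypothetical toy data: one row per hospital contact.
* stroke_date is filled only on stroke contacts, diag only on fracture contacts.
clear
input long person_id str9 sdate str9 adate diag
1 "01jan2010" "01jan2010" .
1 ""          "15dec2009" 3
1 ""          "10mar2011" 8
1 ""          "20may2012" 5
2 "05feb2011" "05feb2011" .
end
gen stroke_date    = date(sdate, "DMY")
gen admission_date = date(adate, "DMY")
format stroke_date admission_date %td
gen byte has_fracture_now = !missing(diag)

* Commands from #9
sort person_id stroke_date
by person_id (stroke_date): gen start_follow_date = stroke_date[1]
by person_id (admission_date), sort: egen first_post_stroke_fx_date = ///
    min(cond(has_fracture_now & admission_date > start_follow_date, admission_date, .))
by person_id: egen diagnosiscode = ///
    min(cond(admission_date == first_post_stroke_fx_date, diag, .))
format start_follow_date first_post_stroke_fx_date %td

* Person-level variables are constant within person_id ...
by person_id: assert first_post_stroke_fx_date == first_post_stroke_fx_date[1]
by person_id: assert diagnosiscode == diagnosiscode[1]

* ... so keeping one row per person loses none of them
by person_id: keep if _n == 1
list person_id start_follow_date first_post_stroke_fx_date diagnosiscode
```

The pre-stroke fracture (15dec2009) and the second post-stroke fracture (20may2012) are the ones "lost", which is intended.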

But I notice that the variable had_fracture indicates whether or not a person ever had a fracture (which means it could be a fracture before stroke, or a second or later fracture after stroke). So it seems that we erred when we later used had_fracture as the outcome in our analysis, because that counts all fractures ever, not just first fractures following a stroke. The outcome should really be -gen outcome = !missing(first_post_stroke_fx_date)-.

So the problem is not that you are "losing" fractures, the problem is that the -had_fracture- variable is counting fractures that should not be included in your analysis because they occur at the wrong time.
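The difference between the two candidate outcomes shows up with a patient whose only fracture predates the stroke. A minimal sketch on hypothetical toy data (two invented patients):

```stata
* Hypothetical toy data: person 1 fractures only BEFORE the stroke,
* person 2 fractures after it
clear
input long person_id str9 sdate str9 adate diag
1 "01jan2010" "01jan2010" .
1 ""          "15dec2009" 3
2 "05feb2011" "05feb2011" .
2 ""          "01jun2012" 8
end
gen stroke_date    = date(sdate, "DMY")
gen admission_date = date(adate, "DMY")
gen byte has_fracture_now = !missing(diag)

sort person_id stroke_date
by person_id (stroke_date): gen start_follow_date = stroke_date[1]
by person_id (admission_date), sort: egen first_post_stroke_fx_date = ///
    min(cond(has_fracture_now & admission_date > start_follow_date, admission_date, .))

* "Ever had a fracture" (wrong outcome) vs "first fracture after stroke" (intended)
by person_id: egen had_fracture = max(has_fracture_now)
gen byte outcome = !missing(first_post_stroke_fx_date)

by person_id: keep if _n == 1
list person_id had_fracture outcome
```

Person 1 has had_fracture = 1 but outcome = 0, which is exactly the kind of fracture that should not count.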
Jonas Kristensen

Join Date: Jan 2019

Posts: 41
#20

05 Feb 2019, 10:59

Yes! I really think that this is it. Thank you so much. When I use your suggested outcome, "gen outcome = !missing(first_post_stroke_fx_date)",
I get 16.268 failures, which agrees very well with the 16.273 1's from first_poststroke_fx_date and the 16,273 diagnosis codes for fractures.

If this is it, I may have my final results, which makes me very happy. Could you explain "gen outcome = !missing(first_post_stroke_fx_date)" to me? Does it mean that first_post_stroke_fx_date is the outcome/failure variable, excluding all observations where first_post_stroke_fx_date is missing?

And is there a good way to be sure of this or double-check it? I get 16.268 failures, so five fractures are missing (16.273-16.268). And when estimating incidence rates separately by using:

Code:

// INCIDENCE-RATE OF CATEGORICAL FRACTURE-CODE
stset endpoint, failure(diagnosiscode = 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ///
    24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45) ///
    scale(365250) origin(start_follow_date) id(person_id)
stptime

and all of the other categorical fracture codes, I get 16.256 failures in total, so seventeen fractures are missing (16.273-16.256).
Could this small number of missing fractures be explained somehow by death, death dates, or the endpoint coming before the fracture? None of these patients are missing the outcome variable or first_post_stroke_fx_date.

Last edited by Jonas Kristensen; 05 Feb 2019, 11:41.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#21

05 Feb 2019, 12:43

Is it possible that you could explain "gen outcome = !missing(first_post_stroke_fx_date)" to me? Does it mean that first_post_stroke_fx_date is the outcome/failure variable, excluding all that are missing from first_post_stroke_fx_date?

No, the date is not the outcome variable. Rather the outcome variable is whether or not there is such a date. It is just a simple way of identifying those fractures that occur after a stroke and before any other fracture--the way first_post_stroke_fx_date was calculated, it is set to missing whenever no such fracture is found. It would also be missing in cases where there is such a fracture but for some reason the date variable for that record is missing--I don't know if there are any such cases in your registry data. If there are, they might also account for some of the small discrepancies you note.
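To see whether fracture records with a missing admission date exist in the data, they can simply be counted. In the `cond()` expression from #9, such a record only ever contributes a missing value to `min()`, so it can never become first_post_stroke_fx_date. A minimal sketch on hypothetical toy data (person 3 is an invented fracture record without a date):

```stata
* Hypothetical toy data: person 3 has a fracture code but no admission date
clear
input long person_id str9 adate diag
1 "10mar2011" 8
2 "20may2012" .
3 ""          5
end
gen admission_date = date(adate, "DMY")
format admission_date %td
gen byte has_fracture_now = !missing(diag)

* Fracture records that could never set first_post_stroke_fx_date
count if has_fracture_now & missing(admission_date)
list person_id diag if has_fracture_now & missing(admission_date)
```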

Could this small amount of missing fractures be explained somehow by death, death dates, endpoint coming before, because none of these patients are missing the outcome-variable or first_post_stroke_fx_date

Perhaps. I really can't say without access to the data.

I don't understand where the 16,273 number is coming from. Your description of it is not clear to me. But let's say that there is some variable x which identifies these 16,273 observations. That is, let's say x is 1 in exactly these 16,273 observations and 0 in all other observations. If you run:

Code:

browse if _d != x

you will see all of the observations that -stset- counts as a failure, but not your method x, as well as any other observations that perhaps your method x counts as a failure but -stset- does not. You can then try to figure out what it is about those observations that accounts for the discrepancy.
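The same comparison can be counted and listed, separating disagreements among records -stset- uses from records it excluded altogether (`_st == 0`, for example zero-length follow-up when death or the endpoint falls on the start date). A minimal sketch on hypothetical person-level toy data, where `x` stands for your own failure flag and dates are stored as plain Stata day numbers:

```stata
* Hypothetical toy data; x is the failure flag from your own method
clear
input long person_id start_follow_date endpoint outcome x
1 18000 18500 1 1
2 18000 18900 0 0
3 18000 18000 1 1
4 18000 18300 1 0
end

stset endpoint, failure(outcome) origin(time start_follow_date) id(person_id)

* Disagreements among records that stset actually uses
count if _d != x & _st == 1
list person_id endpoint outcome _d x if _d != x & _st == 1

* Records stset excluded (here person 3: follow-up of length zero)
list person_id start_follow_date endpoint outcome x if _st == 0
```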
Jonas Kristensen

Join Date: Jan 2019

Posts: 41
#22

05 Feb 2019, 14:38

Thank you again for your support, Mr. Schechter! I really appreciate your guidance.

Would it make sense to determine whether somebody died before their stroke (because of an error in the register) with the following commands:

Code:

gen test = end_follow_date - start_follow_date if _st == 0
sum test
sum test if test == 0

I get the following results from these tests

Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#23

05 Feb 2019, 15:41

Yes.
Jonas Kristensen

Join Date: Jan 2019

Posts: 41
#24

06 Feb 2019, 09:20

Would you also agree that the registration of the 85 patients who have a negative value in the variable "test" must have been an error, because they cannot die before having their stroke? And that the remaining 841 patients with the value 0 presumably died due to their stroke, as their death date is the same as their stroke date?

And just to be sure, in this list of commands:

Code:

sort person_id stroke_date
by person_id (stroke_date): gen start_follow_date = stroke_date[1]
by person_id (admission_date), sort: egen first_post_stroke_fx_date = ///
    min(cond(has_fracture_now & admission_date > start_follow_date, admission_date, .))
by person_id: egen diagnosiscode = min(cond(admission_date == first_post_stroke_fx_date, diag, .))
by person_id (admission_date): gen end_follow_date = min(td(31dec2017), dødsdato)
by person_id: egen had_fracture = max(has_fracture_now)
gen dead = 1
replace dead = 0 if missing(death_date)
by person_id: egen died = max(dead)
by person_id: keep if _n == 1

it is ensured that only the very first stroke date is included, as the first-ever stroke event, and this is done by the command:

Code:

by person_id (stroke_date): gen start_follow_date = stroke_date[1]

Last edited by Jonas Kristensen; 06 Feb 2019, 09:44.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#25

06 Feb 2019, 11:53

Would you also agree that the registration of the 85 patients who have a negative value in the variable "test", would then have to have been an error, because they cannot die before having their stroke. And the remaining 841 patients with the value 0 must assumably have died due to their stroke, as their death date is the same as their stroke date?

Yes, a negative value in test means that the reported death date precedes the reported stroke date--so at least one of those variables must be an error.
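Rather than only summarizing `test`, the inconsistent records can be flagged so they can be listed, checked against the register, or excluded in a sensitivity analysis. A minimal sketch on hypothetical toy data; unlike the version in #22 it checks every record regardless of `_st`, and it assumes the death date is stored in `death_date` (the thread also uses `dødsdato`):

```stata
* Hypothetical toy data: first stroke date and recorded death date
clear
input long person_id str9 sdate str9 ddate
1 "01jan2010" "15jun2011"
2 "01jan2010" "01jan2010"
3 "01jan2010" "20dec2009"
4 "01jan2010" ""
end
gen start_follow_date = date(sdate, "DMY")
gen death_date        = date(ddate, "DMY")
* min() ignores missing values, so persons still alive end on 31dec2017
gen end_follow_date   = min(td(31dec2017), death_date)
format start_follow_date death_date end_follow_date %td

* Days from first stroke to end of follow-up
gen test = end_follow_date - start_follow_date

* Negative: death recorded before stroke, so at least one date is wrong
gen byte date_error = test < 0
count if date_error
* Zero: death on the stroke date, i.e. no follow-up time at all
count if test == 0
list person_id start_follow_date death_date test if test <= 0
```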

It is ensured that it is only the very first stroke date that is included, as the first ever event of stroke. And it is done by the command:

Correct.
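One way to double-check this is to compute the earliest stroke date a second, independent way with `egen min()` and assert that both agree. Missing stroke dates sort last in Stata, so `stroke_date[1]` is the earliest non-missing date. A minimal sketch on hypothetical toy data:

```stata
* Hypothetical toy data: several stroke records per person, one missing date
clear
input long person_id str9 sdate
1 "12mar2012"
1 "03jan2010"
1 ""
2 "20jul2015"
end
gen stroke_date = date(sdate, "DMY")
format stroke_date %td

* Sorting puts each person's earliest stroke first (missing dates sort last)
by person_id (stroke_date), sort: gen start_follow_date = stroke_date[1]
format start_follow_date %td

* Independent cross-check of the earliest stroke date
by person_id: egen first_stroke_check = min(stroke_date)
assert start_follow_date == first_stroke_check
```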
Jonas Kristensen

Join Date: Jan 2019

Posts: 41
#26

06 Feb 2019, 14:20

Thank you for the help!

I am investigating the association between stroke severity and fall-related fractures, and between civil status and fall-related fractures. When the patients were registered in the stroke registry, the demographic variables were often categorical; for example, stroke severity is very severe, severe, moderate, mild, or "unknown". For civil status, it is living together, living alone, other/retirement home, or "unknown".
My question is what the epidemiologically sound approach is for the patients who have been registered as "unknown". For stroke severity, there are 9.979 patients in "unknown", and for civil status, there are 3.459 patients in "unknown".

Excluding these patients would have too great an impact on the study population, as all of the covariates adjusted for also contain the option "unknown" in the registry. But explaining that 9.979 patients (8,6%) of a total of 116.519 patients are simply unaccounted for in terms of stroke severity seems quite dramatic.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#27

06 Feb 2019, 15:42

This is a difficult problem to which there is no simple solution. Nor is it feasible to summarize all the situations and approaches in a short post. At https://pdfs.semanticscholar.org/58d...c218e126e4.pdf , Paul Allison gives an overview of some of the commonly used approaches and their pros and cons. Reading that would be a good starting point for how to think about this and what some of your options might be.
Jonas Kristensen

Join Date: Jan 2019

Posts: 41
#28

06 Feb 2019, 15:58

Thank you again Mr. Schechter!
Jonas Kristensen

Join Date: Jan 2019

Posts: 41
#29

11 Feb 2019, 14:35

Hello again
I was hoping you could help me with estimating patients' mean time at risk until fracture for specific groups.

I know that I can use:

Code:

stdescribe

to get the mean time at risk until fracture with 95%CI, but is there a smart way to estimate the mean time at risk for example only for women or only for a specific fracture-diagnosis?
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#30

11 Feb 2019, 15:25

Yes. Change the -stset- specifications and then re-run -stdescribe-. -stset- includes an -if(exp)- option [not to be confused with adding -if whatever- to an analytic command itself] that will allow you to look at subpopulations. And changing the -failure()- option will allow you to look at specific diagnoses.
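A minimal sketch of both variations on hypothetical toy data: `female` is an invented indicator, and diagnosis code 12 stands in for whichever fracture code is of interest. Observations failing `if()` get `_st == 0` and are ignored by all st commands until the data are -stset- again.

```stata
* Hypothetical toy data, one row per person (dates as Stata day numbers)
clear
input long person_id female start_follow_date endpoint outcome diagnosiscode
1 1 18000 18500 1 12
2 1 18000 18900 0 .
3 0 18000 18400 1 30
4 0 18000 18700 1 12
end

* Women only: restrict the analysis sample through stset's if() option
stset endpoint, failure(outcome) origin(time start_follow_date) id(person_id) ///
    if(female == 1)
stdescribe

* Everyone, but only fractures with diagnosis code 12 count as failures
stset endpoint, failure(diagnosiscode == 12) origin(time start_follow_date) ///
    id(person_id)
stdescribe
```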