Advice regarding estimating competing risk

Clyde Schechter replied

19 Feb 2019, 15:26
I think your reasoning makes sense. Following a second stroke, the severity of the first stroke is probably no longer very predictive of the risk of a fracture; the severity of the later stroke becomes more salient. So censoring at the second stroke makes sense to me.

For civil status, as long as it remains unchanged, it seems reasonable to leave patients uncensored at the second stroke. You would, however, want to censor them at the time of any change in civil status.
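As a rough illustration of these censoring rules, the exit date for each analysis can be taken as the earliest of whichever censoring events apply, skipping events that never happened (much as Stata's min() ignores missing values). This is a Python sketch, not Stata; the function and argument names are hypothetical.

```python
from datetime import date
from typing import Optional

STUDY_END = date(2017, 12, 31)  # administrative end of follow-up used in the thread


def exit_date(death: Optional[date],
              second_stroke: Optional[date],
              civil_status_change: Optional[date],
              censor_at_second_stroke: bool,
              censor_at_civil_change: bool) -> date:
    """Earliest applicable censoring date; None means the event did not occur."""
    candidates = [STUDY_END, death]
    if censor_at_second_stroke:
        candidates.append(second_stroke)
    if censor_at_civil_change:
        candidates.append(civil_status_change)
    # Like Stata's min(), ignore events that did not happen (None here).
    return min(d for d in candidates if d is not None)


# Stroke-severity analysis: censor at a second stroke.
print(exit_date(None, date(2012, 5, 1), date(2014, 1, 1), True, False))  # 2012-05-01
# Civil-status analysis: keep follow-up past a second stroke, censor at a change in civil status.
print(exit_date(None, date(2012, 5, 1), date(2014, 1, 1), False, True))  # 2014-01-01
```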

Another approach to this is to use multiple records per patient and use time-varying covariates in your Cox model. This, too, however, entails the assumption that the effect of stroke severity on fracture risk after a second stroke is the same as the effect of the same level of stroke severity after the first stroke. I don't have enough intuition about this to say if that assumption is credible or not. I can think of arguments why the same level of stroke severity might have a different effect after the second stroke, but I can also think of arguments why it might be the same. If you decide to explore this approach, you might want to discuss that with some experts in physical medicine & rehabilitation or geriatrics.
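A minimal sketch of the multiple-records idea, assuming one first stroke and at most one second stroke per patient: follow-up is split at the second stroke, and the severity covariate takes the value of the most recent stroke in each interval. In Stata this layout is typically handled by -stset- with the id() option on data holding several records per patient; the Python below only illustrates the record structure, and its names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Episode:
    start: date      # interval start (plays the role of _t0)
    end: date        # interval end (plays the role of _t)
    severity: str    # severity of the most recent stroke in this interval
    fracture: bool   # whether this interval ends in a fracture


def split_at_second_stroke(first_stroke: date, severity1: str,
                           second_stroke: Optional[date], severity2: Optional[str],
                           end_date: date, fracture: bool) -> List[Episode]:
    """Split one patient's follow-up so that stroke severity can vary over time."""
    if second_stroke is None or second_stroke >= end_date:
        return [Episode(first_stroke, end_date, severity1, fracture)]
    return [
        # First interval ends at the second stroke without a failure.
        Episode(first_stroke, second_stroke, severity1, False),
        # Second interval carries the severity of the second stroke.
        Episode(second_stroke, end_date, severity2, fracture),
    ]


eps = split_at_second_stroke(date(2010, 1, 1), "mild", date(2012, 6, 1), "severe",
                             date(2015, 1, 1), True)
for e in eps:
    print(e.start, e.end, e.severity, e.fracture)
```

Note that this layout is exactly where the assumption Clyde mentions enters: a Cox model fitted to these episodes estimates a single effect for each severity level, whether it follows the first or the second stroke.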
Jonas Kristensen replied

19 Feb 2019, 15:06
Thank you, Mr. Schechter.
I have had some discussion regarding when, and whether, it makes sense to censor patients when they have a second stroke.
I am estimating the incidence rate of fractures after stroke, and then I am using Cox regression to analyse the risk factors stroke severity and civil status.

The initial reasoning behind it is that a patient's risk of fracture would change drastically with a second stroke, which would have an unwanted effect on the estimates. But some of the observation time would also be lost if these patients are censored (approx. 40 000 person-years out of 440 000 person-years).
Another option is to censor patients at the second stroke only in the Cox regression analysis for stroke severity, as a patient's stroke severity most likely isn't the same after the second stroke. The analysis of civil status and the estimation of the incidence rate would then not censor at the second stroke.

I am personally leaning towards censoring after the second stroke, period, and sacrificing the 40 000 person-years of risk time, but then being able to present an article that definitively analyses patients from first-ever stroke to fracture.
Clyde Schechter replied

13 Feb 2019, 13:13
I do agree. You should be seeing fewer failures and less person time at risk when you terminate observation at the second stroke. However, the impact on the incidence rate cannot be predicted: logically, it could increase, decrease, or stay the same.
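A toy calculation can show why the direction cannot be predicted. Censoring at the second stroke removes some failures and some person-time; the rate afterwards rises if the removed time had a lower rate than the whole, falls if it had a higher rate, and is unchanged if the rates match. The numbers below are invented purely for illustration.

```python
def incidence_rate(failures: int, person_years: float, per: float = 1000.0) -> float:
    """Failures per `per` person-years."""
    return failures / person_years * per


def rate_after_censoring(failures: int, person_years: float,
                         lost_failures: int, lost_person_years: float) -> float:
    """Rate once the failures and person-time after a second stroke are dropped."""
    return incidence_rate(failures - lost_failures, person_years - lost_person_years)


print(incidence_rate(400, 10_000))                   # overall rate before censoring
print(rate_after_censoring(400, 10_000, 20, 1_000))  # removed time had a lower rate: rate rises
print(rate_after_censoring(400, 10_000, 60, 1_000))  # removed time had a higher rate: rate falls
print(rate_after_censoring(400, 10_000, 40, 1_000))  # removed time had the same rate: unchanged
```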
Jonas Kristensen replied

13 Feb 2019, 11:51
I seem to have figured out what the issue was. When using the code as written in post #33, the "second_stroke_date" variable was generated as the first stroke date for each patient, so every patient got a date in "second_stroke_date". But as I have written before, I am merging several data sets, so I tried generating the "second_stroke_date" variable before merging. This left "second_stroke_date" missing for everyone who had not had a second stroke and holding the second stroke date for those who had, which seems correct.

I now get only 900 observations that "end on or before enter" in the -stset- output, where I got 115145 before.

The results of calculating the incidence rate of fractures have given rise to some new questions, because the risk time has gone down, but the number of failures has gone up, and so has the IR.

Old results (follow-up not ended at a second stroke): person-time = 441 635 person-years, failures = 16268, incidence rate = 36.84 per 1000 person-years
New results (follow-up censored at a second stroke): person-time = 409 892 person-years, failures = 16555, incidence rate = 40.39 per 1000 person-years

It seems logical to me that the risk time would go down, because some of the patients who had time until fracture now have time until a second stroke that came before their first fracture. But the fact that the failures and the IR increase does not seem logical to me. Do you agree?
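A quick check, using only the numbers reported in the post, confirms that these incidence rates are failures per 1000 person-years. It also puts numbers on the puzzle: as Clyde notes, censoring at the second stroke should give fewer failures, not more, so the extra failures are the thing to trace.

```python
def ir_per_1000(failures: int, person_years: float) -> float:
    """Incidence rate per 1000 person-years."""
    return 1000 * failures / person_years


old = ir_per_1000(16268, 441635)   # follow-up not ended at a second stroke
new = ir_per_1000(16555, 409892)   # follow-up censored at a second stroke
print(f"{old:.2f} {new:.2f}")      # 36.84 40.39

print(441635 - 409892)             # 31743 person-years removed by censoring
print(16555 - 16268)               # 287 failures gained, where a drop was expected
```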
Clyde Schechter replied

13 Feb 2019, 09:50
Well, the code looks correct. And you have verified that the second stroke dates are correct. But I agree the results are not sensible. In particular, what has changed from what you were doing before is that you now consider people censored as of the date of their second stroke (if they have a second one). Other than that, everything is as it was before. Now, this should result in more censored observations and fewer failures than before. But the additional endpoints derive from second strokes, and by definition, they cannot occur before the first stroke (which is the point at which people enter the analysis).

Without seeing the data, I don't know how to troubleshoot this. The best I can suggest is

Code:

browse if _st != 1 | _t < _t0

which will show you the observations that are being excluded and perhaps by looking at those you will be able to see where things are going wrong.
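The same check can be expressed outside Stata: a record contributes nothing when its exit is on or before its entry. This Python sketch uses hypothetical example rows and flags the kind of records that -stset- reports as ending on or before entry.

```python
from datetime import date

records = [
    # (person_id, start_follow_date, endpoint) -- hypothetical example rows
    (1, date(2010, 3, 1), date(2012, 7, 9)),
    (2, date(2011, 5, 4), date(2011, 5, 4)),   # death on the stroke date: zero follow-up
    (3, date(2013, 2, 2), date(2012, 12, 1)),  # endpoint before entry: a data error
]


def ends_on_or_before_entry(rows):
    """Return the ids whose follow-up has no positive length."""
    return [pid for pid, start, end in rows if end <= start]


print(ends_on_or_before_entry(records))  # [2, 3]
```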

Jonas Kristensen replied

13 Feb 2019, 08:45

Thank you!
I have tried setting up the commands like so:

Code:

sort person_id stroke_date
by person_id (stroke_date): gen second_stroke_date = stroke_date[2]
by person_id (stroke_date): gen start_follow_date = stroke_date[1]
by person_id (admission_date), sort: egen first_post_stroke_fx_date = ///
min(cond(has_fracture_now & admission_date > start_follow_date, admission_date, .))
by person_id: egen diagnosiscode = min(cond(admission_date == first_post_stroke_fx_date, diag, .))
by person_id (admission_date): gen end_follow_date = min(td(31dec2017), death_date, second_stroke_date)

gen dead = 1
replace dead = 0 if missing(death_date)
by person_id: egen died = max(dead)
by person_id: keep if _n == 1

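* count a fracture as a failure only if it occurs by the end of follow-up
gen outcome = !missing(first_post_stroke_fx_date) & first_post_stroke_fx_date <= end_follow_date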
gen endpoint = min(first_post_stroke_fx_date, end_follow_date)
stset endpoint, failure(outcome = 1) scale(365.25) origin(start_follow_date)

But it seems like something goes wrong, as 115.145 observations end on or before enter (see attachment).
I have checked that the second stroke date is generated correctly, which is the case.
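Here is a Python sketch of what the code above is meant to compute for one patient. The key assumption is that, after merging, the same stroke date can appear on several rows (one per admission), so the second stroke is taken as the earliest distinct stroke date after the first rather than simply the second row. A fracture counts as a failure only if it falls on or before the end of follow-up; otherwise the patient is censored. Function and argument names are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple

STUDY_END = date(2017, 12, 31)


def follow_up(stroke_dates: List[date],
              fracture_dates: List[date],
              death: Optional[date]) -> Tuple[date, date, bool]:
    """Return (start, end, failed) for one patient, censoring at a second stroke."""
    distinct = sorted(set(stroke_dates))   # duplicated stroke dates from merging collapse here
    start = distinct[0]                    # first-ever stroke
    second = distinct[1] if len(distinct) > 1 else None
    end = min(d for d in (STUDY_END, death, second) if d is not None)
    later_fx = [d for d in fracture_dates if d > start]
    first_fx = min(later_fx) if later_fx else None
    # A fracture after the end of follow-up must not be counted as a failure.
    if first_fx is not None and first_fx <= end:
        return start, first_fx, True
    return start, end, False


# The first stroke date is repeated on merged rows; the fracture comes after the second stroke.
print(follow_up([date(2010, 1, 5)] * 3 + [date(2013, 4, 2)], [date(2014, 8, 1)], None))
```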


Clyde Schechter replied

11 Feb 2019, 15:25
Yes. Change the -stset- specifications and then re-run -stdescribe-. -stset- includes an -if(exp)- option [not to be confused with adding -if whatever- to an analytic command itself] that will allow you to look at subpopulations. And changing the -failure()- option will allow you to look at specific diagnoses.
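For a subgroup, the summary amounts to averaging each subject's analysis time within that group. This Python sketch uses made-up rows and a hypothetical sex field, with time in years computed as days / 365.25 to match scale(365.25). It computes only plain means, with no confidence intervals.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

rows = [
    # (sex, start_follow_date, endpoint) -- illustrative rows only
    ("female", date(2010, 1, 1), date(2012, 1, 1)),
    ("female", date(2011, 6, 1), date(2015, 6, 1)),
    ("male",   date(2010, 3, 1), date(2011, 3, 1)),
]


def mean_years_at_risk_by(rows):
    """Mean follow-up per group, in years (days / 365.25)."""
    groups = defaultdict(list)
    for group, start, end in rows:
        groups[group].append((end - start).days / 365.25)
    return {g: mean(v) for g, v in groups.items()}


print(mean_years_at_risk_by(rows))
```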
Jonas Kristensen replied

11 Feb 2019, 14:35
Hello again
I was hoping that you could help me with estimating patients' mean time at risk until fracture for specific groups.

I know that I can use:

Code:

stdescribe

to get the mean time at risk until fracture with 95% CI, but is there a smart way to estimate the mean time at risk, for example only for women or only for a specific fracture diagnosis?
Jonas Kristensen replied

06 Feb 2019, 15:58
Thank you again Mr. Schechter!
Clyde Schechter replied

06 Feb 2019, 15:42
This is a difficult problem to which there is no simple solution. Nor is it feasible to summarize all the situations and approaches in a short post. At https://pdfs.semanticscholar.org/58d...c218e126e4.pdf , Paul Allison gives an overview of some of the commonly used approaches and their pros and cons. Reading that would be a good starting point for how to think about this and what some of your options might be.
Jonas Kristensen replied

06 Feb 2019, 14:20
Thank you for the help!

I am investigating the association between stroke severity and fall-related fractures, and between civil status and fall-related fractures. In the stroke registry, the demographic variables are often categorical: for example, stroke severity is coded as very severe, severe, moderate, mild, or "unknown", and civil status as living together, living alone, other/retirement home, or "unknown".
My question is what the epidemiologically sensible thing to do is with the patients who have been registered as "unknown". For stroke severity, there are 9.979 patients in "unknown", and for civil status, there are 3.459 patients in "unknown".

Excluding these patients would have too great an impact on the study population, as all of the variables that are adjusted for also contain the option "unknown" in the registry. But explaining that 9.979 patients (8,6%) of a total of 116.519 patients are simply unaccounted for in terms of stroke severity seems quite dramatic.
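A quick check of the shares, using only the counts in the post: about 8.6% of patients have unknown stroke severity and about 3.0% have unknown civil status.

```python
TOTAL = 116_519
UNKNOWN = {"stroke severity": 9_979, "civil status": 3_459}


def pct_unknown(n: int, total: int = TOTAL) -> float:
    """Percentage of the study population registered as unknown."""
    return 100 * n / total


for var, n in UNKNOWN.items():
    print(f"{var}: {n} unknown ({pct_unknown(n):.1f}%)")
```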
Clyde Schechter replied

06 Feb 2019, 11:53
Quote:

Would you also agree that the registration of the 85 patients who have a negative value in the variable "test" would then have to have been an error, because they cannot die before having their stroke? And that the remaining 841 patients with the value 0 presumably died due to their stroke, as their death date is the same as their stroke date?

Yes, a negative value in test means that the reported death date precedes the reported stroke date, so at least one of those variables must be an error.

Quote:

It is ensured that only the very first stroke date is included, as the first-ever stroke event. This is done by the command:

Correct.
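The reading of test given above can be written as a small rule: test is the death date minus the stroke date in days, so a negative value means at least one date is wrong, zero means death on the stroke date (and therefore no time at risk), and a positive value is an ordinary death after the stroke. This is a Python sketch with a hypothetical function name.

```python
from datetime import date
from typing import Optional


def classify_death_timing(stroke: date, death: Optional[date]) -> str:
    """Interpret test = death_date - stroke_date as discussed in the thread."""
    if death is None:
        return "no recorded death"
    test = (death - stroke).days
    if test < 0:
        return "error: death recorded before stroke"
    if test == 0:
        return "died on the stroke date"
    return "died after the stroke date"


print(classify_death_timing(date(2012, 3, 10), date(2012, 3, 9)))
```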

Jonas Kristensen replied

06 Feb 2019, 09:20

Would you also agree that the registration of the 85 patients who have a negative value in the variable "test" would then have to have been an error, because they cannot die before having their stroke? And that the remaining 841 patients with the value 0 presumably died due to their stroke, as their death date is the same as their stroke date?

And just to be sure, in this list of commands:

Code:

sort person_id stroke_date
by person_id (stroke_date): gen start_follow_date = stroke_date[1]
by person_id (admission_date), sort: egen first_post_stroke_fx_date = ///
min(cond(has_fracture_now & admission_date > start_follow_date, admission_date, .))
by person_id: egen diagnosiscode = min(cond(admission_date == first_post_stroke_fx_date, diag, .))
by person_id (admission_date): gen end_follow_date = min(td(31dec2017), death_date)
by person_id: egen had_fracture = max(has_fracture_now)
gen dead = 1
replace dead = 0 if missing(death_date)
by person_id: egen died = max(dead)
by person_id: keep if _n == 1

It is ensured that only the very first stroke date is included, as the first-ever stroke event. This is done by the command:

Code:

by person_id (stroke_date): gen start_follow_date = stroke_date[1]

Last edited by Jonas Kristensen; 06 Feb 2019, 09:44.
