Dear Listers
This is perhaps more of a theoretical question than a directly Stata-related one, but there is a question about code in there as well.
The problem: In a survival study I have two cohorts (exposed and unexposed). Observation starts at time X and ends at time Z, where Z is defined as end of observation or time of death.
At some point between X and Z the exposed become exposed (at time X+Ndays). I want to calculate the HR for death between the exposed and the unexposed. The exposure here is a specific treatment.
I have, however, introduced an immortal time of Ndays in the exposed group, because I condition them on a future event: the exposed have to be alive at the time of exposure and the unexposed do not. This gives an HR above 1.00 when comparing the unexposed to the exposed.
Well, I know that the exposed are unexposed until they in fact become exposed. So I thought I might split my data on exposure, counting the exposed as unexposed until they get exposed. That, on the other hand, only moves the immortal Ndays over to the unexposed group, and I get an HR (when comparing the unexposed to the exposed) closer to 1.00 or even lower than 1.00.
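For concreteness, the split on exposure could be sketched as below. This is only an illustration on a dozen rows taken from the mock data, not necessarily how the real analysis was coded; the names nrec, episode, treated, t0, t1 and event are made up for the example.

```stata
clear
input float(id case dead start_of_obs expo_date end_of_obs)
 2 1 1 17358 17601 17843
 3 1 1 16025 16652 16670
 8 1 0 18745 18995 19358
13 1 0 18881 19010 19358
19 1 1 14682 15114 15478
24 1 1 12792 13318 13568
 1 0 0 18518     . 19358
 7 0 1 10757     . 11119
 9 0 1 11153     . 11650
14 0 1 18407     . 19088
16 0 0 18619     . 19358
22 0 1 12897     . 13429
end
format %td start_of_obs expo_date end_of_obs

* Exposed persons whose exposure date lies strictly inside follow-up get two rows
gen byte nrec = cond(case == 1 & expo_date > start_of_obs & expo_date < end_of_obs, 2, 1)
expand nrec
bysort id: gen byte episode = _n

* Episode 1 runs from start to exposure, episode 2 from exposure to end
gen t0 = start_of_obs
gen t1 = end_of_obs
replace t1 = expo_date if nrec == 2 & episode == 1
replace t0 = expo_date if nrec == 2 & episode == 2

* Time-varying exposure: 0 before the exposure date, 1 from then on
gen byte treated = (nrec == 2 & episode == 2)
* Assumption: anyone exposed on or before start of observation is treated throughout
replace treated = 1 if case == 1 & expo_date <= start_of_obs

* A death can only belong to a person's last episode
gen byte event = dead
replace event = 0 if nrec == 2 & episode == 1

* Several rows per person, clock measured in days since start of observation
stset t1, failure(event == 1) time0(t0) origin(time start_of_obs) id(id)
stcox treated
```

With so few rows the estimate itself means nothing; the point is only how the person-time before exposure ends up in the treated == 0 group.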
I have tried a conditional landmark approach: start the observation time at a set point in time, define who is and who is not exposed at that point, and compare these groups. This did not change much, and I have to leave out the roughly half of my cohort who become exposed later than 1 year after start of observation, so I lose power.
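A rough sketch of such a landmark analysis, again on a few rows of the mock data. The 365-day landmark and the names lm, lm_date and exp_lm are assumptions made for the illustration.

```stata
clear
input float(id case dead start_of_obs expo_date end_of_obs)
 2 1 1 17358 17601 17843
 3 1 1 16025 16652 16670
 8 1 0 18745 18995 19358
13 1 0 18881 19010 19358
19 1 1 14682 15114 15478
24 1 1 12792 13318 13568
 1 0 0 18518     . 19358
 7 0 1 10757     . 11119
 9 0 1 11153     . 11650
14 0 1 18407     . 19088
16 0 0 18619     . 19358
22 0 1 12897     . 13429
end
format %td start_of_obs expo_date end_of_obs

* Hypothetical landmark: 365 days after start of observation
local lm = 365
gen lm_date = start_of_obs + `lm'
format %td lm_date

* Only persons still under observation at the landmark can contribute
keep if end_of_obs > lm_date

* Persons who become exposed after the landmark are left out (the loss of power)
drop if case == 1 & expo_date > lm_date

* Exposure status is fixed as of the landmark
gen byte exp_lm = (case == 1)

* The clock starts at the landmark, not at start of observation
stset end_of_obs, failure(dead == 1) origin(time lm_date) id(id)
stcox exp_lm
```

On these rows one unexposed person is lost because follow-up ends before the landmark, and three exposed persons are lost because they become exposed after it.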
So I thought, what if I just disregard the Ndays, but in both cohorts? I would match my cohorts on whatever covariates I would normally put in my Cox model (gender, age at start, Charlson comorbidity, for example), drop all of the unexposed who are not alive at the time their matched exposed counterpart gets exposed, and then run the Cox model.
My questions are:
1) Is this a feasible way to go about it?
2) What about those exposed/unexposed who cannot be matched with someone in the other cohort?
3) Could I use the Ndays until exposure in the model?
I thought I might put it in as a continuous variable, looking at the hazard increase for each day of Ndays, and thus be able to say something about the effect of prolonging time to exposure.
But should the Ndays then be:
for the exposed = days from start of obs until the exposure date
for the unexposed = days from start of obs until end of obs, OR days from start of obs until the matched exposed counterpart gets exposed?
4) How would I go about matching the two cohorts, and how do I figure out whether the unexposed have died before the matched exposed counterpart gets exposed?
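To make question 4 concrete, one way the matching could be sketched is shown below. Everything beyond the dataset's own variables is an assumption made for illustration: exact matching on gender and cci, a 5-year age caliper, one randomly chosen control per exposed person (controls may be reused), and the names ending in _c plus ndays, fu_c, index_c, pair, pid, t_index, t_exit and died. The control inherits the Ndays of its matched exposed counterpart, which is the second option in question 3.

```stata
clear
input float(id case gender cci dead start_of_obs expo_date end_of_obs age_start_obs)
 1 0 0 0 0 18518     . 19358 76
 2 1 1 2 1 17358 17601 17843 43
 3 1 0 0 1 16025 16652 16670 70
13 1 0 1 0 18881 19010 19358 64
14 0 0 2 1 18407     . 19088 43
15 0 0 1 1 11627     . 12352 68
17 0 1 2 1 10231     . 10920 40
21 1 0 0 0 18481 19150 19358 79
24 1 0 2 1 12792 13318 13568 47
end

* Save the unexposed as a pool of potential controls, with renamed variables
preserve
keep if case == 0
gen fu_c = end_of_obs - start_of_obs
rename (id dead start_of_obs end_of_obs age_start_obs) (id_c dead_c start_c end_c age_c)
keep id_c gender cci dead_c start_c end_c age_c fu_c
tempfile controls
save `controls'
restore

* Exposed persons and their days from start of observation to exposure
keep if case == 1
gen ndays = expo_date - start_of_obs

* All exposed-control combinations with the same gender and cci;
* exposed persons without any such control drop out here
joinby gender cci using `controls'

* Hypothetical age caliper of 5 years
keep if abs(age_start_obs - age_c) <= 5

* The control must still be under observation after ndays of its own follow-up
keep if fu_c > ndays

* One randomly chosen control per exposed person
set seed 12345
gen double u = runiform()
bysort id (u): keep if _n == 1

* The control's clock starts ndays after its own start of observation
gen index_c = start_c + ndays
gen pair = _n

* Stack each pair into one row per person
expand 2
bysort pair: gen byte exposed = (_n == 1)
gen pid     = cond(exposed, id, id_c)
gen t_index = cond(exposed, expo_date, index_c)
gen t_exit  = cond(exposed, end_of_obs, end_c)
gen byte died = cond(exposed, dead, dead_c)

stset t_exit, failure(died == 1) origin(time t_index)
stcox exposed
```

In these rows id 3 finds no control within the age caliper and drops out, which is exactly the situation question 2 asks about.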
I provide a mock dataset below (the original dataset has 4500 exposed and 4200 unexposed).
ids are unique
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id case gender cci dead start_of_obs expo_date end_of_obs age_start_obs)
 1 0 0 0 0 18518     . 19358 76
 2 1 1 2 1 17358 17601 17843 43
 3 1 0 0 1 16025 16652 16670 70
 4 1 1 1 0 19323 19316 19358 58
 5 0 0 3 0 18609     . 19358 41
 6 1 0 0 0 19268 19141 19358 31
 7 0 1 0 1 10757     . 11119 73
 8 1 1 1 0 18745 18995 19358 40
 9 0 0 1 1 11153     . 11650 34
10 0 0 3 0 19342     . 19358 67
11 1 1 2 0 18815 19215 19358 65
12 0 1 3 0 19312     . 19358 80
13 1 0 1 0 18881 19010 19358 64
14 0 0 2 1 18407     . 19088 43
15 0 0 1 1 11627     . 12352 68
16 0 0 1 0 18619     . 19358 32
17 0 1 2 1 10231     . 10920 40
18 1 0 3 0 18975 18979 19358 29
19 1 0 0 1 14682 15114 15478 63
20 1 0 0 0 18899 19262 19358 37
21 1 0 0 0 18481 19150 19358 79
22 0 0 3 1 12897     . 13429 57
23 0 1 3 1 11887     . 12346 27
24 1 0 2 1 12792 13318 13568 47
25 1 1 0 1  7435  7880  7954 20
end
format %td start_of_obs
format %td expo_date
format %td end_of_obs
case indicates whether you are exposed (1) or unexposed (0)
cci is the Charlson comorbidity index
Hope my questions are somewhat understandable.
Lars