Hi, Stata community,
I'm fitting a Cox model with stcox on a dataset that has multiple records per subject, using Stata/MP 16. The dataset has more than 1 million observations, a time-varying exposure variable (categorical with 46 levels, indicating drug exposure), and a time variable (time_period, ranging from 0 to 140 days). I've set up my survival data using stset as follows:
Code:
stset time, failure(failure==1) id(id)
ID | time | failure | exposure | _t | _t0 | _d |
1 | 28 | 0 | 0 | 28 | 0 | 0 |
1 | 43 | 0 | 3 | 43 | 28 | 0 |
1 | 140 | 0 | 0 | 140 | 43 | 0 |
2 | 140 | 0 | 0 | 140 | 0 | 0 |
3 | 126 | 0 | 0 | 126 | 0 | 0 |
3 | 137 | 0 | 26 | 137 | 126 | 0 |
3 | 140 | 1 | 0 | 140 | 137 | 1 |
4 | 12 | 0 | 16 | 12 | 0 | 0 |
4 | 120 | 0 | 0 | 120 | 12 | 0 |
4 | 140 | 1 | 20 | 140 | 120 | 1 |
5 | 84 | 1 | 0 | 84 | 0 | 1 |
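To confirm that stset has split each subject's follow-up into the intervals I intend, these are the checks I would run. This is only a minimal sketch; it assumes the variable names shown above (id, time, failure, exposure) and that the subject identifier is the lowercase id used in stset.
Code:
* list the st variables for the first few subjects, one block per subject
list id time failure exposure _t0 _t _d if id <= 5, sepby(id) noobs
* summarize records per subject, entry and exit times, gaps, and failures
stdescribe
* report whether exposure is constant or varying within subject
stvary exposure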
The stcox call is as follows:
Code:
stcox i.exposure
However, the results from stcox are unexpected: the hazard ratios look implausibly protective and the confidence intervals are wide.
When I use stptime to calculate incidence rates per exposure level (stptime, by(exposure)), it seems that Stata only counts exposure time from records where _d==1 and disregards the exposure status recorded before the event in subjects with multiple records.
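One way to test that impression is to compare stptime against a tally built directly from the st variables. A rough sketch, assuming the stset above; ptime is a helper variable name I made up for this check, and per(1000) is an arbitrary scaling choice.
Code:
* incidence rates by exposure level as reported by stptime
stptime, by(exposure) per(1000)
* manual cross-check: person-time contributed by each record in the analysis
gen double ptime = _t - _t0 if _st == 1
* total person-time and number of events per exposure level
tabstat ptime _d if _st == 1, by(exposure) statistics(sum) format(%12.0g)
If the person-time totals from tabstat match the stptime output, then every record contributes its own interval (_t0, _t] to its own exposure level, whether or not it ends in an event.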
I checked the stcox help and the FAQ on time-varying covariates, but found no mention of this issue for data with multiple records per ID.
I also tried the tvc() option, which gave more plausible estimates but very narrow confidence intervals (e.g., HR = 1.233 (1.232; 1.234)):
Code:
gen null_var=1
stcox null_var, tvc(i.exposure)
Any guidance on addressing this problem would be greatly appreciated!
Thank you!