I am analyzing the survival of a group of respiratory patients. One of my variables is baseline CPI, a composite index of respiratory function (the lower, the better).
CPI predicts survival in a Cox model, but it violates the proportional-hazards assumption (see the estat phtest output below).
Looking at survival by CPI quintile, the problem seems to be that the effect of lower (less severe) baseline CPI values is shifted to the right compared with higher values, which makes sense.


The data are in this format:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input int id byte sex double age byte smoking float(cpi dead surv_yrs)
372 1 73.8630136986301 1 56.08594 1 4.7342467
373 1 66.8328767123288 0 60.6701 1 2.1150684
374 1 84.4602739726027 0 46.87331 0 1.1315068
375 0 70.5150684931507 1 51.72653 1 1.7041095
376 0 76.227397260274 0 53.59892 1 1.241096
377 0 73.1260273972603 1 49.97387 1 5.493151
378 0 74.6739726027397 0 53.25577 1 2.0164382
379 1 88.2520547945206 1 46.58249 1 4.986301
380 1 78.8027397260274 0 58.7373 1 1.1589041
381 0 81.4054794520548 0 44.01292 0 5.419178
382 1 60.3095890410959 1 63.4892 1 4.446575
383 0 70.3945205479452 0 48.29292 0 4.526027
384 1 85.5260273972603 1 50.03133 0 5.156164
385 0 75.0986301369863 0 57.7133 1 1.6136986
386 1 64.8109589041096 0 69.9197 1 .59178084
387 1 74.3150684931507 1 34.1793 0 3.660274
388 0 67.4931506849315 1 41.70098 1 2.79726
389 1 68.3643835616438 1 63.0056 1 2.832877
390 1 81.3589041095891 1 45.1427 1 3.4
391 0 82.1835616438356 0 59.78088 1 2.1452055
end
label values sex sex
label def sex 0 "F", modify
label def sex 1 "M", modify
Code:
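. stset surv_yrs, fail(dead) id(id) exit(time 8)

. stcox cpi sex age, nolog

        Failure _d: dead
  Analysis time _t: surv_yrs
 Exit on or before: time 8
       ID variable: id

Cox regression with Breslow method for ties

No. of subjects =        825                            Number of obs =    825
No. of failures =        649
Time at risk    = 2,802.7699
                                                        LR chi2(4)    = 341.86
Log likelihood  = -3709.1995                            Prob > chi2   = 0.0000

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         cpi |   1.075482   .0047164    16.59   0.000     1.066278    1.084766
         sex |    1.07819   .1083775     0.75   0.454     .8853897    1.312975
         age |   1.021587   .0054431     4.01   0.000     1.010974    1.032311
------------------------------------------------------------------------------

. estat phtest, detail

Test of proportional-hazards assumption

Time function: Analysis time
--------------------------------------------------------
             |       rho     chi2       df    Prob>chi2
-------------+------------------------------------------
         cpi |  -0.30377    63.05        1       0.0000
         sex |   0.02194     0.31        1       0.5765
         age |   0.01998     0.29        1       0.5894
-------------+------------------------------------------
 Global test |              63.43        3       0.0000
--------------------------------------------------------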
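Because estat phtest tests against a single time function, a possible next step is to look at how the cpi effect drifts over follow-up rather than only whether it does. A minimal sketch, assuming the stset and stcox commands above have just been run: repeat the test on the log-time scale and plot the scaled Schoenfeld residuals for cpi against time.

```stata
* Assumes the stset and stcox commands above have just been run

* Repeat the PH test using log(analysis time) as the time function
estat phtest, detail log

* Scaled Schoenfeld residuals for cpi against analysis time, with a
* lowess smooth; a clearly non-flat smooth suggests that the log
* hazard ratio for cpi changes over follow-up
estat phtest, plot(cpi)
```

The shape of the smooth can help choose between cut points (a step function) and a smooth time-varying effect.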
Here are the survival curves and the log-log plot:
Code:
. xtile q_cpi = cpi, nq(5)

. stcox i.q_cpi sex age_at if dis==1, nolog

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       q_cpi |
          2  |   1.507044   .2330814     2.65   0.008     1.112956    2.040676
          3  |   2.681361    .396617     6.67   0.000     2.006544    3.583124
          4  |   4.550416     .66173    10.42   0.000     3.421903    6.051101
          5  |   7.866322   1.174407    13.82   0.000     5.870717    10.54029
             |
         sex |    1.05997   .1068369     0.58   0.563     .8699591    1.291481
         age |   1.022326    .005499     4.11   0.000     1.011605    1.033161
------------------------------------------------------------------------------

. stcurve, survival at(q_cpi==1) at(q_cpi==2) at(q_cpi==3) at(q_cpi==4) at(q_cpi==5) legend(position(6) col(5))

. stphplot, by(q_cpi) legend(position(6) col(5))
Indeed, when I test separate periods, the proportional-hazards assumption holds between 2 and 4 years, but not before or after.
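The period-by-period check can be turned directly into a single model with a piecewise-constant (step-function) hazard ratio for cpi. A minimal sketch, assuming the data are stset with id(id) as above (stsplit requires it) and using the 2- and 4-year cut points from the text; the variable name period is a hypothetical choice:

```stata
* Assumes: stset surv_yrs, fail(dead) id(id) exit(time 8) has been run

* Split each subject's follow-up at 2 and 4 years; period holds the
* lower bound of the interval each record belongs to (0, 2 or 4)
stsplit period, at(2 4)

* One cpi hazard ratio per period; with no cpi main effect, Stata
* estimates a separate cpi coefficient for every level of period
stcox i.period#c.cpi sex age, nolog

* Test whether the three period-specific cpi effects are equal
test _b[0.period#c.cpi] = _b[2.period#c.cpi] = _b[4.period#c.cpi]

* To return to one record per subject afterwards:
* drop period
* stjoin
```

The cut points are taken from the description above; other choices (e.g. at 1 year) only change the at() list.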
I could easily stop the analysis at 4 years (a long enough interval for a baseline predictor), but I can't find a way to handle the earlier period. I tried building interactions with time at different points (_t<1, _t<2, etc.), as suggested in "An Introduction to Survival Analysis Using Stata" (possibly outdated), but could not resolve the problem. I also looked at alternatives to Cox regression in "Flexible Parametric Survival Analysis Using Stata: Beyond the Cox Model", but although this shouldn't be an uncommon problem, I could not find a clear explanation of the best way to address it. I suspect that splines could be used, but, again, I can't find a good tutorial for this case. Maybe the solution is so easy that I can't see it?
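Two sketches of the continuous-time alternatives mentioned above, both assuming the original (unsplit) stset data from the first model. The first keeps the Cox model but lets the cpi log hazard ratio change linearly in log(time) through stcox's tvc() and texp() options. The second is a flexible parametric model fitted with the user-written stpm2 command (SSC, which also needs rcsgen), where the time-dependent cpi effect is a restricted cubic spline. The degrees of freedom df(4) and dftvc(3), the comparison values cpi = 60 vs 50, and the names tt and hr6050 are arbitrary, hypothetical choices for illustration only.

```stata
* Assumes: stset surv_yrs, fail(dead) id(id) exit(time 8), one record per id

* --- 1. Cox model with a cpi x log(time) interaction ---
* tvc(cpi) adds cpi*ln(_t) as an extra term; its coefficient (shown in
* the "tvc" equation) is the change in the log HR for cpi per unit of
* log(time), so HR(t) = exp(b_main + b_tvc*ln(t))
stcox cpi sex age, tvc(cpi) texp(ln(_t)) nolog

* --- 2. Flexible parametric model with a spline time-dependent effect ---
* One-time installation from SSC:
* ssc install rcsgen
* ssc install stpm2

* Baseline log cumulative hazard as a spline with 4 df; the cpi
* effect may vary over time through a 3-df spline
stpm2 cpi sex age, scale(hazard) df(4) tvc(cpi) dftvc(3) eform

* Time-varying HR for cpi = 60 vs cpi = 50, evaluated on a grid of
* times; sex and age are 0 in both numerator and denominator and have
* no time-varying effect, so they cancel out of the ratio
range tt 0.1 8 100
predict hr6050, hrnumerator(cpi 60) hrdenominator(cpi 50) timevar(tt) ci
twoway (rarea hr6050_lci hr6050_uci tt) (line hr6050 tt), ///
    yscale(log) ytitle("HR, cpi 60 vs 50") xtitle("Years") legend(off)
```

The plotted curve shows the cpi hazard ratio as a smooth function of time, which avoids having to choose cut points; comparing a few values of dftvc() (e.g. by AIC) is one way to check sensitivity.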
Any help would be appreciated.