Cox proportional hazard assumption - issue with different interpretations

Laura Hill


14 Sep 2023, 07:03

Dear all,

I have a question regarding a Cox proportional hazards analysis, more specifically about the proportional hazards assumption and how to interpret the checks for it.
Side note: I am using Stata/MP 15 at work and have Stata/SE 18 at home.

I have a large dataset (> 250K firms in CP format) that looks at firm failure on a quarterly basis (up to 40 quarters). My independent variables are team size, gender and nationality diversity (using the Blau index, with a minimum of 0 and a maximum of 1 for nationality and 0.5 for gender, as gender is a dummy variable) and age diversity (measured using the coefficient of variation). These variables can vary over the quarters due to changes in team composition.
To check the proportional hazards assumption, I used the following methods:

Schoenfeld residuals using estat phtest, detail (side note: industry and country are my control variables, entered as dummies)

Code:

      ----------------------------------------------------------------
                  |       rho            chi2       df       Prob>chi2
      ------------+---------------------------------------------------
      1.countryc~2|      0.02250        72.46        1         0.0000
      2.countryc~2|     -0.00361         1.92        1         0.1655
      4.countryc~2|      0.00286         1.23        1         0.2675
      5.countryc~2|      0.01147        17.51        1         0.0000
      6.countryc~2|      0.01066        16.62        1         0.0000
      7.countryc~2|     -0.00676         6.78        1         0.0092
      8b.countr~t2|            .            .        1             .
      1.industry~2|     -0.06753       611.13        1         0.0000
      2.industry~2|     -0.06685       582.30        1         0.0000
      4.industry~2|     -0.07532       727.20        1         0.0000
      5b.industr~2|            .            .        1             .
      TMTsize     |      0.01306        25.74        1         0.0000
      BlauGender  |      0.05834       516.33        1         0.0000
      VariationAge|      0.00189         0.55        1         0.4595
      BlauNation~y|     -0.00200         0.58        1         0.4454
      ------------+---------------------------------------------------
      global test |                   1693.43       13         0.0000
      ----------------------------------------------------------------

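Here the assumption appears to be violated for BlauGender and TMTsize (and for several of the control dummies), based on their significance. But as I have read in other sources, this may simply reflect the large sample size: all the rho values are small (below 0.08 in absolute value).
When I include my covariates in the tvc() option, they are all significant as well, which further points to a violation of the assumption.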

Code:

Cox regression -- Breslow method for ties

No. of subjects      =      272,653             Number of obs    =   5,043,835
No. of failures      =      211,759
Time at risk         =      6757574
                                                Wald chi2(13)    =    12296.65
Log pseudolikelihood =   -1783341.2             Prob > chi2      =      0.0000

                              (Std. Err. adjusted for 246,784 clusters in BvdIdNumber)
--------------------------------------------------------------------------------------
                     |               Robust
                  _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
main                 |
         countrycat2 |
            Finland  |   .8091496   .0310387    -5.52   0.000     .7505454    .8723297
              Italy  |   2.439466    .078066    27.87   0.000     2.291159    2.597373
            Romania  |    .289436    .015444   -23.24   0.000     .2606954    .3213451
 Russian Federation  |   1.324375   .0570622     6.52   0.000     1.217128    1.441073
        Switzerland  |   1.395914   .0454626    10.24   0.000     1.309593    1.487924
     United Kingdom  |   2.468952   .0722197    30.90   0.000     2.331386    2.614637
                     |
        industrycat2 |
  C - Manufacturing  |   2.200137   .0495874    34.99   0.000     2.105063    2.299505
M - Professional,..  |   2.222844   .0464667    38.21   0.000     2.133612    2.315809
J - Information a..  |   2.700483   .0545709    49.16   0.000     2.595616    2.809586
---------------------+----------------------------------------------------------------
tvc                  |
             TMTsize |   .9992306   .0001653    -4.65   0.000     .9989065    .9995547
          BlauGender |   .9817832   .0004862   -37.12   0.000     .9808307    .9827367
        VariationAge |   .9951163   .0010039    -4.85   0.000     .9931507    .9970858
     BlauNationality |   1.019324   .0006307    30.93   0.000     1.018088    1.020561
--------------------------------------------------------------------------------------
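
For reference, a minimal sketch of a call that would produce output of this shape. It assumes the data are already stset, that the variable names match those in the tables, that the base categories are 8 and 5 (as the "8b" and "5b" rows of the phtest table suggest), and that texp() is left at its default of _t. It is not necessarily the exact command used.

```stata
* Assumes stset has already been run on the quarterly records.
* Controls enter the main equation; the four diversity covariates
* enter only through tvc(), as in the output above.
stcox ib8.countrycat2 ib5.industrycat2, ///
    tvc(TMTsize BlauGender VariationAge BlauNationality) ///
    vce(cluster BvdIdNumber)

* With texp(_t), a tvc hazard ratio is multiplied in once per unit of
* analysis time for a one-unit difference in the covariate.
* Compounding the BlauGender ratio over 40 quarters:
display .9817832^40
```

Read this way, a tvc ratio that looks close to 1 per quarter can still compound to a substantial change over the full 40 quarters (for BlauGender, a factor of roughly 0.48 for a one-unit difference).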

However, when I plot the scaled Schoenfeld residuals using estat phtest, plot(VARIABLES) yline(0), the smoothed lines look (quite) horizontal. The results are similar for all covariates, so I have not posted every plot. Changing the bandwidth also has little effect. I am not sure at what point one would visually say the assumption is not met.
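
A minimal sketch of the plotting step with the bandwidth varied, assuming the Cox model was fit without tvc() immediately beforehand and using BlauGender as the example covariate. The bandwidth values are arbitrary choices for illustration.

```stata
* Run directly after the stcox fit (without tvc()).
* Scaled Schoenfeld residuals against time, with a smoothed line.
estat phtest, plot(BlauGender) yline(0)

* Same plot with a narrower and a wider smoother bandwidth.
estat phtest, plot(BlauGender) bwidth(0.4) yline(0)
estat phtest, plot(BlauGender) bwidth(0.9) yline(0)

* The test itself can also be run on another time scale, e.g. log time.
estat phtest, log detail
```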

But when I divide the sample into subsamples based on quarters (e.g. firms in years 0-2 (first table below) vs. years 2-4 (second table below)), I see clear differences in the hazard ratios. They differ even more when comparing years 0-2 with years 8-10.

Code:

Cox regression with Breslow method for ties

 

No. of subjects =   226,587                          Number of obs = 1,975,546
No. of failures =    25,492
Time at risk    = 1,975,546
                                                     Wald chi2(13) =   3738.62
Log pseudolikelihood = -307650.59                    Prob > chi2   =    0.0000

 

                                                                (Std. err. adjusted for 226,587 clusters in BvdIdNumber)
------------------------------------------------------------------------------------------------------------------------
                                                       |               Robust
                                                    _t | Haz. ratio   std. err.      z    P>|z|     [95% conf. interval]
-------------------------------------------------------+----------------------------------------------------------------
                                          industrycat2 |
                                    C - Manufacturing  |   10.45771   1.487319    16.50   0.000     7.913645    13.81963
M - Professional, scientific and technical activities  |   9.003804   1.269628    15.59   0.000     6.829642    11.87009
                    J - Information and communication  |   12.30777   1.723383    17.93   0.000     9.353859    16.19452
                                                       |
                                           countrycat2 |
                                              Finland  |   .6878698   .1086084    -2.37   0.018     .5047883    .9373531
                                                Italy  |   2.819074   .3243733     9.01   0.000     2.249904     3.53223
                                              Romania  |   .0988451   .0330192    -6.93   0.000     .0513584    .1902386
                                   Russian Federation  |   .4502098   .1378809    -2.61   0.009     .2470168    .8205467
                                          Switzerland  |   1.388524   .1677931     2.72   0.007       1.0957    1.759604
                                       United Kingdom  |   4.049271   .4326389    13.09   0.000     3.284213    4.992549
                                                       |
                                               TMTsize |   .9584146   .0086527    -4.70   0.000     .9416047    .9755246
                                            BlauGender |   .3379189   .0089028   -41.18   0.000     .3209125    .3558265
                                          VariationAge |   .7572294   .0380637    -5.53   0.000     .6861833    .8356316
                                       BlauNationality |    1.80255   .0552501    19.22   0.000     1.697451    1.914157
------------------------------------------------------------------------------------------------------------------------

Code:

Cox regression with Breslow method for ties

 

No. of subjects =   189,827                          Number of obs = 1,481,065
No. of failures =    60,585
Time at risk    = 1,481,065
                                                     Wald chi2(13) =   5373.72
Log pseudolikelihood = -718002.85                    Prob > chi2   =    0.0000

 

                                                                (Std. err. adjusted for 189,827 clusters in BvdIdNumber)
------------------------------------------------------------------------------------------------------------------------
                                                       |               Robust
                                                    _t | Haz. ratio   std. err.      z    P>|z|     [95% conf. interval]
-------------------------------------------------------+----------------------------------------------------------------
                                          industrycat2 |
                                    C - Manufacturing  |   5.223083   .3122031    27.66   0.000      4.64566    5.872275
M - Professional, scientific and technical activities  |   5.496831   .3200672    29.27   0.000     4.903983    6.161349
                    J - Information and communication  |   6.445158   .3710698    32.36   0.000     5.757408    7.215064
                                                       |
                                           countrycat2 |
                                              Finland  |   .4826577   .0377151    -9.32   0.000     .4141198    .5625389
                                                Italy  |   2.658978   .1430748    18.17   0.000     2.392837     2.95472
                                              Romania  |   .3838614   .0327841   -11.21   0.000     .3246957    .4538082
                                   Russian Federation  |   1.025024   .0888774     0.29   0.776     .8648256    1.214898
                                          Switzerland  |   1.237009   .0697365     3.77   0.000     1.107609    1.381527
                                       United Kingdom  |   2.429062   .1207806    17.85   0.000     2.203506    2.677707
                                                       |
                                               TMTsize |   .9487184   .0059191    -8.44   0.000     .9371878    .9603909
                                            BlauGender |   .5693154   .0096876   -33.10   0.000     .5506411     .588623
                                          VariationAge |   .9751457    .032161    -0.76   0.445     .9141053    1.040262
                                       BlauNationality |   1.507205   .0313781    19.71   0.000     1.446943    1.569977
------------------------------------------------------------------------------------------------------------------------
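
One way to put the subsample comparison into a single model is to split analysis time into periods and let a covariate's coefficient differ by period. A minimal sketch, assuming stset was run with an id() variable and that analysis time is in quarters. The cut points at 8 and 16 quarters are chosen only for illustration, and `period` is a hypothetical variable name.

```stata
* Split each record at 8 and 16 quarters; period takes values 0, 8, 16.
stsplit period, at(8 16)

* Period-specific coefficients for BlauGender, common effects otherwise.
stcox ib8.countrycat2 ib5.industrycat2 TMTsize VariationAge ///
    BlauNationality i.period#c.BlauGender, vce(cluster BvdIdNumber)

* Wald test that the BlauGender coefficient is the same in all periods.
testparm i.period#c.BlauGender, equal
```

Unlike separate subsample fits, this keeps the other coefficients common across periods, so only the chosen coefficient is allowed to change.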

So now I am confused as to how to interpret these results. Specifically:
1) Can I conclude the assumption is violated or not?
2) Why would one method of checking the assumption be more valid/correct in my case than another?

Any advice would be very welcome!
I have also asked a similar question on Cross Validated (link1; link2), but at the time of writing I have not received an answer on this "issue".

Thanks in advance!
Best regards,
Laura

Last edited by Laura Hill; 14 Sep 2023, 07:34.

Tags: hazard rates, panel data, stcox, survival analysis
