
  • Cox proportional hazard assumption - issue with different interpretations

    Dear all,

    I have a question about a Cox proportional hazards analysis, specifically about checking the proportional hazards assumption and interpreting the results.
    Side note: I am using Stata/MP 15 at work and Stata/SE 18 at home.

    I have a large dataset (> 250K firms in CP format) that tracks firm failure on a quarterly basis (up to 40 quarters). My independent variables are team size, gender and nationality diversity (each measured with the Blau index, which ranges from 0 to at most 0.5 for gender, since it has two categories, and from 0 to just under 1 for nationality) and age diversity (measured with the coefficient of variation). These variables can vary across quarters as the team composition changes.
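
    To make the diversity measures concrete, here is a minimal sketch of how they can be computed for one team in one quarter. The team data and the function names are hypothetical, and I am assuming the population standard deviation for the coefficient of variation; a sample standard deviation would give slightly different values.

```python
from collections import Counter
from statistics import mean, pstdev


def blau_index(categories):
    """Blau index: 1 minus the sum of squared category shares."""
    n = len(categories)
    counts = Counter(categories)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population SD is an assumption)."""
    return pstdev(values) / mean(values)


# Hypothetical four-person team observed in a single quarter
genders = ["F", "M", "M", "F"]
nationalities = ["FI", "IT", "UK", "CH"]
ages = [30, 40, 50, 60]

print(blau_index(genders))        # 0.5, the maximum with two categories
print(blau_index(nationalities))  # 0.75, i.e. 1 - 1/4 with four equal shares
print(coefficient_of_variation(ages))
```

    With k equally represented categories the Blau index equals 1 - 1/k, which is why gender tops out at 0.5 and nationality can approach but not reach 1.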
    To check the proportional hazards assumption, I used the following methods:

    Schoenfeld residuals using estat phtest, detail (side note: industry and country are my control variables, entered as dummies):

    Code:
          ----------------------------------------------------------------
                      |       rho            chi2       df       Prob>chi2
          ------------+---------------------------------------------------
          1.countryc~2|      0.02250        72.46        1         0.0000
          2.countryc~2|     -0.00361         1.92        1         0.1655
          4.countryc~2|      0.00286         1.23        1         0.2675
          5.countryc~2|      0.01147        17.51        1         0.0000
          6.countryc~2|      0.01066        16.62        1         0.0000
          7.countryc~2|     -0.00676         6.78        1         0.0092
          8b.countr~t2|            .            .        1             .
          1.industry~2|     -0.06753       611.13        1         0.0000
          2.industry~2|     -0.06685       582.30        1         0.0000
          4.industry~2|     -0.07532       727.20        1         0.0000
          5b.industr~2|            .            .        1             .
          TMTsize     |      0.01306        25.74        1         0.0000
          BlauGender  |      0.05834       516.33        1         0.0000
          VariationAge|      0.00189         0.55        1         0.4595
          BlauNation~y|     -0.00200         0.58        1         0.4454
          ------------+---------------------------------------------------
          global test |                   1693.43       13         0.0000
          ----------------------------------------------------------------
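    Here the test indicates a violation for BlauGender and TMTsize (as well as for several country and industry dummies and the global test). However, I have read in other sources that such significance may be driven by the large sample size.
    Entering the covariates with the tvc() option also gives significant interaction terms, which further suggests a violation of the assumption: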

    Code:
    Cox regression -- Breslow method for ties
    
    No. of subjects      =      272,653             Number of obs    =   5,043,835
    No. of failures      =      211,759
    Time at risk         =      6757574
                                                    Wald chi2(13)    =    12296.65
    Log pseudolikelihood =   -1783341.2             Prob > chi2      =      0.0000
    
                                  (Std. Err. adjusted for 246,784 clusters in BvdIdNumber)
    --------------------------------------------------------------------------------------
                         |               Robust
                      _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------------+----------------------------------------------------------------
    main                 |
             countrycat2 |
                Finland  |   .8091496   .0310387    -5.52   0.000     .7505454    .8723297
                  Italy  |   2.439466    .078066    27.87   0.000     2.291159    2.597373
                Romania  |    .289436    .015444   -23.24   0.000     .2606954    .3213451
     Russian Federation  |   1.324375   .0570622     6.52   0.000     1.217128    1.441073
            Switzerland  |   1.395914   .0454626    10.24   0.000     1.309593    1.487924
         United Kingdom  |   2.468952   .0722197    30.90   0.000     2.331386    2.614637
                         |
            industrycat2 |
      C - Manufacturing  |   2.200137   .0495874    34.99   0.000     2.105063    2.299505
    M - Professional,..  |   2.222844   .0464667    38.21   0.000     2.133612    2.315809
    J - Information a..  |   2.700483   .0545709    49.16   0.000     2.595616    2.809586
    ---------------------+----------------------------------------------------------------
    tvc                  |
                 TMTsize |   .9992306   .0001653    -4.65   0.000     .9989065    .9995547
              BlauGender |   .9817832   .0004862   -37.12   0.000     .9808307    .9827367
            VariationAge |   .9951163   .0010039    -4.85   0.000     .9931507    .9970858
         BlauNationality |   1.019324   .0006307    30.93   0.000     1.018088    1.020561
    --------------------------------------------------------------------------------------
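
    To get a feel for the size of these tvc terms, here is a rough sketch of how the per-period multipliers compound over analysis time. It assumes the interaction is linear in time (which I understand to be the default, texp(_t), when tvc() is used without texp()) and that analysis time is measured in quarters. The function name is hypothetical.

```python
import math

# Per-period hazard ratios copied from the tvc section of the output above
tvc_hazard_ratios = {
    "TMTsize": 0.9992306,
    "BlauGender": 0.9817832,
    "VariationAge": 0.9951163,
    "BlauNationality": 1.019324,
}


def time_multiplier(tvc_hr, t):
    """Factor applied to a one-unit covariate effect at analysis time t,
    assuming a linear-in-time interaction: exp(gamma * t) = tvc_hr ** t."""
    return math.exp(math.log(tvc_hr) * t)


# Assumption: t is in quarters, so 40 is the end of the observation window
for name, hr in tvc_hazard_ratios.items():
    factors = [round(time_multiplier(hr, t), 3) for t in (1, 8, 20, 40)]
    print(name, factors)
```

    Note that these factors refer to a one-unit change in the covariate, which is not attainable for BlauGender (maximum 0.5), so the practical change over time is smaller than the raw multiplier suggests.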
    However, when I plot the scaled Schoenfeld residuals using estat phtest, plot(VARIABLES) yline(0), the smoothed lines look (quite) horizontal. The results are similar for all covariates, so I have not posted every plot. Changing the bandwidth also has little effect. I am not sure at what point one would conclude visually that the assumption is not met.



    But when I split the sample into time windows based on quarters (e.g. firms in Y0-2 in the first table below vs Y2-4 in the second), I see clear differences in the hazard ratios. The differences are even larger when comparing years 0-2 with years 8-10.

    Code:
    Cox regression with Breslow method for ties
    
     
    
    No. of subjects =   226,587                          Number of obs = 1,975,546
    No. of failures =    25,492
    Time at risk    = 1,975,546
                                                         Wald chi2(13) =   3738.62
    Log pseudolikelihood = -307650.59                    Prob > chi2   =    0.0000
    
     
    
                                                                    (Std. err. adjusted for 226,587 clusters in BvdIdNumber)
    ------------------------------------------------------------------------------------------------------------------------
                                                           |               Robust
                                                        _t | Haz. ratio   std. err.      z    P>|z|     [95% conf. interval]
    -------------------------------------------------------+----------------------------------------------------------------
                                              industrycat2 |
                                        C - Manufacturing  |   10.45771   1.487319    16.50   0.000     7.913645    13.81963
    M - Professional, scientific and technical activities  |   9.003804   1.269628    15.59   0.000     6.829642    11.87009
                        J - Information and communication  |   12.30777   1.723383    17.93   0.000     9.353859    16.19452
                                                           |
                                               countrycat2 |
                                                  Finland  |   .6878698   .1086084    -2.37   0.018     .5047883    .9373531
                                                    Italy  |   2.819074   .3243733     9.01   0.000     2.249904     3.53223
                                                  Romania  |   .0988451   .0330192    -6.93   0.000     .0513584    .1902386
                                       Russian Federation  |   .4502098   .1378809    -2.61   0.009     .2470168    .8205467
                                              Switzerland  |   1.388524   .1677931     2.72   0.007       1.0957    1.759604
                                           United Kingdom  |   4.049271   .4326389    13.09   0.000     3.284213    4.992549
                                                           |
                                                   TMTsize |   .9584146   .0086527    -4.70   0.000     .9416047    .9755246
                                                BlauGender |   .3379189   .0089028   -41.18   0.000     .3209125    .3558265
                                              VariationAge |   .7572294   .0380637    -5.53   0.000     .6861833    .8356316
                                           BlauNationality |    1.80255   .0552501    19.22   0.000     1.697451    1.914157
    ------------------------------------------------------------------------------------------------------------------------
    Code:
    Cox regression with Breslow method for ties
    
     
    
    No. of subjects =   189,827                          Number of obs = 1,481,065
    No. of failures =    60,585
    Time at risk    = 1,481,065
                                                         Wald chi2(13) =   5373.72
    Log pseudolikelihood = -718002.85                    Prob > chi2   =    0.0000
    
     
    
                                                                    (Std. err. adjusted for 189,827 clusters in BvdIdNumber)
    ------------------------------------------------------------------------------------------------------------------------
                                                           |               Robust
                                                        _t | Haz. ratio   std. err.      z    P>|z|     [95% conf. interval]
    -------------------------------------------------------+----------------------------------------------------------------
                                              industrycat2 |
                                        C - Manufacturing  |   5.223083   .3122031    27.66   0.000      4.64566    5.872275
    M - Professional, scientific and technical activities  |   5.496831   .3200672    29.27   0.000     4.903983    6.161349
                        J - Information and communication  |   6.445158   .3710698    32.36   0.000     5.757408    7.215064
                                                           |
                                               countrycat2 |
                                                  Finland  |   .4826577   .0377151    -9.32   0.000     .4141198    .5625389
                                                    Italy  |   2.658978   .1430748    18.17   0.000     2.392837     2.95472
                                                  Romania  |   .3838614   .0327841   -11.21   0.000     .3246957    .4538082
                                       Russian Federation  |   1.025024   .0888774     0.29   0.776     .8648256    1.214898
                                              Switzerland  |   1.237009   .0697365     3.77   0.000     1.107609    1.381527
                                           United Kingdom  |   2.429062   .1207806    17.85   0.000     2.203506    2.677707
                                                           |
                                                   TMTsize |   .9487184   .0059191    -8.44   0.000     .9371878    .9603909
                                                BlauGender |   .5693154   .0096876   -33.10   0.000     .5506411     .588623
                                              VariationAge |   .9751457    .032161    -0.76   0.445     .9141053    1.040262
                                           BlauNationality |   1.507205   .0313781    19.71   0.000     1.446943    1.569977
    ------------------------------------------------------------------------------------------------------------------------
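
    As a rough way to quantify how different the two tables above are, here is a sketch that compares the log hazard ratios of the two windows with a simple z statistic. It treats the two estimates as independent, which is an assumption that may not hold here, since many of the same firms appear in both windows. The function names are hypothetical.

```python
import math


def log_hr_and_se(hr, se_hr):
    """Move a reported hazard ratio and its standard error to the log scale
    (delta method: se(log HR) = se(HR) / HR)."""
    return math.log(hr), se_hr / hr


def z_for_difference(hr_a, se_a, hr_b, se_b):
    """z statistic for the difference in log hazard ratios between two windows.
    Assumption: the two estimates are independent."""
    b_a, s_a = log_hr_and_se(hr_a, se_a)
    b_b, s_b = log_hr_and_se(hr_b, se_b)
    return (b_b - b_a) / math.sqrt(s_a ** 2 + s_b ** 2)


# (hazard ratio, robust std. err.) taken from the Y0-2 and Y2-4 tables above
windows = {
    "TMTsize": ((0.9584146, 0.0086527), (0.9487184, 0.0059191)),
    "BlauGender": ((0.3379189, 0.0089028), (0.5693154, 0.0096876)),
    "VariationAge": ((0.7572294, 0.0380637), (0.9751457, 0.0321610)),
    "BlauNationality": ((1.80255, 0.0552501), (1.507205, 0.0313781)),
}

for name, ((hr1, se1), (hr2, se2)) in windows.items():
    print(f"{name}: z = {z_for_difference(hr1, se1, hr2, se2):.2f}")
```

    A large absolute z value would suggest that the coefficient differs between the windows, but given the sample size even small differences can produce large z values, so the size of the change in the hazard ratio itself is probably the more informative quantity.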
    So now I am confused as to how to interpret these results. Specifically:
    1) Can I conclude that the assumption is violated, or not?
    2) Which method of checking the assumption is more valid in my case, and why?


    Any advice would be very welcome!
    I have also asked a similar question on Cross Validated (link1; link2), but at the time of writing I have not received an answer.

    Thanks in advance!
    Best regards,
    Laura
    Last edited by Laura Hill; 14 Sep 2023, 07:34.