Violation of proportional hazard assumption with big sample size

Daniel Baettig



22 Mar 2019, 11:26

Dear all,

As part of my PhD, I am studying a sample of about 95,000 firms in Switzerland, and I want to find out whether having an external auditor (such as PwC, KPMG, or EY) reduces the likelihood that a firm goes bankrupt. In Switzerland, smaller companies can choose whether to have their accounts audited, i.e. they can 'opt out' of having an external auditor.
So far I have examined this mainly with logistic regression models, with bankruptcy (0/1) as the dependent variable. After discussion with my supervisor, I would like to corroborate my results with survival analysis techniques.

I have come up with the following basic Cox proportional hazard model:
stcox bOptingOut lncapital i.firmCanton i.industryCode, nolog

- bOptingOut is my treatment/non-treatment variable (1 = opting out, i.e. no auditor; 0 = no opting out, i.e. financial statements are audited)
- lncapital is the natural logarithm of the firm's paid-up capital in Swiss francs (roughly at parity with USD)
- firmCanton is a categorical variable to control for the canton/state in which the firm is domiciled
- industryCode is a categorical variable to control for industry effects

In principle, I was quite happy with the model as the coefficients are highly significant and the direction of the effects appears reasonable.

Code:

. stcox i.bOptingOut lncapital i.firmCanton i.industryCode, nolog

         failure _d:  event == 3
   analysis time _t:  (date1-origin)
             origin:  event==1
  enter on or after:  event==1 time td(01jan2008)
  exit on or before:  event==3 time td(31dec2018)
                 id:  id

Cox regression -- Breslow method for ties

No. of subjects =       94,319                  Number of obs    =     161,050
No. of failures =        9,683
Time at risk    =    299931461
                                                LR chi2(44)      =     4529.30
Log likelihood  =   -106232.09                  Prob > chi2      =      0.0000

------------------------------------------------------------------------------
          _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.bOptingOut |   1.425763   .0464468    10.89   0.000     1.337574    1.519766
   lncapital |   .8598573   .0107152   -12.12   0.000     .8391103    .8811172
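
For reference, the survival-time setup implied by the header of the output above can be sketched as follows. This is reconstructed from the stset summary lines only; date1, event, and id are the variable names shown there, and the exact command I typed may have differed in details.

```stata
* Sketch of the stset declaration implied by the output header (assumption:
* reconstructed from the summary lines, not copied from the original do-file).
* event==1 marks the start of observation and event==3 marks bankruptcy.
stset date1, id(id) failure(event==3) origin(event==1) ///
    enter(event==1 time td(01jan2008))                 ///
    exit(event==3 time td(31dec2018))

* Quick check of the resulting subjects, records, and failures.
stdescribe
```

The /// line continuations assume the commands are run from a do-file rather than typed interactively.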

Then I turned to testing the proportional hazards assumption. First, I did some graphical checks: I plotted the hazard and survival functions and ran stphplot and stcoxkm. IMHO, things look quite okay (see picture below).

Afterwards, I performed the test based on Schoenfeld residuals (estat phtest) and got a Prob>chi2 value of 0.0000.
Based on this, the proportional hazards assumption is clearly violated.
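
A minimal sketch of the checks described above, plus one way to gauge how large the departure from PH is rather than only whether it is statistically detectable. It assumes the data are already stset and uses the 0/1 variable bOptingOut directly, as in the basic model, so that it can be named in plot() and tvc(). The choice of ln(_t) as the time function is an illustrative assumption, not something the data dictate.

```stata
* Refit the Cox model with the 0/1 treatment entered directly.
stcox bOptingOut lncapital i.firmCanton i.industryCode, nolog

* Global and covariate-specific tests based on scaled Schoenfeld residuals.
estat phtest, detail

* Smoothed scaled Schoenfeld residuals for the treatment against time;
* a roughly flat curve suggests the departure from PH is small in size
* even when the test rejects.
estat phtest, plot(bOptingOut)

* Assumption: let the treatment effect vary with ln(_t). The tvc term
* shows how the effect of opting out changes over analysis time.
stcox bOptingOut lncapital i.firmCanton i.industryCode, nolog ///
    tvc(bOptingOut) texp(ln(_t))
```

With almost 10,000 failures, even a very small time trend in the treatment effect can produce a p-value of 0.0000, so the size of the tvc term and the shape of the residual plot are probably more informative than the test alone.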

Currently, I have two main questions:
a) Does my data set really violate the proportional hazard (PH) assumption? Or is this simply due to the big sample size I have?

b) If the PH assumption is really violated, is it okay to use a lognormal parametric model instead [streg ..., dist(lognormal)]?
Based on AIC/BIC that would be the best of the parametric models. Or would you suggest anything else?
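
The AIC/BIC comparison mentioned above can be sketched roughly as follows. The list of distributions and the stored-estimate names (m_*) are illustrative assumptions; the covariates are those of the Cox model.

```stata
* Fit several parametric survival models and store each set of estimates.
foreach d in exponential weibull gompertz loglogistic lognormal {
    quietly streg bOptingOut lncapital i.firmCanton i.industryCode, dist(`d')
    estimates store m_`d'
}

* AIC/BIC table across the stored models (lower is better).
estimates stats m_*

* The lognormal model is an accelerated failure-time model, so report
* time ratios instead of hazard ratios.
streg bOptingOut lncapital i.firmCanton i.industryCode, dist(lognormal) tratio
```

Note that a time ratio below 1 for bOptingOut would mean shorter time to bankruptcy for firms that opt out, which is the AFT counterpart of a hazard ratio above 1 in the Cox model.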

I am using the book "An Introduction to Survival Analysis Using Stata" (Revised 3rd Edition) by Cleves, Gould and Marchenko.

As you can see, this is my first post to the Statalist forum. In case you need further information, just let me know.
I would really appreciate your help!

Kind regards,
Daniel

Last edited by Daniel Baettig; 22 Mar 2019, 11:28.