Violation of proportional hazard assumption with big sample size

    Dear all,

    As part of my PhD I am looking at a sample of about 95,000 firms in Switzerland, and I want to find out whether having an external auditor (such as PwC, KPMG, EY) reduces the likelihood that a firm goes bankrupt. In Switzerland, smaller companies can choose whether to have their accounts audited, i.e. they can 'opt out' of having an external auditor.
    So far I have looked at this mainly by using logistic regression models with bankruptcy (0/1) being the dependent variable. After discussion with my supervisor, I would like to corroborate my results with survival analysis techniques.

    I have come up with the following basic Cox proportional hazard model:
    stcox bOptingOut lncapital i.firmCanton i.industryCode, nolog

    - bOptingOut is basically my treatment/non-treatment variable (1 = opting-out, i.e. no auditor; 0 = no opting-out, i.e. financial statements are audited)
    - lncapital is the natural logarithm of the firm's paid-up capital in Swiss francs (1 CHF is roughly equal to 1 USD)
    - firmCanton is an indicator variable to control for the canton/state in which the firm is domiciled
    - industryCode is an indicator variable to control for industry effects

    In principle, I was quite happy with the model as the coefficients are highly significant and the direction of the effects appears reasonable.

    Code:
    . stcox i.bOptingOut lncapital i.firmCanton i.industryCode, nolog
    
             failure _d:  event == 3
       analysis time _t:  (date1-origin)
                 origin:  event==1
      enter on or after:  event==1 time td(01jan2008)
      exit on or before:  event==3 time td(31dec2018)
                     id:  id
    
    Cox regression -- Breslow method for ties
    
    No. of subjects =       94,319                  Number of obs    =     161,050
    No. of failures =        9,683
    Time at risk    =    299931461
                                                    LR chi2(44)      =     4529.30
    Log likelihood  =   -106232.09                  Prob > chi2      =      0.0000
    
    ------------------------------------------------------------------------------
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    1.bOptingOut |   1.425763   .0464468    10.89   0.000     1.337574    1.519766
       lncapital |   .8598573   .0107152   -12.12   0.000     .8391103    .8811172

    Then I turned to testing the proportional hazards assumption. First, I did some graphical checks: I plotted the hazard and survival functions and ran stphplot and stcoxkm. IMHO, things look quite okay (see picture below).

    [Image: propHazardTest.png]

    Afterwards, I performed the test based on Schoenfeld residuals (estat phtest) and got a Prob>chi2 value of 0.0000.
    Taken at face value, this means the proportional hazards assumption is clearly violated.
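
    For readers who want to reproduce the checks described above, here is a minimal sketch. It assumes the data are already stset as shown in the output, and it enters bOptingOut without the i. prefix (as in the first stcox line) so that the variable name can be passed directly to plot(). The detail option is not mentioned in the post; it is included here only because it reports a test per covariate in addition to the global test.

    ```stata
    * Sketch only: assumes the data are already stset as in the output above.

    * Graphical checks on the treatment indicator
    stphplot, by(bOptingOut)     // log-log plot; roughly parallel curves support PH
    stcoxkm, by(bOptingOut)      // observed Kaplan-Meier vs. Cox-predicted survival

    * Refit the Cox model, then test PH with scaled Schoenfeld residuals
    stcox bOptingOut lncapital i.firmCanton i.industryCode, nolog
    estat phtest, detail         // per-covariate tests plus the global test
    estat phtest, plot(bOptingOut)   // smoothed residuals against analysis time
    ```

    With roughly 94,000 subjects and 9,700 failures, the test has enough power to flag departures that may be too small to matter in practice, so looking at the residual plot alongside the p-value shows how large the departure actually is.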

    Currently, I have two main questions:
    a) Does my data set really violate the proportional hazard (PH) assumption? Or is this simply due to the big sample size I have?

    b) If the PH assumption is really violated, is it okay to use a lognormal parametric model instead [streg ..., dist(lognormal)]?
    Based on AIC/BIC, that would be the best-fitting of the parametric models. Or would you suggest anything else?
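
    A comparison of this kind can be sketched as follows. The set of distributions and the stored-estimate names (m_...) are assumptions for illustration; the post does not say which distributions were actually compared.

    ```stata
    * Sketch only: assumes the data are already stset as in the output above.
    * Fit several parametric survival models with the same covariates
    * and store each set of estimates under a hypothetical name m_<dist>.
    foreach d in exponential weibull gompertz lognormal loglogistic {
        quietly streg i.bOptingOut lncapital i.firmCanton i.industryCode, dist(`d')
        estimates store m_`d'
    }

    * Table of log likelihood, AIC and BIC for all stored models
    * (_all also lists any estimates stored earlier in the session)
    estimates stats _all
    ```

    Note that streg fits the lognormal (and loglogistic) model in the accelerated failure-time metric, so its coefficients describe effects on survival time and are not directly comparable to the hazard ratios from stcox.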

    I am using the book "An Introduction to Survival Analysis Using Stata" (Revised 3rd Edition) by Cleves, Gould and Marchenko.

    As you can see, this is my first post to the Statalist forum. In case you need further information, just let me know.
    I would really appreciate your help!

    Kind regards,
    Daniel
    Last edited by Daniel Baettig; 22 Mar 2019, 11:28.