
  • Problem with a parametric survival analysis

    Hi!

    Sorry, this will probably end up long, but I really need some help.

    I am doing a reliability analysis of 200 substations (SS). Each SS is treated as a single system because the failure data are not available at the component level. I know which SS was interrupted/failed and when (time of day, date, year), along with its location, type, etc., but not why the failure occurred. So I have 200 stations with more than 15,000 failures over a 10-year period. My task is limited to fitting a parametric survival model, in which the SS with the highest beta are taken to be the least "reliable", since they have the greatest effect on the baseline hazard rate.

    The problem now (after all the other issues with data of this scope) is that the significant covariates with the highest beta belong to the SS with by far the fewest and shortest failures (around 5 in 10 years). It seems I am getting the highest beta for the stations that are least likely to fail, when I was supposed to get the ones with many and/or long failures.

    To get a beta for each SS, my Excel file looks like this (using the time between failures, TBF):

    [Image: Forum excel.png]

    Further along are columns for the 200 SS, followed by covariates for year 1, year 2, ..., month 1, month 2, ..., SS type, and so on. So I have a large number of covariates.

    All of them are coded as 0/1 indicators. I have tried replacing some of the indicators with a single column, for example one year column holding the values [2005-2015] instead of 10 separate indicator columns. But this does not change my problem (of course).

    I'm using Weibull PH in the parametric survival model.

    [Image: Uten navn.png]


    These are my significant SS (called systems here) with the highest beta. System5 has only 5 failures, all from the last two years of my data sample (2014 and 2015), so there were no failures in the other 8 years. But the year 2015 and one of the months also get omitted. Does this have a significant impact? Still, I don't understand how SS5 ended up as "unreliable".

    SS with over 100 failures do not even come out as significant. Does anyone understand my problem and can see what I am doing wrong? Surely a lot, but I also have no prior experience with Stata or with reliability analysis at this level.



    Thank you.

  • #2
    There is much that I do not understand in your explanation. But the heart of your question seems to be that you are getting your highest coefficients for the systems that have the fewest and longest delayed failures, whereas you expect the opposite.

    Now, it has been a very long time since I have done one of these parametric survival analyses, but I seem to recall that the Weibull model has two different parameterizations: proportional hazards (PH, reported as hazard ratios) and accelerated failure time (AFT). The two parameterizations are equivalent, but they run in opposite directions. From what you describe, it sounds like you estimated the model in the AFT metric: in that parameterization a high beta means a long delay until failure, but you are interpreting the coefficients as if you had estimated it in the PH metric. Check the PDF manuals that come with your Stata installation on this. Run -help streg- and click on the blue link near the top to get to the -streg- section of the manuals. Then read that chapter. I think you will find your answer there. I'm not offering you more specific advice because, as I said, I haven't done one of these in a long time and the commands have changed a bit since then.
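
    To make the "opposite directions" point concrete: as far as I understand the Weibull parameterization in the -streg- manual entry, the two metrics are linked by beta_PH = -p * beta_AFT, where p is the Weibull shape parameter. Below is a minimal Python sketch of that conversion. The function names and the numbers are made up for illustration and are not taken from your output, so treat it as a sketch under that assumption and not as a description of what Stata does internally.

```python
import math


def ph_to_aft(beta_ph: float, p: float) -> float:
    """Convert a Weibull PH-metric coefficient to the AFT metric.

    Assumes the relationship beta_PH = -p * beta_AFT with shape p > 0.
    """
    if p <= 0:
        raise ValueError("Weibull shape parameter p must be positive")
    return -beta_ph / p


def aft_to_ph(beta_aft: float, p: float) -> float:
    """Inverse conversion: AFT-metric coefficient to the PH metric."""
    if p <= 0:
        raise ValueError("Weibull shape parameter p must be positive")
    return -p * beta_aft


# Hypothetical values for illustration only (not from the posted output).
beta_ph = 1.0  # PH-metric coefficient for one station indicator
p = 0.5        # Weibull shape parameter

beta_aft = ph_to_aft(beta_ph, p)
hazard_ratio = math.exp(beta_ph)  # above 1: higher hazard than the reference
time_ratio = math.exp(beta_aft)   # below 1: shorter time to failure

print(f"PH coefficient  {beta_ph:+.3f} -> hazard ratio {hazard_ratio:.3f}")
print(f"AFT coefficient {beta_aft:+.3f} -> time ratio   {time_ratio:.3f}")
```

    The same station therefore gets a positive coefficient in one metric and a negative one in the other. A large positive beta points to an unreliable station only in the PH metric; in the AFT metric it points to long times between failures.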
