Question about Survival Analysis (Discrete hazard proportional odds)

Omar Shaher

Join Date: Feb 2019

Posts: 164
#1

24 Feb 2019, 07:39

Dear Statalist,
First of all, I would like to thank you all for the valuable information you provide, which really contributes to our knowledge.
I have 3 questions, please, and I hope they will be answered.

I am using unbalanced panel data for a set of companies, and I want to fit a discrete-time hazard model (proportional odds).
In 2005 a set of standards was issued and, since adoption is voluntary, companies have been at risk of adopting these standards since 2005. However, because financial reports for 2005 are missing for many companies and most annual reports first appear in 2006, I decided that the study period will run from 2006 to 2015. If I started in 2005, then all companies established in 2006 would be considered left-truncated (i.e. any company newly established after 2005 will be at risk of adopting: "the subjects have been at risk before entering the study"). I want to keep the number of left-truncated observations as small as possible because they should be excluded when modelling unobserved heterogeneity ("frailty"). However, they will not be excluded from the sample when fitting the discrete hazard model without frailty. Therefore, the study period will be 2006-2015.
My Questions:
Based on the above, is my understanding of the meaning of a left-truncated observation correct?

Regarding calendar time, each company's calendar time will be coded (1) in 2006, (2) in 2007, (3) in 2008, (4) in 2009, and so on until it experiences the event. However, for left-truncated companies, it will be coded based on the year in which they were established. For instance, if a company was established in 2008, then its calendar-time code will start from (3), not (1). Am I right?

I deleted all the left-truncated observations to fit the frailty model, and the results showed that the likelihood-ratio test is not significant. I also ran the analysis with them included, and the likelihood-ratio test was again not significant. Does that mean that unobserved heterogeneity is negligible in my case?

I am very, very sorry for the length of the questions, but I hope they will be answered, and I will be very grateful to you.
A million thanks in advance.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1439
#2

25 Feb 2019, 02:10

What is the outcome variable? What is the event that ends the spell "at risk"? And, turning things round, when does a firm start being at risk of that event?

If I understand correctly, companies start being at risk of experiencing the event (whatever that is) from 2005 onwards -- that's the year of the standards legislation. You have 2 types of firm in your data: (i) those that were already in existence in 2005 (or before); and (ii) those that started business in 2006. Your dataset observes firms from 2006 only. Group (ii) are exposed to standards legislation from their foundation, and you observe the start of the spell. For group (i), there is left truncation. Spells start in 2005 (duration year "1"), but you only observe them from year "2" onwards. This can be handled in the standard way. (See e.g. my Survival Analysis webpages.)
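
A minimal sketch of the duration coding described above, assuming a firm-year dataset observed from 2006 only. The variable names (firmid, founded, year, event) and the toy values are hypothetical and not taken from the poster's data.

```stata
* Toy firm-year data observed from 2006 only (all names/values hypothetical).
clear
input firmid founded year event
1 1998 2006 0
1 1998 2007 0
1 1998 2008 1
2 2006 2006 0
2 2006 2007 1
end

* The risk clock starts in 2005 or in the founding year, whichever is later.
generate riskstart = max(2005, founded)

* Elapsed duration (spell year): equals 1 in the first year at risk.
generate spellyear = year - riskstart + 1

* A firm is left-truncated if its first observed row has spellyear > 1.
bysort firmid (year): generate byte lefttrunc = spellyear[1] > 1

list firmid year spellyear lefttrunc event, sepby(firmid) noobs
```

In this toy example firm 1 (group (i)) enters the data at spell year 2, while firm 2 (group (ii)) enters at spell year 1. The left-truncated firm has no row for spell year 1; its duration clock is not reset.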

What you must not do is fit a frailty model to left-truncated data unless you have written a special program to fit the model. Frailty means that there is a form of dynamic sample selection, and the standard way of fitting models to left-truncated data does not address this problem. [Note in this respect that the same problem arises with continuous time models: you cannot fit frailty models to left-truncated (a.k.a. 'delayed entry') data using streg -- see the Reference Manual.]
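
The restriction can be illustrated with a hedged sketch on simulated data. Everything here is an assumption made for illustration: the variable names, the simulated values, and the use of official Stata's xtlogit with the re option (a random-effects logit, i.e. Normally distributed frailty) as the frailty estimator. The thread does not say which command was actually used.

```stata
* Simulated firm-year data; all names and parameter values are hypothetical.
clear
set seed 2019
set obs 400
generate firmid = _n
generate byte lefttrunc = mod(_n, 4) == 0   // every 4th firm enters late
generate x1 = rnormal()                     // time-constant covariate
generate u  = rnormal()                     // unobserved firm effect (frailty)
expand 8                                    // up to 8 years at risk per firm
bysort firmid: generate spellyear = _n
generate byte event = runiform() < invlogit(-2 + 0.5*x1 + u)

* Keep each firm's rows up to and including its first event.
bysort firmid (spellyear): generate cumev = sum(event)
drop if cumev > 1 | (cumev == 1 & event == 0)

* Delayed entry: late entrants are not observed in spell year 1.
drop if lefttrunc & spellyear == 1

* No frailty: left-truncated firms stay in, contributing only observed years.
logit event i.spellyear x1, nolog

* Frailty: fitted only to firms observed from the start of their spell.
xtset firmid
xtlogit event i.spellyear x1 if lefttrunc == 0, re nolog
```

The xtlogit output includes a likelihood-ratio test of rho = 0, which is one way to gauge whether the frailty variance is distinguishable from zero in the non-truncated subsample.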
Omar Shaher

Join Date: Feb 2019

Posts: 164
#3

25 Feb 2019, 07:25

Dear Prof. Jenkins,
First and foremost, I would like to thank you from the bottom of my heart for your reply and for the precious and valuable webpages. I have been reading your materials since last August, and I have benefited so much. I am so grateful to you.

Yes, the outcome variable is a binary variable (adopted the standard or not), and the event that ends the spell is (Adoption), and firms start being at risk in 2005.

The thing is, I can observe the data from 2005 for those companies that were already in existence in 2005. Therefore, I can start my sample period in 2005, and calendar time will be coded (1) for 2005, (2) for 2006, and so on. That way, there will be no left-truncated observations. Most importantly, for any firm founded, let's say, in the middle of the period, its calendar time will be coded based on that date (e.g. a firm established in 2008 and I want to include it in the sample, then calendar time will be started from 4 not 1), and companies of this type are not considered left-truncated because they did not exist before 2005. Am I right?

I appreciate your precious time, but one last question, please: is it logical to conduct a correlation analysis between the event and the covariates for the discrete hazard model?

I appreciate your kind cooperation and can hardly express my gratitude.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1439
#4

25 Feb 2019, 10:22

You wrote

a firm established in 2008 and I want to include it in the sample, then calendar time will be started from 4 not 1)

I don't understand this. A firm born in 2008 can only be at risk of something from the year 2008 -- the first year at risk of adoption, 2008, is a "1", not a "4".

is it logical to conduct a correlation analysis between the event and the covariates for the discrete hazard model?

I don't understand the question
Omar Shaher

Join Date: Feb 2019

Posts: 164
#5

25 Feb 2019, 14:33

Great, I understand it now regarding firms that were born in 2008. Greatly appreciated.

Regarding the above case, so there will be no left-truncation if I can observe the dataset since 2005 (i.e. starting my sample period from 2005). Am I right?

Regarding the correlation analysis, do the inferences from the correlation matrix between the event and the covariates have to match the results from the discrete hazard model (i.e. qualitatively similar in terms of sign and significance level?).

Thank you very much.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1439
#6

26 Feb 2019, 02:04

there will be no left-truncation if I can observe the dataset since 2005 (i.e. starting my sample period from 2005). Am I right?

Well, you still haven't been 100% clear about the nature of time at risk, and precisely what defines (a) the year in which a firm is at risk of experiencing the event of interest and (b) the year of event. (When do firms become "at risk"? What is an "event"?) But (guessing) you're probably on the right track. Over to you to sort out.

Regarding the correlation analysis, do the inferences from the correlation matrix between the event and the covariates have to match the results from the discrete hazard model (i.e. qualitatively similar in terms of sign and significance level?).

I think -- but am not sure -- that you're asking whether simple descriptive statistics from data about the binary event variable and covariates should correspond with "results" from the discrete hazard model. There are all sorts of issues here. Short answer: I wouldn't rely on the former. Find some relevant published papers in your field and look at what they do.
Omar Shaher

Join Date: Feb 2019

Posts: 164
#7

26 Feb 2019, 07:39

Prof. Jenkins, thank you very much for that.

But to make the nature of time at risk 100% clear:
The year in which a firm becomes at risk of experiencing the event of interest is 2005, because the standards were issued in that year, so all firms become at risk of experiencing the event (i.e. adoption) from that point.

The year of the event (the year in which these standards are adopted) can be any year: 2005, 2006, 2007, etc.

Therefore, if I start observing the firms' data from 2005 (the date when companies started to be exposed to the risk), then there will be no left truncation. But if I start observing the firms' data from 2006, firms that existed in 2005 will be considered left-truncated, as they were at risk in 2005 but entered the sample late (i.e. they are observed only from 2006).

After clarifying the issue now, do you think I'm on the right track??

Thank you very much.
Omar Shaher

Join Date: Feb 2019

Posts: 164
#8

14 Mar 2019, 22:05

Hi Prof. Jenkins,
I am very embarrassed to ask again. Suppose 2005 is the year in which the risk started (firms became exposed to the risk in 2005), the event is the adoption of the standards in any year, and the study ends in 2018. The main point is that we cannot observe the event (adoption) if we don't have the annual report. For all but 85 firms, annual reports are available from 2005 until the firm experiences the event of adoption, and all financial data for the independent variables are available as well, so I can observe the event and the independent variables from 2005. For the remaining 85 firms, the financial data for the independent variables are available from 2005, but no annual report exists for 2005, so the dependent variable (event) cannot be observed in 2005; their annual reports begin to appear in 2006.
So, do you think I should consider these firms left-truncated, or should I consider a firm fully observed only when both the annual report (needed to observe the event, i.e. the dependent variable) and the financial data for the independent variables exist? In other words, should the firm be treated as having started business (been born) on the date of its first available annual report, regardless of whether it had financial data for the independent variables before then?
If yes: I went back to your precious material (https://www.iser.essex.ac.uk/files/t...s/ec968st3.pdf) and found that you set up left truncation for the continuous-time parametric model. But in my case I am going to use the discrete-time hazard model, and the command (stset) is for the continuous-time case, not for mine. Therefore, if I want to treat them as left-truncated, can I use the following steps?
Leave firms that have all information from 2005 as they are, without deleting anything, start the spell in 2005, and code calendar time as (1) in 2005 (the date when the risk started).

Delete the year 2005 only for the firms that I consider left-truncated, start the spell in 2006, and code calendar time in 2006 as 2, not 1.

Then run the code below:
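* companyname is an assumed name for the string firm identifier
encode companyname, gen(COMPID)
global id COMPID
lab var calendar "spell year"
lab var event "binary depvar for discrete hazard model"
* one dummy per spell year (d1, d2, ...), then list them
ta calendar, ge(d)
ds d*
* two-piece baseline hazard: spell years 1-6 (e1, reference) and 7-14 (e2)
ge e1 = calendar<=6
ge e2 = calendar>=7 & calendar<=14
logit event indep1 indep2 e2, nolog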

I shall always remember you with gratitude.
A million thanks in advance.
Stephen Jenkins

Join Date: Apr 2014

Posts: 1439
#9

15 Mar 2019, 04:36

Sorry, but your statement of your problem is too confusing for me (who has little time to read it in detail). At this stage, I strongly urge you to go back to "first principles" and work from there. You will have to (more) clearly define the event of interest and also the year in which firms become at risk of experiencing the event (which in turn defines the elapsed duration/time at risk variable). Doing this properly, you should be able to answer your own question about whether you have any left-truncated spells or not.
Omar Shaher

Join Date: Feb 2019

Posts: 164
#10

15 Mar 2019, 08:48

I apologize if the statement was too long. I am so sorry.
To make it as short and clear as possible:
1- The year in which a firm becomes at risk of experiencing the event of interest is 2005.
2- The year of the event (the year of adoption of the standards) can be any year: 2005, 2006, 2007, and so on.
Therefore, I have 2 types of firms in my data:
(i) those that were already in existence in 2005 and can be observed from 2005.

(ii) those that were already in existence in 2005 but can only be observed from 2006.

So, group (i) are exposed to standards legislation from their foundation, and I can observe the start of the spell. For group (ii), there is left truncation. Spells start in 2005 (duration year "1"), but I can only observe them from year "2" onwards.
Please, my question is how to deal with the left truncation in group (ii): can I delete the year 2005 from the spell, start from 2006, and code calendar time in 2006 as 2, not 1, while for group (i) calendar time starts from 1 in 2005? Then I would run the above-mentioned code.

I repeat my apologies once again.
My deepest respect and appreciation to you