Discrete time survival analysis with left truncation/censoring

Guest
#1


07 Mar 2021, 11:28

Hello

I am trying to estimate which covariates (one example would be a change in government) affect the likelihood of a government agency being terminated.

My dataset is set up as follows:
ID             Y variable (binary)   Year   X variable   x2 variable
1 (agency 1)   0                     1996   0            50
1              0                     1997   0            54
1              1                     1998   1            60
2 (agency 2)   0                     1996   0            57
2              0                     1997   0            43
2              0                     1998   0            32
2              1                     1999   1            67

As far as I have gathered from reading the material supplied by Stephen Jenkins, I should use a standard logit model or a cloglog model. However, I am unsure how to account for temporal dependence in the model, since the agencies are observed at multiple time points. As far as I can tell, the standard way is either to include time dummy variables or to make an assumption about the shape of the hazard function.
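For reference, the two specifications mentioned above are commonly written as below. This is only a notational sketch: the symbols are assumptions chosen for illustration, with h_{it} the hazard of agency i being terminated in period t, x_{it} the covariates, and \alpha_t the baseline (duration-dependence) term.

```latex
\begin{align*}
\text{logit:}   \quad & \log\!\left(\frac{h_{it}}{1-h_{it}}\right) = \alpha_t + \beta' x_{it} \\
\text{cloglog:} \quad & \log\!\bigl(-\log(1-h_{it})\bigr) = \alpha_t + \beta' x_{it}
\end{align*}
```

Giving every period t its own \alpha_t corresponds to the time-dummy approach, while replacing \alpha_t with a function of t (for example, linear in \log t) corresponds to assuming a shape for the hazard.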

The problem is that most of the agencies I study existed before the observation period and were therefore at risk before I began observing them. I observe the agencies between 1996 and 2020, but many of them were created before 1996, and others were created (and so enter the study) after 1996. I am not sure whether and how I should account for this left truncation/left censoring. (I know the two terms are not interchangeable, but I have a hard time distinguishing between them, and people seem to mix them up.)

One idea I had was simply to use the agencies' real age to account for time dependency: include an age variable in the dataset, create a dummy variable for each age, and include these in the estimated model (logit yvar xvar i.age). However, I am not sure this is the right way to go about it. An alternative would be to focus only on agencies created after 1996.

I hope somebody can be of guidance.

Last edited by sladmin; 14 Jun 2023, 09:46. Reason: anonymize original poster
Tags: logit, panel data, Suggestion, survival analysis, Time Series
Stephen Jenkins

#2

07 Mar 2021, 14:16

Do you know the year that each agency began? If so, no problem -- you have left truncation ('delayed entry'). My survival analysis materials advise how to set up your dataset to model the data in this case. The time-at-risk variable will be an integer, indexing years since agency began. If an agency started in 1995, then elapsed duration = 2 for 1996, 3 for 1997, and so on. If an agency started in 1990, then elapsed duration = 7 for 1996, 8 for 1997, and so on. If an agency started in 1998, then elapsed duration = 1 for 1998, 2 for 1999, and so on.
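The rule in the examples above amounts to calendar year minus start year, plus one. A minimal Python sketch of that arithmetic follows (the thread itself concerns Stata; the function name `elapsed_duration` is hypothetical and not taken from the survival analysis materials):

```python
def elapsed_duration(year, start_year):
    """Years at risk in a given calendar year, counting the start year as 1."""
    if year < start_year:
        raise ValueError("the agency did not exist yet in this year")
    return year - start_year + 1


# The three cases described in the post above
print(elapsed_duration(1996, 1995))  # started 1995, observed in 1996
print(elapsed_duration(1996, 1990))  # started 1990, observed in 1996
print(elapsed_duration(1998, 1998))  # started 1998, observed in 1998
```

The start year counts as duration 1, which is why an agency created in 1995 is already at duration 2 when first observed in 1996.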

For the regressions you do not use the agency-years of data corresponding to 1995 and earlier. For an agency starting in 1996 or later, you use all of its agency-year observations.
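To make the selection of agency-years concrete, here is a rough Python sketch under stated assumptions: the observation window is 1996 to 2020 as given in the first post, and the function name `agency_years` and its arguments are hypothetical. Duration is always counted from the agency's creation, but only the years inside the window are kept as rows.

```python
OBS_START, OBS_END = 1996, 2020  # observation window from the first post


def agency_years(start_year, end_year=None):
    """Return (year, duration, event) rows kept for the regression.

    start_year: year the agency was created
    end_year:   year it was terminated, or None if it survives past OBS_END
    """
    first = max(start_year, OBS_START)  # delayed entry: drop pre-1996 years
    last = OBS_END if end_year is None else min(end_year, OBS_END)
    rows = []
    for year in range(first, last + 1):
        duration = year - start_year + 1  # measured from creation, not from 1996
        event = int(end_year is not None and year == end_year)
        rows.append((year, duration, event))
    return rows


# An agency created in 1990 and terminated in 1998 contributes three rows,
# with durations 7, 8 and 9 rather than 1, 2 and 3.
for row in agency_years(1990, 1998):
    print(row)
```

An agency created inside the window, such as `agency_years(1998, 1999)`, contributes all of its agency-years, starting at duration 1.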

Please re-read the Survival Analysis manuscript about this. Do not confuse (a) the creation/definition of the elapsed duration variable and calendar time, and (b) which agency-year observations to use. They are separate, albeit related, issues.

If you don't know the year when an agency started you are in left-censored territory and it's harder to proceed. (How could you calculate the elapsed duration variable if you don't know when an agency began?)
Guest
#3

08 Mar 2021, 12:00

Thank you very much. I do know when the agencies were created.

I must have missed it in the manuscript. Is this the right page? https://www.iser.essex.ac.uk/resourc...sis-with-stata