Hello
I am trying to estimate which covariates (one example would be a change in government) affect the likelihood of a government agency being terminated.
As far as I have gathered from reading material supplied by Stephen Jenkins, I should use a standard logit function or a cloglog function. I am, however, in doubt about how to account for the temporal dependence in the model (the agencies are observed at multiple time points). As far as I can gather, the standard way is either to include time dummy variables or to make an assumption about the shape of the hazard function.
My dataset is set up as follows:
ID           | Y variable (binary) | Year | X variable | X2 variable
1 (agency 1) | 0                   | 1996 | 0          | 50
1            | 0                   | 1997 | 0          | 54
1            | 1                   | 1998 | 1          | 60
2 (agency 2) | 0                   | 1996 | 0          | 57
2            | 0                   | 1997 | 0          | 43
2            | 0                   | 1998 | 0          | 32
2            | 1                   | 1999 | 1          | 67
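A minimal sketch of how this agency-year layout could be declared and checked in Stata. The variable names id, year and yvar are assumptions standing in for the columns above; the check simply confirms that a termination (yvar == 1) only ever appears in an agency's last observed year, which is what a discrete-time hazard setup expects.

```stata
* Assumed variable names: id (agency), year, yvar (1 = terminated in that year)
xtset id year

* Within each agency, every row except the last should have yvar == 0
bysort id (year): assert yvar == 0 if _n < _N

* Number of observed agency-years per agency
bysort id (year): generate n_years = _N
```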
The problem is that most of the agencies I study existed before the observation period and were therefore at risk before I began observing them. I observe the agencies between 1996 and 2020; many were created before 1996, while others were created (and so enter the study) after 1996. I am not quite sure whether and how I should account for this left truncation/left censoring (I know these are not interchangeable, but I have a hard time telling them apart, and people seem to mix them up).
One idea I had was simply to use each agency's real age to account for time dependency: include an age variable in the dataset, create a dummy variable for each age, and include these in the estimated model (logit yvar xvar i.age). I am, however, not sure this is the right way to go about it. An alternative would be to focus only on agencies created after 1996.
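To make the age-dummy idea concrete, here is a rough sketch, not a recommendation. It assumes a hypothetical variable created holding each agency's creation year (it is not in the listing above) and an agency identifier named id. Age is measured from creation rather than from 1996, so an agency created in 1980 contributes its first record at age 16.

```stata
* Hypothetical: created = year the agency was founded (not in the data shown)
generate age = year - created

* Discrete-time hazard with one dummy per age; standard errors clustered by agency
logit yvar xvar i.age, vce(cluster id)

* Complementary log-log version of the same specification
cloglog yvar xvar i.age, vce(cluster id)
```

Age values at which no termination occurs are perfectly predicted in the logit, so Stata would drop those dummies and the corresponding observations; grouping ages into bands would be one way around that.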
I hope somebody can offer some guidance.