  • Discrete time survival analysis with left truncation/censoring

    Hello

    I am trying to estimate what covariates (one example would be change in government) affect the likelihood of a government agency being terminated.

    My dataset is set up as follows:
    ID             Y (binary)   Year   X   x2
    1 (agency 1)   0            1996   0   50
    1              0            1997   0   54
    1              1            1998   1   60
    2 (agency 2)   0            1996   0   57
    2              0            1997   0   43
    2              0            1998   0   32
    2              1            1999   1   67

    As far as I have gathered from reading material supplied by Stephen Jenkins, I should use a standard logit or a cloglog model. I am, however, unsure how to account for temporal dependence in the model (the agencies are observed at multiple time points). As far as I can tell, the standard way is either to include time dummy variables or to make an assumption about the shape of the hazard function.
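
For reference, the two discrete-time hazard specifications mentioned above can be written as follows. This is a sketch in assumed notation that does not appear in the thread: h_{it} is the probability that agency i is terminated in interval t given survival up to t, x_{it} are the covariates, and \alpha_t is the baseline term that carries the duration dependence (one dummy per interval, or a parametric function of t).

```latex
% Logit (proportional odds) specification
\log\frac{h_{it}}{1 - h_{it}} = \alpha_t + \beta' x_{it}

% Complementary log-log specification
\log\bigl\{-\log\bigl(1 - h_{it}\bigr)\bigr\} = \alpha_t + \beta' x_{it}
```

Including time dummies leaves \alpha_t unrestricted, whereas assuming a shape for the hazard replaces \alpha_t with a function such as a polynomial in t or \log t.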

    The problem is that most of the agencies I study existed before the observation period and were therefore at risk before I began observing them. I observe the agencies between 1996 and 2020, but many of them were created before 1996, while others were created, and so enter the study, after 1996. I am not sure whether and how I should account for this left truncation/left censoring (I know the two terms are not interchangeable, but I have a hard time telling them apart, and people seem to mix them up).

    One idea I had was simply to use each agency's real age to account for time dependence: include an age variable in the dataset, create a dummy variable for each age, and include these in the estimated model (logit yvar xvar i.age). I am, however, not sure whether this is the right way to go about it. An alternative would be to focus only on agencies created after 1996.
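
To make the age-dummy idea concrete, here is a minimal Python sketch (standard library only) that computes the raw termination proportion at each age from agency-year rows. The function name `hazard_by_age` and the sample ages are hypothetical; the thread does not give agency start years. With no other covariates, a model with a separate dummy for every age fits these same proportions wherever they lie strictly between 0 and 1.

```python
from collections import defaultdict


def hazard_by_age(rows):
    """Return {age: events / at_risk} from (age, y) agency-year pairs.

    Only agency-years that were actually observed should be passed in,
    so an agency contributes to the risk set at a given age only if it
    was under observation at that age.
    """
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for age, y in rows:
        at_risk[age] += 1
        events[age] += y
    return {age: events[age] / at_risk[age] for age in sorted(at_risk)}


# Hypothetical agency-years as (age, y): one agency first observed at
# age 7 and terminated at age 9, and two agencies observed from age 1.
sample_rows = [(7, 0), (8, 0), (9, 1), (1, 0), (2, 1), (1, 0), (2, 0)]

for age, hazard in hazard_by_age(sample_rows).items():
    print(age, hazard)
```

Ages observed for only a few agencies give very noisy proportions, which is one reason people group ages or assume a smoother shape for the hazard.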

    I hope somebody can be of guidance.
    Last edited by sladmin; 14 Jun 2023, 09:46. Reason: anonymize original poster

  • #2
    Do you know the year that each agency began? If so, no problem -- you have left truncation ('delayed entry'). My survival analysis materials advise how to set up your dataset to model the data in this case. The time-at-risk variable will be an integer, indexing years since agency began. If an agency started in 1995, then elapsed duration = 2 for 1996, 3 for 1997, and so on. If an agency started in 1990, then elapsed duration = 7 for 1996, 8 for 1997, and so on. If an agency started in 1998, then elapsed duration = 1 for 1998, 2 for 1999, and so on.
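
The elapsed-duration rule described above amounts to counting the start year as year 1. A minimal Python sketch, where the function name `elapsed_duration` is made up for illustration:

```python
def elapsed_duration(start_year: int, calendar_year: int) -> int:
    """Years at risk in calendar_year, counting the start year as 1."""
    if calendar_year < start_year:
        raise ValueError("calendar_year precedes start_year")
    return calendar_year - start_year + 1


# The three start years used as examples in the post.
for start, year in [(1995, 1996), (1995, 1997),
                    (1990, 1996), (1990, 1997),
                    (1998, 1998), (1998, 1999)]:
    print(start, year, elapsed_duration(start, year))
```

The duration variable depends only on the start year and the calendar year, so it can be computed for every agency-year regardless of when observation began.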

    For the regressions you do not use the agency-years of data corresponding to 1995 and earlier. For an agency starting in 1996 or later, you use all of its agency-year observations.
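
The selection rule can be sketched in Python as follows. This is an illustration under assumptions, not code from the survival analysis materials: the function `agency_years`, the constant names, and the start and end years in the example are hypothetical, and the 1996 to 2020 window is taken from the original question.

```python
FIRST_OBSERVED_YEAR = 1996  # start of the observation window
LAST_OBSERVED_YEAR = 2020   # end of the observation window


def agency_years(agency_id, start_year, end_year, terminated):
    """Build (id, year, duration, y) rows for the years inside the window.

    duration counts from the agency's true start year, but rows for
    years before the window opens are never created (delayed entry).
    y is 1 only in the termination year, if that year is inside the window.
    """
    rows = []
    first = max(start_year, FIRST_OBSERVED_YEAR)
    last = min(end_year, LAST_OBSERVED_YEAR)
    for year in range(first, last + 1):
        duration = year - start_year + 1
        y = 1 if (terminated and year == end_year) else 0
        rows.append((agency_id, year, duration, y))
    return rows


# Hypothetical agency started in 1990 and terminated in 1998:
# only 1996-1998 are used, with durations 7, 8, 9.
early = agency_years(1, 1990, 1998, True)

# Hypothetical agency started in 1998 and terminated in 1999:
# all of its agency-years are used, with durations 1, 2.
late = agency_years(2, 1998, 1999, True)

for row in early + late:
    print(row)
```

Keeping the duration calculation separate from the row selection mirrors the two distinct issues: how elapsed time is defined, and which agency-years enter the regression.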

    Please re-read the Survival Analysis manuscript about this. Do not confuse (a) the creation/definition of the elapsed duration variable and calendar time, and (b) which agency-year observations to use. They are separate, albeit related, issues.

    If you don't know the year when an agency started you are in left-censored territory and it's harder to proceed. (How could you calculate the elapsed duration variable if you don't know when an agency began?)


    • #3
      Thank you very much. I do know when the agencies were created.

      I must have missed it in the manuscript. Is this the right page? https://www.iser.essex.ac.uk/resourc...sis-with-stata
