Survival Time Data - Long Form - How do I generate a variable that indicates the number of observations at risk within a subgroup of data?

Siddharth Rao

Join Date: Aug 2017

Posts: 16
#1

26 Jan 2019, 08:42

I have a dataset on legal cases and individual hearings for each case, across multiple courts. I have set it up as a survival time dataset in order to estimate the factors that contribute to case disposition time, using a Cox PH Model. I want to control for the workload of each court, since a higher workload will undoubtedly affect the speed of disposal. I have a dummy, dispvar, as the failure variable, and the time in years, duration_year, as the time variable.

Each individual observation is a single hearing, identified by a date stored as a Stata daily date (%td format):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(unidcode judge_code) double(businessondate_sif datefiled_sif) float(lasthearingdate_sif newcourt oldtreatmentcourt afternc casetype_code dispvar duration_year)
351040 1 16469 14845 16469 0 0 0 42 1  4.449315
350965 1 17616 17429 17616 0 0 0 42 1 .51232874
342692 1 17723 17555 17723 0 0 0 36 1   .460274
350915 1 17835 17695 17835 0 0 0 42 1  .3835616
269514 1 17927 17739 21348 0 0 0 28 0  .5150685
342498 1 18592 17468 18592 0 0 0 36 1  3.079452
247489 1 18913 18637 18913 0 0 0 25 1  .7561644
344091 1 19170 18253 19170 0 0 0 36 1  2.512329
265946 1 19227 18744 19227 0 0 0 28 1 1.3232877
265459 1 19419 18331 19419 0 0 0 28 1  2.980822
end
format %td businessondate_sif
format %td datefiled_sif
format %td lasthearingdate_sif

How do I create a variable that will let me control for the number of legal cases pending (that is, not yet disposed of) on a given hearing date?

Thank you in advance to anybody who can help me with this.
Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#2

26 Jan 2019, 10:12

I'm sorry if I'm being dense, but you have a lot of date variables and it isn't clear to me what they mean. Let me be specific. If I gave you a particular date, say 15 December 2009, how would you be able to tell from the dates in this data set which cases are still active on that date and which are not? Does one of these variables indicate when the case started and another indicate when it ends? If so, which variables are those? If there are no such variables, then how would you answer my question about 15 December 2009? What would you do?
Siddharth Rao

Join Date: Aug 2017

Posts: 16
#3

26 Jan 2019, 10:42

Hi Clyde, thanks for your response.

The variable businessondate_sif contains the date of the hearing.
The variable datefiled_sif contains the date the case began, and lasthearingdate_sif contains the date on which the case leaves observation, whether or not it has failed (been disposed of) by then. A case is considered active at any date in between: any case with datefiled_sif < 15 December 2009 and lasthearingdate_sif >= 15 December 2009 is active on that date, while cases with lasthearingdate_sif < 15 December 2009 or with datefiled_sif > 15 December 2009 are not.
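
To make the rule above concrete, here is a minimal sketch that flags the cases active on 15 December 2009. The four cases and the string variables filed_str and last_str are made up for illustration; only datefiled_sif and lasthearingdate_sif come from the thread. Note that it clears the data in memory.

```stata
* Toy data (hypothetical cases); -clear- drops whatever is in memory
clear
input float unidcode str9 filed_str str9 last_str
1 "01jan2009" "31dec2009"
2 "01jan2009" "01jun2009"
3 "01jan2010" "01jun2010"
4 "01jun2009" "15dec2009"
end

* Convert the strings to Stata daily dates
gen double datefiled_sif       = date(filed_str, "DMY")
gen double lasthearingdate_sif = date(last_str,  "DMY")
format %td datefiled_sif lasthearingdate_sif

* Active on the reference date: filed before it, last hearing on or after it
local refdate = td(15dec2009)
gen byte active = datefiled_sif < `refdate' & lasthearingdate_sif >= `refdate'

* Cases 1 and 4 satisfy both conditions
count if active
list unidcode datefiled_sif lasthearingdate_sif active
```

The toy data has one row per case. In the real data, with one row per hearing, each case would first need to be reduced to a single row (for example with -egen, tag()-) so that it is not counted once per hearing.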
Clyde Schechter

Join Date: Apr 2014

Posts: 30174
#4

26 Jan 2019, 11:05

Thank you. Then what you want is:

Code:

// MAKE A COPY OF THE DATA
tempfile copy
save `copy'

rangejoin lasthearingdate_sif businessondate_sif . using `copy', by(judge_code)
keep if datefiled_sif_U < businessondate_sif
by unidcode, sort: egen active_caseload = count(unidcode_U)
by unidcode: keep if _n == 1
drop *_U

Note: -rangejoin- is written by Robert Picard and is available from SSC. It also requires that you have -rangestat- installed. The latter is by Robert Picard, Nick Cox, and Roberto Ferrer, and is also available from SSC.
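
For anyone who cannot install community-contributed commands, the same idea can be sketched with built-in commands only. This is not the -rangejoin- approach above but a brute-force loop over courts and hearing dates, applying the definition of "active" given in #3. It assumes datefiled_sif and lasthearingdate_sif are constant within a case; the toy data and the helper variable onecase are hypothetical. It will be slow on large data sets.

```stata
* Toy data (hypothetical): one court, three cases, one row per hearing
* -clear- drops whatever is in memory
clear
input float(unidcode judge_code) double(businessondate_sif datefiled_sif lasthearingdate_sif)
1 1 100  90 120
1 1 120  90 120
2 1 110 105 110
3 1 130 125 140
end
format %td businessondate_sif datefiled_sif lasthearingdate_sif

* Tag one row per case within court so each case is counted once
egen byte onecase = tag(judge_code unidcode)

gen long active_caseload = .
quietly levelsof judge_code, local(courts)
foreach c of local courts {
    quietly levelsof businessondate_sif if judge_code == `c', local(dates)
    foreach d of local dates {
        * Cases in this court filed before date d and not yet closed on d
        quietly count if onecase & judge_code == `c' ///
            & datefiled_sif < `d' & lasthearingdate_sif >= `d'
        quietly replace active_caseload = r(N) ///
            if judge_code == `c' & businessondate_sif == `d'
    }
}

list, sepby(unidcode)
```

Unlike the code above, this keeps every hearing row and attaches the count to it, and the count includes the case being heard. Whether to subtract 1 for the case itself is a modelling choice.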
Siddharth Rao

Join Date: Aug 2017

Posts: 16
#5

26 Jan 2019, 11:13

Thanks so much for the prompt reply, I'm very grateful for your help!