Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Survival Time Data - Long Form - How do I generate a variable that indicates the number of observations at risk within a subgroup of data?

    I have a dataset on legal cases and individual hearings for each case, across multiple courts. I have set it up as a survival time dataset in order to estimate the factors that contribute to case disposition time, using a Cox PH Model. I want to control for the workload of each court, since a higher workload will undoubtedly affect the speed of disposal. I have a dummy, dispvar, as the failure variable, and the time in years, duration_year, as the time variable.

    Each individual observation is a a single hearing, identified by a date stored in datetime format:


    Code:
    * Example generated by -dataex-.    To install: ssc install dataex
    clear
    input float(unidcode judge_code)    double(businessondate_sif datefiled_sif)    float(lasthearingdate_sif    newcourt    oldtreatmentcourt    afternc    casetype_code    dispvar    duration_year)
    351040 1 16469 14845 16469 0 0 0    42 1  4.449315
    350965 1 17616 17429 17616 0 0 0    42 1 .51232874
    342692 1 17723 17555 17723 0 0 0    36 1   .460274
    350915 1 17835 17695 17835 0 0 0    42 1  .3835616
    269514 1 17927 17739 21348 0 0 0    28 0  .5150685
    342498 1 18592 17468 18592 0 0 0    36 1  3.079452
    247489 1 18913 18637 18913 0 0 0    25 1  .7561644
    344091 1 19170 18253 19170 0 0 0    36 1  2.512329
    265946 1 19227 18744 19227 0 0 0    28 1 1.3232877
    265459 1 19419 18331 19419 0 0 0    28 1  2.980822
    end
    format %td businessondate_sif
    format %td datefiled_sif
    format %td lasthearingdate_sif

    How do I create a variable that will enable me to to control for the number of legal cases pending (which have not yet been disposed of) on a given hearing date?

    Thank you in advance to anybody who can help me with this.



  • #2
    I'm sorry if I'm being dense, but you have a lot of date variables and it isn't clear to me what they mean. Let me be specific. If I gave you a particular date, say 15 December 2009, how would you be able to tell from the dates in this data set which cases are still active on that date and which are not? Does one of these variables indicate when the case started and another indicate when it ends? If so, which variables are those? If there are no such variables, then how would you answer my question about 15 December 2009? What would you do?

    Comment


    • #3
      Hi Clyde, thanks for your response.

      The variable businessondate_sif contains the date of the hearing.
      The variable datefiled_sif contains the date that the case began, and the variable lasthearingdate_sif contains the date at which observations are censored, irrespective of whether they have failed or not. They would be considered active at any date in between. Any case with datefiled_sif < 15 December 2009 and lasthearingdate_sif >= 15 December 2009 would be considered active on that date. Ones with lasthearingdate_sif < 15 December 2009 or with datefiled_sif > 15 December 2009 would not be considered active on that date.

      Comment


      • #4
        Thank you. Then what you want is:

        Code:
        //    MAKE A COPY OF THE DATA
        tempfile copy
        save `copy'
        
        rangejoin lasthearingdate_sif businessondate_sif . using `copy', by(judge_code)
        keep if datefiled_sif_U < businessondate_sif
        by unidcode, sort: egen active_caseload = count(unidcode_U)
        by unidcode: keep if _n == 1
        drop *_U
        Note: -rangejoin- is written by Robert Picard and is available from SSC. It also requires that you have -rangestat- installed. The latter is by Robert Picard, Nick Cox, and Roberto Ferrer, and is also available from SSC.

        Comment


        • #5
          Thanks so much for the prompt reply, I'm very grateful for your help!

          Comment

          Working...
          X