  • Need help with splitting longitudinal data into episodes of equal length

    Hi Statalisters,

    I'd appreciate your help with a data formatting problem I'm trying to solve.

    I'm working with longitudinal data (multiple rows per subject). I want to split the data into intervals of equal length (30 days), starting from the study entry date until the study exit date, taking into account the multiple rows per subject.

    My data looks like this:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(myid DoE DoX dob admidate adm_ind)
    1 18263 23375 5395 18281 1
    1 18263 23375 5395 19511 1
    1 18263 23375 5395 19554 1
    1 18263 23375 5395 19762 1
    1 18263 23375 5395 19764 1
    1 18263 23375 5395 20049 1
    1 18263 23375 5395 20251 1
    1 18263 23375 5395 21682 1
    end
    format %d DoE
    format %d DoX
    format %d dob
    format %d admidate




    DoE is the date of entry into the study
    DoX is the date of exit from the study
    dob is the date of birth
    admidate is the date of admission

    Each individual can have multiple admissions. I'd like to split the data so that an admission flag appears in any time interval where an admission occurred.

    Could someone please advise on the best way to do this?

    Any help would be much appreciated.

    Thanks.

  • #2
    Code:
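    * Save the original admission-level data for the later join
    tempfile admissions
    save `admissions'
    
    * Reduce to one observation per subject and study window
    keep myid DoE DoX
    duplicates drop
    gen `c(obs_t)' obs_no = _n
    * Create one observation per 30-day interval
    expand ceil((DoX - DoE)/30)
    by obs_no, sort: gen start = DoE + (_n-1)*30
    gen end = start + 29
    format start end %d
    drop obs_no
    
    * Attach any admission whose date falls within [start, end]
    rangejoin admidate start end using `admissions', by(myid) keepusing(adm_ind)
    replace adm_ind = 0 if missing(adm_ind)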
    -rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
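
    For reference, a minimal sketch of the installation step, followed by an optional clean-up. The final -collapse- line is an added assumption, not part of the code above: -rangejoin- returns one observation per matching pair, so an interval containing several admissions would appear more than once, and collapsing brings the data back to one row per interval.

    ```stata
    * One-time installation from SSC (rangejoin requires rangestat)
    ssc install rangestat
    ssc install rangejoin

    * Optional, run after the rangejoin code above: keep one row per interval
    * (assumes a single flag per interval is wanted, however many admissions it holds)
    collapse (max) adm_ind, by(myid DoE DoX start end)
    ```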



    • #3
      Thanks Clyde! Much appreciated



      • #4
        Here is an alternative that uses only Stata's built-in commands (I borrow some chunks of code from #2):
        Code:
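        * Save the original admission-level data for the later merge
        tempfile admissions
        save `admissions'
        
        * Reduce to one observation per subject and study window
        keep myid DoE DoX
        duplicates drop
        
        * Create one observation per day of follow-up (DoE through DoX - 1)
        gen `c(obs_t)' obs_no = _n
        expand (DoX - DoE)
        bysort myid (obs_no): gen admidate = DoE + _n - 1
        
        * Match days to admissions (_merge == 3 marks a day with an admission);
        * 1:1 requires at most one admission row per subject and date
        merge 1:1 myid admidate using `admissions', keep(master match)
        
        * Number the 30-day periods and keep one row per period
        gen period = ceil((admidate + 1 - DoE)/30)
        collapse (max) _merge , by(myid DoE DoX period)
        gen byte wanted = (_merge == 3)
        
        * Recover each period's start and end dates
        gen start = DoE + (period - 1)*30
        gen end = start + 29
        format start end %d
        drop _merge period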
        Note that internally, -rangejoin- uses the same -expand- and -merge- technique, so this method should be as fast. It may even be faster, since it avoids some of the command's overhead that is unnecessary here.
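
        As a quick check of the period arithmetic, here is a small sketch using the first two admission dates from the example data in #1 (DoE = 18263). The literal values are taken from that example; nothing else is assumed.

        ```stata
        * Admission on day 18281, 18 days after entry: ceil(19/30) evaluates to 1
        display ceil((18281 + 1 - 18263)/30)

        * Admission on day 19511, 1248 days after entry: ceil(1249/30) evaluates to 42
        display ceil((19511 + 1 - 18263)/30)

        * Start of period 42, using the same rule as gen start = DoE + (period - 1)*30
        * (day 19493, so the period runs through day 19522 and contains 19511)
        display %d (18263 + (42 - 1)*30)
        ```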
        Last edited by Hemanshu Kumar; Yesterday, 01:37.
