  • Identifying the start of an episode

    Hi

    I have two data sets. Number 1 contains an ID variable and one point in time per row:

    Code:
    clear
    input id date
    1 430
    2 435
    3 560
    4 460
    end
    
    format date %td

    Data set 2 is in long format. It contains the ID variable, a counter for the episodes each ID went through, the start and end dates of each episode, and a 0/1 variable x that indicates some arbitrary characteristic of the episode:

    Code:
    clear
    input id episode start end x
    1 1 420 429 1
    1 2 430 .   1
    2 1 420 425 0
    2 2 426 433 1
    2 3 434 .   0
    3 1 500 520 0
    3 2 521 540 1
    3 3 521 .     1
    4 1 420 421 1
    4 2 422 423 0
    4 3 424 425 0
    4 4 426 427 1
    4 5 428 465 0
    4 6 466 470 1
    
    end
    
    format start  %td
    format end %td

    How do I merge these two data sets to create one like this?

    Code:
    clear
    input id date start x
    1 430 430 1
    2 435 434 0
    3 560 521 1
    4 460 428 0
    end

    That is, for each ID I want the start date of the episode that is current at time point "date", together with the "x" that belongs to that episode.

    Thanks so much for your consideration
    Jo

  • #2
    The simplest way is:
    Code:
    clear
    input id date
    1 430
    2 435
    3 560
    4 460
    end
    format date %td
    tempfile dataset1
    save `dataset1'
    
    clear
    input id episode start end x
    1 1 420 429 1
    1 2 430 .   1
    2 1 420 425 0
    2 2 426 433 1
    2 3 434 .   0
    3 1 500 520 0
    3 2 521 540 1
    3 3 521 .     1
    4 1 420 421 1
    4 2 422 423 0
    4 3 424 425 0
    4 4 426 427 1
    4 5 428 465 0
    4 6 466 470 1
    
    end
    
    format start  %td
    format end %td
    tempfile dataset2
    save `dataset2'
    
    use `dataset2', clear
    rangejoin date start end using `dataset1', by(id)
    keep if !missing(date)
    drop end
    sort id date
    format date %td
    -rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.
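
    If installing from SSC is not an option, roughly the same result can probably be obtained with official Stata's -joinby-. What follows is only a sketch under stated assumptions, not a replacement for the -rangejoin- solution above: it pairs every episode with the date for the same id and then keeps the episode whose interval contains the date. It assumes that a missing end means the episode is still open, which is why that case is handled explicitly in the -keep if- condition.

    Code:
    * Sketch only: official -joinby- instead of -rangejoin-
    clear
    input id date
    1 430
    2 435
    3 560
    4 460
    end
    tempfile dataset1
    save `dataset1'

    clear
    input id episode start end x
    1 1 420 429 1
    1 2 430 .   1
    2 1 420 425 0
    2 2 426 433 1
    2 3 434 .   0
    3 1 500 520 0
    3 2 521 540 1
    3 3 521 .   1
    4 1 420 421 1
    4 2 422 423 0
    4 3 424 425 0
    4 4 426 427 1
    4 5 428 465 0
    4 6 466 470 1
    end

    * pair every episode with the date recorded for the same id
    joinby id using `dataset1'

    * keep the episode whose interval contains date;
    * a missing end is assumed to mean the episode is still open
    keep if date >= start & (date <= end | missing(end))

    keep id date start x
    order id date start x
    sort id
    format date start %td
    list, noobs

    Because -joinby- forms all within-id pairs before anything is dropped, this can use a lot of memory when there are many episodes per id. Note also that if episodes overlap (for example, a date of 530 for id 3 would fall in both episode 2 and episode 3), more than one row per id survives the -keep-, so an extra rule would be needed to pick one.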
