Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • working with dates and reshape by dates

    Dear listers

    I have a data set where i have a list of dates indicating an event.

    A mock dataset could look like this, var3 is just for explanation

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id date) str40 var3
    1 19960 "This is the start date of episode 1 id 1"
    1 20054 "This is the end date of episode 1 id 1"  
    1 20453 "This is the start date of episode 2 id 1"
    1 20483 ""                                        
    1 20512 ""                                        
    1 20605 ""                                        
    1 20759 ""                                        
    1 21125 ""                                        
    1 21157 ""                                        
    2 18993 ""                                        
    2 19025 ""                                        
    2 19056 ""                                        
    2 19118 ""                                        
    2 19524 ""                                        
    2 19797 ""                                        
    2 20315 ""                                        
    2 20393 ""                                        
    2 20437 ""                                        
    end
    format %td date
    I want to do more than one thing and think i need to do this stepwise.
    1) To count new episodes. So from the start date (lowest date by id) a new episode is defined as there is more than 200 days between two dates:
    Code:
    bysort id: gen new_episode=1 if date-date[_n-1]>200
    2) I want to identify the first episode pr id
    Code:
    bysort id: gen first_episode=1 if _n==1
    3) indicate that the first episode is not a returning event but the first one
    Code:
    replace new_episode=. if first_episode==1
    4) generate a date that indicates the debut date
    Code:
    gen debute_date=date if first_episode==1
    format debute_date %td
    5) generate a date for each new episode
    Code:
    generate new_episode_date=date if new_episode==1
    formate new_episode_date %td
    6) generate a date that indicates the end of an episode
    Code:
    bysort id: generate end_episode_date=date[_n-1]+100 if new_episode==1
    format end_episode_date %td
    But this is where i get lost.

    So firstly, there must be a much smarter way of doing this?
    And secondly, I really want to know how many days each episode lasts, and how many days there is between episodes ect.
    Do i need to reshape or is it possible to get the same info in long format?

    Thank you for reading all the way to the end.

    Best,
    Lars

  • #2
    I'm a little confused by some of your terminology, and I may have missed a few things along the way. But take a look at this:

    Code:
    by id (date), sort: gen episode = sum(date > date[_n-1] + 200) + 1
    by id (date): gen debut_date = date[1]
    by id episode (date), sort: gen is_start_date = (_n == 1)
    by id episode (date): gen episode_start_date = date[1]
    by id episode (date): gen episode_end_date = date[_N]
    format episode_start_date episode_end_date debut_date %td
    by id episode (date): gen duration = date[_N] - date[1] + 1
    by id (episode date): gen prior = episode_start_date[_n-1]
    by id (episode date): gen latency = episode_start_date - prior if is_start_date
    by id episode (date): replace latency = latency[_n-1] if _n > 1
    drop prior
    This creates the following variables:

    debut_date is the first date an id appears in the data set.
    episode -- a sequential number starting from 1, and incrementing whenever a gap of 200 days between dates is found. First episodes are characterized by episode == 1.
    is_start_date -- a 0/1 variable indicating whether the current observation is the first date of a new episode.
    episode_start_date -- the starting date of the current episode
    episode_end_date -- the ending date of the current episode
    duration -- the length of the episode in days, including both the start and end date
    latency -- the number of days between the start date of the preceding episode and the start date of the current episode. (Missing value for first episodes.)

    If there is something I have missed, hopefully, you can easily calculate it from these.

    Comment

    Working...
    X