Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Closest date to another date variable (both dates are varying) in a panel

    Hi,

    I have a dataset of individuals completing a series of tasks over time. The columns of interest are user_id, task_id, task_start_date, and task_completion_date
    The panel is set using user_id

    In each observation, I am trying to find the closest task_completion_date in the panel prior to the task_start_date.

    The observations are sorted by user_id and task_start_date

    The closest question I encountered on Statalist is https://www.statalist.org/forums/for...-date-variable
    but in that question, one of the dates is fixed in a panel. In my case, both the dates are varying.

    Code:
    input long(user_id task_id) float(task_completion_date    task_start_date)
    1 145600 19145 19117
    1 248937 19376 19369
    1 285722 19444 19402
    1 423274 19596 19472
    1 385689 19542 19507
    1 524506 19691 19628
    1 537442 19701 19642
    1 594828 19774 19749
    1 841061 19911 19876
    2  14951 18639 18545
    2  12787 18619 18574
    2  15126 18642 18589
    2  15647 18648 18646
    2  22675 18705 18686
    2 477632 19657 19630
    2 528701 19694 19642
    end
    format %td task_completion_date
    format %td task_start_date
    Any inputs are highly appreciated. Thank you in advance.

  • #2
    The best strategy for problems that requires taking two dates into account like this is to convert the dates from a wide to a long layout and then order observations by date. You can search the forum for overlapping hospital stays questions and get further insight from:

    SJ-13-1 dm0068 . . . . . Stata tip 114: Expand paired dates to pairs of dates
    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
    Q1/13 SJ 13(1):217--219 (no commands)
    tip on using expand to deal with paired dates

    http://www.stata-journal.com/article...article=dm0068
    The following solution places completion events after start events on any given date. If you carry over completion dates, the carried over completion date will necessarily be from a prior date for all event start observations.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long(user_id task_id) float(task_start_date task_completion_date)
    1 145600 19117 19145
    1 248937 19369 19376
    1 285722 19402 19444
    1 423274 19472 19596
    1 385689 19507 19542
    1 524506 19628 19691
    1 537442 19642 19701
    1 594828 19749 19774
    1 841061 19876 19911
    2  14951 18545 18639
    2  12787 18574 18619
    2  15126 18589 18642
    2  15647 18646 18648
    2  22675 18686 18705
    2 477632 19630 19657
    2 528701 19642 19694
    end
    format %td task_start_date
    format %td task_completion_date
    
    * observations are uniquely identified by 
    isid user_id task_id
    
    * convert dates to a long layout; create an identifier for the completion event
    expand 2
    bysort user_id task_id: gen date = cond(_n == 1, task_start_date, task_completion_date)
    format %td date
    by user_id task_id: gen completion = cond(_n == 1, 0, 1)
    
    * reorder observations by date, put event completions after event starts
    isid user_id date completion task_id, sort
    
    * carryover the completion dates
    gen priorcomp = date if completion
    format %td priorcomp
    by user_id: replace priorcomp = priorcomp[_n-1] if mi(priorcomp)
    
    * return to original wide layout
    keep if !completion
    drop date completion
    sort user_id task_start_date
    
    list, sepby(user_id)
    And here are the final results:
    Code:
    . list, sepby(user_id)
    
         +-------------------------------------------------------+
         | user_id   task_id   task_st~e   task_co~e   priorcomp |
         |-------------------------------------------------------|
      1. |       1    145600   04may2012   01jun2012           . |
      2. |       1    248937   11jan2013   18jan2013   01jun2012 |
      3. |       1    285722   13feb2013   27mar2013   18jan2013 |
      4. |       1    423274   24apr2013   26aug2013   27mar2013 |
      5. |       1    385689   29may2013   03jul2013   27mar2013 |
      6. |       1    524506   27sep2013   29nov2013   26aug2013 |
      7. |       1    537442   11oct2013   09dec2013   26aug2013 |
      8. |       1    594828   26jan2014   20feb2014   09dec2013 |
      9. |       1    841061   02jun2014   07jul2014   20feb2014 |
         |-------------------------------------------------------|
     10. |       2     14951   10oct2010   12jan2011           . |
     11. |       2     12787   08nov2010   23dec2010           . |
     12. |       2     15126   23nov2010   15jan2011           . |
     13. |       2     15647   19jan2011   21jan2011   15jan2011 |
     14. |       2     22675   28feb2011   19mar2011   21jan2011 |
     15. |       2    477632   29sep2013   26oct2013   19mar2011 |
     16. |       2    528701   11oct2013   02dec2013   19mar2011 |
         +-------------------------------------------------------+
    
    .

    Comment


    • #3
      That worked perfectly!
      Thank you so much for a very thorough explanation, Robert.

      Comment

      Working...
      X