Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Subtracting dates from different rows in a dataset

    Hi All,

    I am working with a dataset, where my idea is to determine the reasons and predictors for why patients' stay at the long-term rehab facility is interrupted. In order to find out which patients have gaps between the end date of their last hospital stay and the start date of the most subsequent one, I want to make sure I create a code that captures patients whose begin date - end date > 0. The challenge is that dates I want to subtract from each other belong to different rows and are under separate entries (aka. hospital stays). This dataset has over 300k entries, so I would like to find an efficient way to find these individuals and then collapse the dataset so that I have one row for each patient with a number of days they were out of the facility between the very first begin date and the very last end date. Here is the table from the dataset that explains it.



    px_id Facility Code Begin date End Date
    1 656 03/03/15 06/03/15
    1 656 06/19/15 09/20/15
    1 656 11/20/15 02/21/16
    2 123 01/02/16 01/20/16
    2 123 01/27/16 02/18/16


    Thank you!

  • #2
    Subtracting entries across observations (not rows!!) is entirely possible. What you need to make sure is that your dates are numerical date variables recognized by Stata. Using dataex to present a data example following the advice in FAQ Advice #12 would have made this clear. The replace command in the second line distributes the value 1 across all observations of px_id if at least one observation satisfies the condition. Note that the following considers time gaps within the same facility. If you want gaps not considering facility, delete facilitycode from the first line of the code.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte px_id int facilitycode float(begin end)
    1 656 20150 20242
    1 656 20258 20351
    1 656 20412 20505
    2 123 20455 20473
    2 123 20480 20502
    end
    format %tdN/D/Y begin
    format %tdN/D/Y end
    
    bys px_id facilitycode (begin end): gen gap= begin-end[_n-1] > 0 & _n>1
    bys px_id (gap): replace gap= gap[_N]
    Res.:

    Code:
    . l, sepby(px_id)
    
         +----------------------------------------------+
         | px_id   facili~e      begin        end   gap |
         |----------------------------------------------|
      1. |     1        656   03/03/15   06/03/15     1 |
      2. |     1        656   06/19/15   09/20/15     1 |
      3. |     1        656   11/20/15   02/21/16     1 |
         |----------------------------------------------|
      4. |     2        123   01/02/16   01/20/16     1 |
      5. |     2        123   01/27/16   02/18/16     1 |
         +----------------------------------------------+
    Last edited by Andrew Musau; 07 Aug 2022, 11:00.

    Comment


    • #3
      Thank you! This worked out very well.

      One additional question I had is as follows: I have noticed that there are patients who are assigned to more than once facility and I want to make sure I determine who these people are. How would I identify those px_id that have been to more than one facility?

      Thank you again.

      Comment


      • #4
        Code:
        bys px_id (facilitycode): gen count = sum(facilitycode!=facilitycode[_n-1])
        by px_id: replace count= count[_N]
        tabdisp px_id, cell(count)
        Last edited by Andrew Musau; 08 Aug 2022, 10:58.

        Comment

        Working...
        X