Subtracting dates from different rows in a dataset

Emile Salerno

Join Date: Aug 2022

Posts: 8
#1

Subtracting dates from different rows in a dataset

07 Aug 2022, 09:34

Hi All,

I am working with a dataset, where my idea is to determine the reasons and predictors for why patients' stay at the long-term rehab facility is interrupted. In order to find out which patients have gaps between the end date of their last hospital stay and the start date of the most subsequent one, I want to make sure I create a code that captures patients whose begin date - end date > 0. The challenge is that dates I want to subtract from each other belong to different rows and are under separate entries (aka. hospital stays). This dataset has over 300k entries, so I would like to find an efficient way to find these individuals and then collapse the dataset so that I have one row for each patient with a number of days they were out of the facility between the very first begin date and the very last end date. Here is the table from the dataset that explains it.

px_id Facility Code Begin date End Date
1 656 03/03/15 06/03/15
1 656 06/19/15 09/20/15
1 656 11/20/15 02/21/16
2 123 01/02/16 01/20/16
2 123 01/27/16 02/18/16

Thank you!
Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10175

07 Aug 2022, 10:46

Subtracting entries across observations (not rows!!) is entirely possible. What you need to make sure is that your dates are numerical date variables recognized by Stata. Using dataex to present a data example following the advice in FAQ Advice #12 would have made this clear. The replace command in the second line distributes the value 1 across all observations of px_id if at least one observation satisfies the condition. Note that the following considers time gaps within the same facility. If you want gaps not considering facility, delete facilitycode from the first line of the code.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte px_id int facilitycode float(begin end)
1 656 20150 20242
1 656 20258 20351
1 656 20412 20505
2 123 20455 20473
2 123 20480 20502
end
format %tdN/D/Y begin
format %tdN/D/Y end

bys px_id facilitycode (begin end): gen gap= begin-end[_n-1] > 0 & _n>1
bys px_id (gap): replace gap= gap[_N]

Res.:

Code:

. l, sepby(px_id)

     +----------------------------------------------+
     | px_id   facili~e      begin        end   gap |
     |----------------------------------------------|
  1. |     1        656   03/03/15   06/03/15     1 |
  2. |     1        656   06/19/15   09/20/15     1 |
  3. |     1        656   11/20/15   02/21/16     1 |
     |----------------------------------------------|
  4. |     2        123   01/02/16   01/20/16     1 |
  5. |     2        123   01/27/16   02/18/16     1 |
     +----------------------------------------------+

Last edited by Andrew Musau; 07 Aug 2022, 11:00.

Comment

Emile Salerno

Join Date: Aug 2022

Posts: 8
#3

08 Aug 2022, 06:57

Thank you! This worked out very well.

One additional question I had is as follows: I have noticed that there are patients who are assigned to more than once facility and I want to make sure I determine who these people are. How would I identify those px_id that have been to more than one facility?

Thank you again.
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10175

08 Aug 2022, 10:54

Code:

bys px_id (facilitycode): gen count = sum(facilitycode!=facilitycode[_n-1])
by px_id: replace count= count[_N]
tabdisp px_id, cell(count)

Last edited by Andrew Musau; 08 Aug 2022, 10:58.

Announcement