Identifying overlapping dates across multiple rows to define a spell in Stata 17

Ben Lazrus

Join Date: Apr 2022

Posts: 25
#1

Identifying overlapping dates across multiple rows to define a spell in Stata 17

02 Apr 2023, 02:32

Dear Statalist,

I'm doing research on hemodialysis catheters. Patients can have multiple catheters in at the same time. I want to analyse patient follow up using date ranges in which >=1 catheter is in, and date ranges in which they are catheter free

I have panel data, in a format provided in the code below. I've tried to define spells of catheters by identifying overlaps, which successfully defines the first observation in the spell. However, sometimes a catheter that is several rows below the first observation in the spell, should also be included in the spell. I'm not sure how to create a spell condition which relates a given row to all previous rows in the spell.

Specifically, I get the following data structure, but I want the spells to cover the rows I have circled in red

Any help would be most appreciated.

Kind regards,
Ben

Code:

input ptid insertdate removaldate 1 100 199 1 300 900 1 350 400 1 450 500 1 600 800 1 900 1000 1 1100 1200 1 1210 1300 1 1400 1500 1 1600 1700 1 1800 1900 2 0 1100 2 250 500 2 500 1000 2 1010 1200 2 1300 1500 2 1700 1800 2 1900 1950 2 2000 2050 2 2055 2150 end bysort ptid (insertdate removaldate): gen pstime = _n tsset ptid pstime tsspell, cond(removaldate >= insertdate[_n+1] & insertdate <= removaldate[_n+1]) by ptid: replace _spell = _spell[_n-1] if _spell==0 & _spell[_n-1] != 0 & _spell[_n-1]!=. /// & insertdate <= removaldate[_n-1] & removaldate[_n-1]!=. gen byte true_end = 0 by ptid: replace true_end = 1 if _end[_n-1]==1 & insertdate <= removaldate[_n-1] & removaldate[_n-1]!=. by ptid: replace _seq = (L._seq + 1) if true_end == 1 list, sepby(ptid) noobs
Tags: None

Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2458

02 Apr 2023, 06:58

Here is some technique. There is likely a more elegant solution. It uses the -newspell- command from Stata Journal (-search dm0078_3-). -newspell- does the heavy lifting to convert several, possibly overlapping, periods into a long dataset with one observation per day per patient. Then those data are converted to spells, reshaped wide again to form new spells of contiguous periods. From there, the ranges are merged back into to the original data and each original observation must fall into one of the newly defined periods.

Code:

* consolidate spells to overlapping windows
tempname spells
frame copy default spells
cwf spells

sort ptid insertdate removaldate
assert insertdate < removaldate

gen stype=1
bys ptid (insertdate removaldate) : gen snum = _n
newspell tolong , id(ptid) begin(insertdate) end(removaldate) snum(snum) stype(stype) time(time)
drop stype

bys ptid (time) : gen byte _start = cond(_n==1 | time > time[_n-1]+1, 1, 0)
bys ptid (time) : gen int spell = sum(_start)
drop _start
bys ptid spell (time) : keep if _n==1 | _n==_N

bys ptid spell (time) : gen byte _seq = _n
reshape wide time , i(ptid spell) j(_seq)
rename (time1 time2) (start end)
qui compress
save `spells', replace

* merge spells with original data
cwf default
joinby ptid using `spells'
gen int newspell = cond(insertdate >= start & removaldate <= end, spell, .)
drop if mi(newspell)
drop spell
sort ptid insertdate removaldate

list , sepby(ptid newspell) abbrev(16)

Result

Code:

. list , sepby(ptid newspell) abbrev(16)

     +-----------------------------------------------------------+
     | ptid   insertdate   removaldate   start    end   newspell |
     |-----------------------------------------------------------|
  1. |    1          100           199     100    199          1 |
     |-----------------------------------------------------------|
  2. |    1          300           900     300   1000          2 |
  3. |    1          350           400     300   1000          2 |
  4. |    1          450           500     300   1000          2 |
  5. |    1          600           800     300   1000          2 |
  6. |    1          900          1000     300   1000          2 |
     |-----------------------------------------------------------|
  7. |    1         1100          1200    1100   1200          3 |
     |-----------------------------------------------------------|
  8. |    1         1210          1300    1210   1300          4 |
     |-----------------------------------------------------------|
  9. |    1         1400          1500    1400   1500          5 |
     |-----------------------------------------------------------|
 10. |    1         1600          1700    1600   1700          6 |
     |-----------------------------------------------------------|
 11. |    1         1800          1900    1800   1900          7 |
     |-----------------------------------------------------------|
 12. |    2            0          1100       0   1200          1 |
 13. |    2          250           500       0   1200          1 |
 14. |    2          500          1000       0   1200          1 |
 15. |    2         1010          1200       0   1200          1 |
     |-----------------------------------------------------------|
 16. |    2         1300          1500    1300   1500          2 |
     |-----------------------------------------------------------|
 17. |    2         1700          1800    1700   1800          3 |
     |-----------------------------------------------------------|
 18. |    2         1900          1950    1900   1950          4 |
     |-----------------------------------------------------------|
 19. |    2         2000          2050    2000   2050          5 |
     |-----------------------------------------------------------|
 20. |    2         2055          2150    2055   2150          6 |
     +-----------------------------------------------------------+

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#3

02 Apr 2023, 10:17

It is not entirely clear to me what you mean by a spell here. My best guess is you mean a period of time during which a ptid has at least one catheter in place, although it may be several catheters inserted at different times and removed at different times. So the following code will do that. It is based entirely on native Stata commands--no user-written programs required.

Code:

* Example generated by -dataex-. For more info, type help dataex clear input byte ptid int(insertdate removaldate) 1 100 199 1 300 900 1 350 400 1 450 500 1 600 800 1 900 1000 1 1100 1200 1 1210 1300 1 1400 1500 1 1600 1700 1 1800 1900 2 0 1100 2 250 500 2 500 1000 2 1010 1200 2 1300 1500 2 1700 1800 2 1900 1950 2 2000 2050 2 2055 2150 end gen `c(obs_t)' obs_no = _n reshape long @date, i(obs_no) j(event) string by ptid (date event), sort: gen n_catheters = sum((event == "insert") - (event == "removal")) by ptid (date event): gen spell_num = sum(n_catheters != 0 & (n_catheters[_n-1] == 0)| _n == 1) collapse (min) begin = date (max) end = date, by(ptid spell_num)

In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
1 like
Comment
Nick Cox

Join Date: Mar 2014

Posts: 36058
#4

02 Apr 2023, 11:15

See also https://journals.sagepub.com/doi/pdf...867X1301300116 for related technique.
1 like
Comment
Ben Lazrus

Join Date: Apr 2022

Posts: 25
#5

03 Apr 2023, 09:19

very helpful, thanks all!
Comment

Announcement