Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Identifying overlapping dates across multiple rows to define a spell in Stata 17

    Dear Statalist,

    I'm doing research on hemodialysis catheters. Patients can have multiple catheters in at the same time. I want to analyse patient follow up using date ranges in which >=1 catheter is in, and date ranges in which they are catheter free

    I have panel data, in a format provided in the code below. I've tried to define spells of catheters by identifying overlaps, which successfully defines the first observation in the spell. However, sometimes a catheter that is several rows below the first observation in the spell, should also be included in the spell. I'm not sure how to create a spell condition which relates a given row to all previous rows in the spell.

    Specifically, I get the following data structure, but I want the spells to cover the rows I have circled in red


    Click image for larger version

Name:	statalist.png
Views:	1
Size:	85.9 KB
ID:	1708128



    Any help would be most appreciated.

    Kind regards,
    Ben


    Code:
    input ptid    insertdate    removaldate
    1    100    199
    1    300    900
    1    350    400
    1    450    500
    1    600    800
    1    900    1000
    1    1100    1200
    1    1210    1300
    1    1400    1500
    1    1600    1700
    1    1800    1900
    2    0    1100
    2    250    500
    2    500    1000
    2    1010    1200
    2    1300    1500
    2    1700    1800
    2    1900    1950
    2    2000    2050
    2    2055    2150
    end
    
    bysort ptid (insertdate removaldate): gen pstime = _n
    
    tsset ptid pstime
    
    tsspell, cond(removaldate >= insertdate[_n+1] & insertdate <= removaldate[_n+1])
    
    by ptid: replace _spell = _spell[_n-1] if _spell==0 & _spell[_n-1] != 0 & _spell[_n-1]!=. ///
                                            & insertdate <= removaldate[_n-1] & removaldate[_n-1]!=.
    
    gen byte true_end = 0
    by ptid: replace true_end = 1 if _end[_n-1]==1 & insertdate <= removaldate[_n-1] & removaldate[_n-1]!=.
    
    by ptid: replace _seq = (L._seq + 1) if true_end == 1
    
    list, sepby(ptid) noobs



  • #2
    Here is some technique. There is likely a more elegant solution. It uses the -newspell- command from Stata Journal (-search dm0078_3-). -newspell- does the heavy lifting to convert several, possibly overlapping, periods into a long dataset with one observation per day per patient. Then those data are converted to spells, reshaped wide again to form new spells of contiguous periods. From there, the ranges are merged back into to the original data and each original observation must fall into one of the newly defined periods.

    Code:
    * consolidate spells to overlapping windows
    tempname spells
    frame copy default spells
    cwf spells
    
    sort ptid insertdate removaldate
    assert insertdate < removaldate
    
    gen stype=1
    bys ptid (insertdate removaldate) : gen snum = _n
    newspell tolong , id(ptid) begin(insertdate) end(removaldate) snum(snum) stype(stype) time(time)
    drop stype
    
    bys ptid (time) : gen byte _start = cond(_n==1 | time > time[_n-1]+1, 1, 0)
    bys ptid (time) : gen int spell = sum(_start)
    drop _start
    bys ptid spell (time) : keep if _n==1 | _n==_N
    
    bys ptid spell (time) : gen byte _seq = _n
    reshape wide time , i(ptid spell) j(_seq)
    rename (time1 time2) (start end)
    qui compress
    save `spells', replace
    
    * merge spells with original data
    cwf default
    joinby ptid using `spells'
    gen int newspell = cond(insertdate >= start & removaldate <= end, spell, .)
    drop if mi(newspell)
    drop spell
    sort ptid insertdate removaldate
    
    list , sepby(ptid newspell) abbrev(16)
    Result

    Code:
    . list , sepby(ptid newspell) abbrev(16)
    
         +-----------------------------------------------------------+
         | ptid   insertdate   removaldate   start    end   newspell |
         |-----------------------------------------------------------|
      1. |    1          100           199     100    199          1 |
         |-----------------------------------------------------------|
      2. |    1          300           900     300   1000          2 |
      3. |    1          350           400     300   1000          2 |
      4. |    1          450           500     300   1000          2 |
      5. |    1          600           800     300   1000          2 |
      6. |    1          900          1000     300   1000          2 |
         |-----------------------------------------------------------|
      7. |    1         1100          1200    1100   1200          3 |
         |-----------------------------------------------------------|
      8. |    1         1210          1300    1210   1300          4 |
         |-----------------------------------------------------------|
      9. |    1         1400          1500    1400   1500          5 |
         |-----------------------------------------------------------|
     10. |    1         1600          1700    1600   1700          6 |
         |-----------------------------------------------------------|
     11. |    1         1800          1900    1800   1900          7 |
         |-----------------------------------------------------------|
     12. |    2            0          1100       0   1200          1 |
     13. |    2          250           500       0   1200          1 |
     14. |    2          500          1000       0   1200          1 |
     15. |    2         1010          1200       0   1200          1 |
         |-----------------------------------------------------------|
     16. |    2         1300          1500    1300   1500          2 |
         |-----------------------------------------------------------|
     17. |    2         1700          1800    1700   1800          3 |
         |-----------------------------------------------------------|
     18. |    2         1900          1950    1900   1950          4 |
         |-----------------------------------------------------------|
     19. |    2         2000          2050    2000   2050          5 |
         |-----------------------------------------------------------|
     20. |    2         2055          2150    2055   2150          6 |
         +-----------------------------------------------------------+

    Comment


    • #3
      It is not entirely clear to me what you mean by a spell here. My best guess is you mean a period of time during which a ptid has at least one catheter in place, although it may be several catheters inserted at different times and removed at different times. So the following code will do that. It is based entirely on native Stata commands--no user-written programs required.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte ptid int(insertdate removaldate)
      1  100  199
      1  300  900
      1  350  400
      1  450  500
      1  600  800
      1  900 1000
      1 1100 1200
      1 1210 1300
      1 1400 1500
      1 1600 1700
      1 1800 1900
      2    0 1100
      2  250  500
      2  500 1000
      2 1010 1200
      2 1300 1500
      2 1700 1800
      2 1900 1950
      2 2000 2050
      2 2055 2150
      end
      
      gen `c(obs_t)' obs_no = _n
      reshape long @date, i(obs_no) j(event) string
      by ptid (date event), sort: gen n_catheters = sum((event == "insert") - (event == "removal"))
      
      by ptid (date event): gen spell_num = sum(n_catheters != 0 & (n_catheters[_n-1] == 0)| _n == 1)
      collapse (min) begin = date (max) end = date, by(ptid spell_num)
      In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      Comment


      • #4
        See also https://journals.sagepub.com/doi/pdf...867X1301300116 for related technique.

        Comment


        • #5
          very helpful, thanks all!

          Comment

          Working...
          X