Trying to calculate time length for hospital episodes of multiple consec. rows, but also multiple nested within one 'person'

Lisa Nicole

Join Date: Jan 2021

Posts: 13
#1

Trying to calculate time length for hospital episodes of multiple consec. rows, but also multiple nested within one 'person'

31 Jul 2021, 16:27

Hi STATALIST,
First time poster here. I am hoping someone can assist with my hospital record data calculations of nested episodes length of stay, where my dataset has many separations (rows) per person (ppn), and I need to calculate the length of stay for contiguous separations that create one whole episode of care... however there can possibly many many of these 'whole episodes of care' for one person over the 10- years of data. Here is an example - where the 'ppn' shows same person, being 'admitted' (admitdttm) to hospital on a given date/time, separated (dischdttm) on a given date/time - I have calculated ''cont_stay' where one separation doesn't exceed 1 day between the next admission (using [_n-1 etrc] functions.. the 'obs' variable is a continous count of all separations by ppn over the 10 years of data. The episode_seq_no shows the chronological order of separations within ONE unique whole episode of care. What I want to do but can't figure out how (!!!) - - is sum the 'los_sep_days' (days rounded to 2 dec places) of each separation, but within one whole episode of care (i.e. episode_seq_n starts at 1 and finished when cont_stay no longer == 1. I hope this explains it in words.. or here is a snip of the data:

THANK YOU in advance if you can help!! I have spent hours and hours on this and am going round in circles.

Attached Files
Tags: datatransformation, Suggestion, syntax
Andrew Musau

Join Date: Oct 2014

Posts: 10168
#2

31 Jul 2021, 16:42

It will be easier for someone to help you if you present a data example using the dataex command, as recommended in FAQ Advice #12. You can do this by copying and pasting the result of the following:

Code:

dataex ppn-last in 1258631/1258662
1 like
Comment
Lisa Nicole

Join Date: Jan 2021

Posts: 13
#3

01 Aug 2021, 17:00

Thanks Andrew but I cannot do this as the data are on a secure server and I can't take anything out ... this is the data example as best I can do - you can see variable names and individual identifiers, as well as values. Please let me know if you need further explanation of what this is... it seems obvious to me but then it is my data lol. Thanks so much.
Comment

Leonardo Guizzetti

Join Date: Jul 2016
Posts: 2400

01 Aug 2021, 21:09

Many people come to this forum, asking for help with code, yet not providing data for X reason. That's perfectly reasonable, but then you are asked in the FAQ (and by Andrew in #2) to provide reasonable alternative data, including toy, simulated or otherwise fake data, that replicates meaningful aspects of your data, such as similar variable names, types and values and overall data structure.

Since no dataex was provided, I took the liberty of making my own minimal dataset that I think is at the core of your request -- creating the conditional cumulative sum. The approach and subject and observation id (which you have) and a variable (-spellid- here) that numbers spells be numbered, and non-spells take a value of zero. You don't explicitly show an analog of -spellid-, but you do have -episode_seq_no- which makes me think your dataset does. The sum takes advantage of the fact that many egen functions accept Stata expressions, so I used -cond()- to implement this logic: return the value of interest when in a spell, or zero otherwise. Begin at "begin here", and I show just the first subject.

Note that in your data, you have duplicate observations on key interaction details that I expect would not be there in cleaned, real data. See 1258639 and 125640, for example.

Code:

clear *
cls

version 17
set seed `=strreverse("1621483")'

// create subjects
set obs 10
gen subjid = _n

// create some observations per subject, and nubmer them.
gen nobs = runiformint(5, 20)
expand nobs
drop nobs
bys subjid : gen int obsid = _n

// assign value to visit, e.g., duration
gen byte val = runiformint(1, 10)

// generate spells by rnadom sequence generation
gen byte seq = rbinomial(1, 0.5)

// count spells and and spell length. Wwant to impose spells have min length of 2.
tsset subjid obsid
tsspell , cond(seq==1) spell(spellid)
tsset, clear
bys subjid spellid : egen byte _length = max(_seq)
sort subjid obsid
replace spellid = 0 if _length==1
drop _seq _end _length

tsset subjid obsid
tsspell , cond(spellid) spell(spellid2)
tsset, clear
drop spellid _end
rename spellid2 spellid

// begin here
bys subjid spellid (_seq) : egen spell_total = total(cond(spellid, val, 0))
sort subjid obsid
list if subjid==1, sepby(subjid spellid) abbrev(12) noobs

Res:

Code:

  +-----------------------------------------------------------+
  | subjid   obsid   val   seq   _seq   spellid   spell_total |
  |-----------------------------------------------------------|
  |      1       1     7     0      0         0             0 |
  |-----------------------------------------------------------|
  |      1       2    10     1      1         1            17 |
  |      1       3     7     1      2         1            17 |
  |-----------------------------------------------------------|
  |      1       4     9     0      0         0             0 |
  |-----------------------------------------------------------|
  |      1       5     8     1      1         2            20 |
  |      1       6     1     1      2         2            20 |
  |      1       7     1     1      3         2            20 |
  |      1       8    10     1      4         2            20 |
  |-----------------------------------------------------------|
  |      1       9     7     0      0         0             0 |
  |      1      10     8     1      0         0             0 |
  |      1      11     4     0      0         0             0 |
  |      1      12     5     1      0         0             0 |
  |      1      13     2     0      0         0             0 |
  |      1      14     6     0      0         0             0 |
  |      1      15     7     0      0         0             0 |
  |      1      16     8     1      0         0             0 |
  |      1      17     2     0      0         0             0 |
  |      1      18     8     0      0         0             0 |
  +-----------------------------------------------------------+

Altneratively, if you don't have a variable identifying episode/spell number within subjects, then you can generate it from your -episide_seq_no- (-new_seq- here).

Code:

version 17
set seed `=strreverse("1621483")'

// create subjects
set obs 10
gen subjid = _n

// create some observations per subject, and nubmer them.
gen nobs = runiformint(5, 20)
expand nobs
drop nobs
bys subjid : gen int obsid = _n

// assign value to visit, e.g., duration
gen byte val = runiformint(1, 10)

// generate spells by rnadom sequence generation
gen byte seq = rbinomial(1, 0.5)

// count spells and and spell length. Wwant to impose spells have min length of 2.
tsset subjid obsid
tsspell , cond(seq==1) spell(spellid)
tsset, clear
bys subjid spellid : egen byte _length = max(_seq)
sort subjid obsid
replace spellid = 0 if _length==1
drop _seq _end _length

tsset subjid obsid
tsspell , cond(spellid) spell(spellid2)
tsset, clear
drop spellid _end
rename spellid2 spellid

gen new_seq = cond(_seq==0, 1, _seq)
drop spellid seq _seq

// start here
bys subjid (obsid) : gen byte want = cond(new_seq==1 & (new_seq[_n+1]==1 | mi(new_seq[_n+1])), 0, 1)

tsset subjid obsid
tsspell want , cond(want==1)
drop _seq _end
bys subjid _spell (obsid) : egen spell_total = total(cond(_spell, val, 0))
sort subjid obsid
tsset, clear

list if subjid==1, sepby(subjid _spell) abbrev(12) noobs

Res.:

Code:

  +--------------------------------------------------------------+
  | subjid   obsid   val   new_seq   want   _spell   spell_total |
  |--------------------------------------------------------------|
  |      1       1     7         1      0        0             0 |
  |--------------------------------------------------------------|
  |      1       2    10         1      1        1            17 |
  |      1       3     7         2      1        1            17 |
  |--------------------------------------------------------------|
  |      1       4     9         1      0        0             0 |
  |--------------------------------------------------------------|
  |      1       5     8         1      1        2            20 |
  |      1       6     1         2      1        2            20 |
  |      1       7     1         3      1        2            20 |
  |      1       8    10         4      1        2            20 |
  |--------------------------------------------------------------|
  |      1       9     7         1      0        0             0 |
  |      1      10     8         1      0        0             0 |
  |      1      11     4         1      0        0             0 |
  |      1      12     5         1      0        0             0 |
  |      1      13     2         1      0        0             0 |
  |      1      14     6         1      0        0             0 |
  |      1      15     7         1      0        0             0 |
  |      1      16     8         1      0        0             0 |
  |      1      17     2         1      0        0             0 |
  |      1      18     8         1      0        0             0 |
  +--------------------------------------------------------------+

Last edited by Leonardo Guizzetti; 01 Aug 2021, 21:11.

Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10168

01 Aug 2021, 22:09

I agree with Leonardo's advice, you could prepare the dataset yourself based on the image. Otherwise, you are asking too much for others to do this on your behalf. The good news is that I found a site that was able to convert your attached png file to text: https://www.onlineocr.net/

The observations highlighted in red below look like perfect duplicates. You should check if having both is intended or a mistake on the part of the person inputting the data. If the latter, see

Code:

help duplicates

on how to identify and handle such observations. Otherwise, it appears that you have done most of the hard work generating the sequence variables, so the following is another way to get what you want.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str14 ppn double(admitdttm dischdttm los_sep_days) byte(cont_stay sepmode obs episode_seq_no)
"PPN00000196110" 1789538160000 1.7900781e+12  6.25 1 1  6 2
"PPN00000196110" 1.7909664e+12 1.7914644e+12  5.76 0 1  7 1
"PPN00000196111" 1828448640000 1828559760000  1.29 0 1  1 1
"PPN00000196111" 1904174520000 1.9042938e+12  1.38 0 5  2 1
"PPN00000196111" 1904295720000 1905088380000  9.17 1 1  3 2
"PPN00000196111" 1907944440000 1908636960000  8.02 0 1  4 1
"PPN00000196113" 1664349060000 1.6644501e+12  1.17 0 1  1 1
"PPN00000196113" 1669554360000 1.6697202e+12  1.92 0 5  2 1
"PPN00000196113"  1.669725e+12 1670770320000  12.1 1 1  3 2
"PPN00000196113"  1.669725e+12 1670770320000  12.1 1 1  4 3
"PPN00000196113" 1671393180000 1.6714512e+12   .67 0 1  5 1
"PPN00000196113" 1672901940000 1674129420000 14.21 0 1  6 1
"PPN00000196113" 1705458060000 1.7055933e+12  1.57 0 1  9 1
"PPN00000196113" 1.7116182e+12 1711714320000  1.11 0 1 10 1
"PPN00000196113" 1712612460000 1.7131026e+12  5.67 0 1 11 1
"PPN00000196113" 1744181220000 1.7443638e+12  2.11 0 1 12 1
"PPN00000196113" 1.7459298e+12 1746097140000  1.94 0 5 13 1
"PPN00000196113" 1.7460972e+12 1.7463708e+12  3.17 1 1 14 2
"PPN00000196113" 1773816720000 1.7740014e+12  2.14 0 1 15 1
"PPN00000196113" 1794308880000 1794931440000  7.21 0 1 16 1
"PPN00000196113"  1.794933e+12 1.7958132e+12 10.19 1 1 17 2
"PPN00000196113" 1795813260000 1.7967348e+12 10.67 1 1 18 3
"PPN00000196113" 1818459120000 1.8185184e+12   .69 0 1 19 1
"PPN00000196113" 1831722480000 1831744680000   .26 0 1 20 1
"PPN00000196113" 1842619740000 1842638760000   .22 0 1 21 1
"PPN00000196113" 1846322940000 1846340460000    .2 0 1 22 1
"PPN00000196113" 1847530380000 1.8477057e+12  2.03 0 1 23 1
"PPN00000196113" 1852364460000 1852384680000   .23 0 1 24 1
"PPN00000196113" 1868799720000 1.8688392e+12   .46 0 1 25 1
"PPN00000196113" 1881198240000 1.8813318e+12  1.55 0 1 26 1
"PPN00000196115" 1.6556697e+12 1.6557084e+12   .45 0 1  1 1
"PPN00000196115" 1722344641000 1.7223594e+12   .17 0 1  2 1
end
format %tc admitdttm
format %tc dischdttm

bys ppn (obs): gen cgroup= sum((episode_seq_no<episode_seq_no[_n+1]) & episode_seq_no==1 & !missing(episode_seq_no[_n+1]))
bys ppn (obs): replace cgroup= . if cgroup==0| (episode_seq_no==1 & episode_seq_no[_n-1]>=1 &episode_seq_no[_n+1]==1 &_n>1)| (episode_seq_no==1 & _n==_N)
bys ppn cgroup (obs): egen wanted= total(los_sep_days) if !missing(cgroup)

There may be conditions that are not immediately apparent to me that may affect how the code works in your full dataset, so do check the results after you run the code. Otherwise, if you start the process from scratch, consider Leonardo's approach in #4.

Res.:

Code:

. sort ppn obs

. l, sepby(ppn cgroup)

     +-----------------------------------------------------------------------------------------------------------------------------+
     |            ppn            admitdttm            dischdttm   los_se~s   cont_s~y   sepmode   obs   episod~o   cgroup   wanted |
     |-----------------------------------------------------------------------------------------------------------------------------|
  1. | PPN00000196110   15sep2016 05:56:00   21sep2016 11:55:00       6.25          1         1     6          2        .        . |
  2. | PPN00000196110   01oct2016 18:40:00   07oct2016 13:00:00       5.76          0         1     7          1        .        . |
     |-----------------------------------------------------------------------------------------------------------------------------|
  3. | PPN00000196111   09dec2017 14:24:00   10dec2017 21:16:00       1.29          0         1     1          1        .        . |
     |-----------------------------------------------------------------------------------------------------------------------------|
  4. | PPN00000196111   04may2020 01:22:00   05may2020 10:30:00       1.38          0         5     2          1        1    10.55 |
  5. | PPN00000196111   05may2020 11:02:00   14may2020 15:13:00       9.17          1         1     3          2        1    10.55 |
     |-----------------------------------------------------------------------------------------------------------------------------|
  6. | PPN00000196111   16jun2020 16:34:00   24jun2020 16:56:00       8.02          0         1     4          1        .        . |
     |-----------------------------------------------------------------------------------------------------------------------------|
  7. | PPN00000196113   27sep2012 07:11:00   28sep2012 11:15:00       1.17          0         1     1          1        .        . |
     |-----------------------------------------------------------------------------------------------------------------------------|
  8. | PPN00000196113   26nov2012 13:06:00   28nov2012 11:10:00       1.92          0         5     2          1        1    26.12 |
  9. | PPN00000196113   28nov2012 12:30:00   10dec2012 14:52:00       12.1          1         1     3          2        1    26.12 |
 10. | PPN00000196113   28nov2012 12:30:00   10dec2012 14:52:00       12.1          1         1     4          3        1    26.12 |
     |-----------------------------------------------------------------------------------------------------------------------------|
 11. | PPN00000196113   17dec2012 19:53:00   18dec2012 12:00:00        .67          0         1     5          1        .        . |
 12. | PPN00000196113   04jan2013 06:59:00   18jan2013 11:57:00      14.21          0         1     6          1        .        . |
 13. | PPN00000196113   16jan2014 02:21:00   17jan2014 15:55:00       1.57          0         1     9          1        .        . |
 14. | PPN00000196113   28mar2014 09:30:00   29mar2014 12:12:00       1.11          0         1    10          1        .        . |
 15. | PPN00000196113   08apr2014 21:41:00   14apr2014 13:50:00       5.67          0         1    11          1        .        . |
 16. | PPN00000196113   09apr2015 06:47:00   11apr2015 09:30:00       2.11          0         1    12          1        .        . |
     |-----------------------------------------------------------------------------------------------------------------------------|
 17. | PPN00000196113   29apr2015 12:30:00   01may2015 10:59:00       1.94          0         5    13          1        2     5.11 |
 18. | PPN00000196113   01may2015 11:00:00   04may2015 15:00:00       3.17          1         1    14          2        2     5.11 |
     |-----------------------------------------------------------------------------------------------------------------------------|
 19. | PPN00000196113   17mar2016 06:52:00   19mar2016 10:10:00       2.14          0         1    15          1        .        . |
     |-----------------------------------------------------------------------------------------------------------------------------|
 20. | PPN00000196113   09nov2016 11:08:00   16nov2016 16:04:00       7.21          0         1    16          1        3    28.07 |
 21. | PPN00000196113   16nov2016 16:30:00   26nov2016 21:00:00      10.19          1         1    17          2        3    28.07 |
 22. | PPN00000196113   26nov2016 21:01:00   07dec2016 13:00:00      10.67          1         1    18          3        3    28.07 |
     |-----------------------------------------------------------------------------------------------------------------------------|
 23. | PPN00000196113   15aug2017 23:32:00   16aug2017 16:00:00        .69          0         1    19          1        .        . |
 24. | PPN00000196113   16jan2018 11:48:00   16jan2018 17:58:00        .26          0         1    20          1        .        . |
 25. | PPN00000196113   22may2018 14:49:00   22may2018 20:06:00        .22          0         1    21          1        .        . |
 26. | PPN00000196113   04jul2018 11:29:00   04jul2018 16:21:00         .2          0         1    22          1        .        . |
 27. | PPN00000196113   18jul2018 10:53:00   20jul2018 11:35:00       2.03          0         1    23          1        .        . |
 28. | PPN00000196113   12sep2018 09:41:00   12sep2018 15:18:00        .23          0         1    24          1        .        . |
 29. | PPN00000196113   21mar2019 15:02:00   22mar2019 02:00:00        .46          0         1    25          1        .        . |
 30. | PPN00000196113   12aug2019 03:04:00   13aug2019 16:10:00       1.55          0         1    26          1        .        . |
     |-----------------------------------------------------------------------------------------------------------------------------|
 31. | PPN00000196115   18jun2012 20:15:00   19jun2012 07:00:00        .45          0         1     1          1        .        . |
 32. | PPN00000196115   30jul2014 13:04:01   30jul2014 17:10:00        .17          0         1     2          1        .        . |
     +-----------------------------------------------------------------------------------------------------------------------------+

.

Comment

Lisa Nicole

Join Date: Jan 2021

Posts: 13
#6

05 Aug 2021, 18:57

Thank you so very much Leonardo Guizzetti and Andrew Musau ! My apologies I have not been previously aware of dataex command or how to do this, so appreciate your advice. Have been delayed in responding due to COVID lockdown, home-schooling etc... But will have a good look at this over the weekend and let you know how I go. Thank you for assisting me despite my naiivety in coding! Lisa
Comment
Lisa Nicole

Join Date: Jan 2021

Posts: 13
#7

06 Aug 2021, 07:19

Andrew Musau omg you are a genius!! Thank you so very much!! This has worked completely well... amazing! If at all I am able to ask one more favour? (thank you sincerely for your time already) - I confess I am not totally understanding your 3 magic lines of code... ... I understand well the function [_n+/- 1] etc... to 'look' at rows above and below .. and obviously how the sorting is achieved, as well as the 3rd line.
What I am getting confused with is what the 'sum' is doing and how? And then the conditional 'string' of commands that make the cgroup variable non continuous by the episode_seq vble... ... I ran the first line of code while watching at the variables in question and what this 'made'... However I didn't totally understand how this worked to create what it did. If you are too busy I understand and simply very much appreciate your superb help thanks so much. !!
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10168
#8

07 Aug 2021, 01:53

Nick Cox gives an excellent tutorial on this technique in his SJ column, see https://journals.sagepub.com/doi/pdf...867X0700700209

What I am getting confused with is what the 'sum' is doing and how?

The -sum()- function of generate is just creating a running sum, the only thing is that we are specifying some conditions in the process. Consider a more straightforward example:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str1 letter float time "A" 1 "B" 2 "C" 3 "A" 4 "B" 5 "A" 6 "C" 7 "C" 8 "A" 9 end gen sumA= sum(letter=="A")

Here, we want to sum As, and the condition is letter=="A", which gives us

Res.:

Code:

. l, sep(0) +----------------------+ | letter time sumA | |----------------------| 1. | A 1 1 | 2. | B 2 1 | 3. | C 3 1 | 4. | A 4 2 | 5. | B 5 2 | 6. | A 6 3 | 7. | C 7 3 | 8. | C 8 3 | 9. | A 9 4 | +----------------------+

So the logic is - add one if letter is A, otherwise add 0. In your case, we identify a continuous episode by the conditions:

1. First "episode_seq_no==1"
2. Second "episode_seq_no==2"
3. ...
and so on.

This gives you the first line of the code. However, notice that in my example, at time 2, the running sum at time 2 and time 3 is 1 since 1+0 = 1 and 1+0+0=1, corresponding to the leters B and C, respectively. My second line of code turns these values to missing, as they do not correspond to the continuous episodes. Once you combine both lines, you have an identifier for the continuous episodes.

Last edited by Andrew Musau; 07 Aug 2021, 01:56.
Comment
Lisa Nicole

Join Date: Jan 2021

Posts: 13
#9

09 Aug 2021, 05:29

Andrew Musau THANK YOU sincerely! You are a gentleman and a scholar... this is a perfect explanation thanks so very much... and I have also read the article you linked to. SO much help thank you kindly.

Lisa
Comment

Announcement