Identifying events within a certain time period

Torsten Joerger

Join Date: Nov 2020

Posts: 11
#1


13 May 2021, 11:43

Hello. I have a simplified data set (see below) with the variables subject_id, encounter_date and encounter_num. I am interested in counting repeat encounters that happen >1 but <15 days after the first encounter. In addition, I would like to carry out the same process for subsequent encounters, as long as they occurred >30 days after the prior index encounter. For subject 33204 listed below, I would like to use encounter number 1 as the initial event, then count encounters 4 and 5 (because they occurred >1 and <15 days after encounter 1). Next, I would like to identify and use encounter number 8 (because it occurred >30 days after the prior index encounter) and count encounter 9 (which occurred >1 and <15 days after encounter 8). Essentially, I'm trying to count the number of events that occur in a specific time period and to repeat this process every 30 days.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double subject_id float(encounter_date encounter_num)
33204 18265  1
33204 18265  2
33204 18265  3
33204 18269  4
33204 18275  5
33204 18288  6
33204 18288  7
33204 18302  8
33204 18311  9
33204 18324 10
33204 18337 11
33204 18393 12
33204 18394 13
33204 18419 14
33204 18420 15
33204 18428 16
33204 18456 17
33204 18526 18
33204 18530 19
33204 18532 20
33204 18541 21
33204 18570 22
33204 18599 23
33204 18632 24
33204 18655 25
33204 18686 26
33205 18265  1
33205 18266  2
33205 18272  3
33205 18293  4
33205 18331  5
33205 18348  6
33205 18393  7
33205 18444  8
33205 18543  9
33205 18637 10
33205 18687 11
33205 18693 12
33205 18694 13
33205 18725 14
33205 18892 15
33205 19005 16
end
format %td encounter_date
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#2

13 May 2021, 12:44

Your question is unclear. In what way do you want to "use" or "identify" these later encounters? Do you want to create a new variable that tells you how many of them there are? Do you want to create a new variable (or variables) that identifies their encounter numbers? Do you want to drop all the other encounters from the data set? Something else? Putting it concretely, for the example data you show, what would the results you want look like in Stata?
Torsten Joerger

Join Date: Nov 2020

Posts: 11
#3

13 May 2021, 13:37

Good questions, Clyde. I'd like to create a new variable indicating whether the encounter was preceded by another encounter in the previous 30 days (yes or no) and, if not, a count of the subsequent encounters that occurred >1 but <15 days later. So the data would look like this:

33204 18265 1 0 2 33204 18265 2 1 33204 18265 3 1 33204 18269 4 1 33204 18275 5 1 33204 18288 6 1 33204 18288 7 1 33204 18302 8 1 33204 18311 9 1 33204 18324 10 1 33204 18337 11 1 33204 18393 12 0 0 33204 18394 13 1 33204 18419 14 1 33204 18420 15 1 33204 18428 16 1 33204 18456 17 1 33204 18526 18 0 2 33204 18530 19 1 33204 18532 20 1 33204 18541 21 1

Torsten Joerger

Join Date: Nov 2020
Posts: 11
#4

13 May 2021, 15:35

Sorry, the formatting got messed up in the last post. This is what I want to generate: a new variable, var4, indicating whether the subject had an encounter in the preceding 30 days (yes or no), and, if not, a second variable, var5, counting how many encounters occurred >1 but <15 days after that encounter.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double subject_id float(encounter_date encounter_num var4 var5)
33204 18265  1 0 2
33204 18265  2 1 .
33204 18265  3 1 .
33204 18269  4 1 .
33204 18275  5 1 .
33204 18288  6 1 .
33204 18288  7 1 .
33204 18302  8 1 .
33204 18311  9 1 .
33204 18324 10 1 .
33204 18337 11 1 .
33204 18393 12 0 0
33204 18394 13 1 .
33204 18419 14 1 .
33204 18420 15 1 .
33204 18428 16 1 .
33204 18456 17 1 .
33204 18526 18 0 2
33204 18530 19 1 .
33204 18532 20 1 .
33204 18541 21 1 .
33205 18265  1 . .
33205 18266  2 . .
33205 18272  3 . .
33205 18293  4 . .
33205 18331  5 . .
33205 18348  6 . .
33205 18393  7 . .
33205 18444  8 . .
33205 18543  9 . .
33205 18637 10 . .
33205 18687 11 . .
33205 18693 12 . .
33205 18694 13 . .
33205 18725 14 . .
33205 18892 15 . .
33205 19005 16 . .
end
format %td encounter_date


Clyde Schechter

Join Date: Apr 2014
Posts: 30095
#5

13 May 2021, 17:36

OK, this is pretty complicated, but I think I have it.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input double subject_id float(encounter_date encounter_num)
33204 18265  1
33204 18265  2
33204 18265  3
33204 18269  4
33204 18275  5
33204 18288  6
33204 18288  7
33204 18302  8
33204 18311  9
33204 18324 10
33204 18337 11
33204 18393 12
33204 18394 13
33204 18419 14
33204 18420 15
33204 18428 16
33204 18456 17
33204 18526 18
33204 18530 19
33204 18532 20
33204 18541 21
33205 18265  1
33205 18266  2
33205 18272  3
33205 18293  4
33205 18331  5
33205 18348  6
33205 18393  7
33205 18444  8
33205 18543  9
33205 18637 10
33205 18687 11
33205 18693 12
33205 18694 13
33205 18725 14
33205 18892 15
33205 19005 16
end
format %td encounter_date

by subject_id (encounter_num), sort: assert encounter_date >= encounter_date[_n-1] if _n > 1
by subject_id (encounter_num): assert encounter_num[1] == 1
by subject_id (encounter_num): assert encounter_num == encounter_num[_n-1]+1 if _n > 1


capture program drop mark_blocks
program define mark_blocks
    rangestat (max) next_block = encounter_num, interval(encounter_date 0 30)
    local start = 1
    local block_num = 1
    gen block_num = .
    while `start' <= _N {
        local end = next_block[`start']
        if missing(`end') {
            local end = _N
        }
        replace block_num = `block_num' in `start'/`end'
        local ++block_num
        local start = `end' + 1
    }
    exit
end

runby mark_blocks, by(subject_id) verbose

by subject_id block_num (encounter_num), sort: egen var5 ///
    = total(inrange(encounter_date-encounter_date[1], 1, 15))
by subject_id block_num (encounter_num): gen byte var4 = (_n > 1), before(var5)
by subject_id block_num (encounter_num): replace var5 = . if var4 | _N == 1

-rangestat- is written by Robert Picard, Nick Cox, and Roberto Ferrer and is available from SSC.
-runby- is written by Robert Picard and me, and is also available from SSC.
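If either package is missing, a one-time install from SSC along these lines should be enough (a sketch; it assumes an internet connection and a writable ado directory):

```stata
* One-time installation of the two community-contributed packages from SSC
ssc install rangestat
ssc install runby
```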

Note: The three assert commands immediately following the loading of data are just there to verify that encounter_num and encounter_date sort the same way, and that encounter_num is, within patients, a consecutive numbering of observations starting from 1. If any of these -assert- commands fails, execution will break--the data are not suitable for use with this code.

I would be delighted if somebody comes up with a simpler solution.

Added: You probably want to drop the -verbose- option from that -runby- command. I put it in there so I could see what was going on while I developed the code, but it's going to generate a lot of useless output cluttering up your log, now that it's working properly.

Last edited by Clyde Schechter; 13 May 2021, 18:32.


Torsten Joerger

Join Date: Nov 2020

Posts: 11
#6

14 May 2021, 05:56

Hi Clyde, thanks for your help. When I run the line:
runby mark_blocks, by (subject_id)

All of my variables disappear. Any idea why this would happen?

I also get this output:

number of by-groups = 191906
by-groups with errors = 191906
by-groups with no data = 0
observations processed = 698453
observations saved = 0
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#7

14 May 2021, 10:26

Well, it says that all the by-groups have errors, which is why everything is disappearing. So the question is where the errors are coming from. The code works with the example data you provided, so I can only infer that your actual data differ in some material way. Please post an example with actual data that reproduces this problem and I will try to fix it.
Torsten Joerger

Join Date: Nov 2020

Posts: 11
#8

14 May 2021, 12:35

I have 40 variables in my data set, and this seems to be too big for dataex. Is there another command I should use?
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#9

14 May 2021, 12:39

Here's another approach. Use the code the way I originally wrote it, with the -verbose- option. And, for brevity, just run it on a subset of your data, say the first 10 subject_ids. You will see error messages spit out while -runby- executes program mark_blocks. Post those error messages here and we may be able to figure it out just from that.
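A minimal sketch of such a subset run; id_group is a helper variable name made up here, and mark_blocks is assumed to be defined already as in #5:

```stata
preserve
* number the subjects 1, 2, ... in sorted order of subject_id
egen long id_group = group(subject_id)
keep if id_group <= 10
drop id_group
* rerun on the small subset, keeping -verbose- so the errors are displayed
runby mark_blocks, by(subject_id) verbose
restore
```

Because of -preserve- and -restore-, the full data set comes back afterwards, whatever -runby- leaves in memory.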
Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#10

14 May 2021, 15:46

I still find your description not clear enough. If I understand it correctly, the code below should help.

Code:

rangestat (count) v4 = encounter_num, interval(encounter_date -30 -1) by(subject_id)
bys subject_id encounter_date (encounter_num): replace v4 = (_n != 1) | (v4 != .)
rangestat (count) v5 = encounter_num, interval(encounter_date 2 14) by(subject_id)
replace v5 = cond(v4,.,cond(v5 ==.,0,v5))

Note that my understanding of what you need must differ from Professor Clyde's: the output of my code is quite different from that of his code in #5 (see some examples below). A clarification of what you want is thus still needed.

- id 33204, date 09 Feb 2010 (obs 8) should not qualify as a target, since there is an encounter on 26 Jan 2010, just 14 days before.
- id 33204, date 21 Sep 2010 (obs 18): the counting outcome is only 2 (the encounters on 25 Sep and 27 Sep).
- id 33205, date 11 May 2010 (obs 28) should be picked up, since the most recent encounter before that is on 27 Mar 2010, i.e. 45 days before.
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#11

14 May 2021, 17:08

Romalpa Akzo I, too, initially found the question unclear. But in #4, O.P. shows the results he is looking for. The code in #5, using the example data in #5, does match what is asked for in #4. (FWIW, before I saw #3, my first thought was exactly the code you show in #10, but it produces different results.)
Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#12

14 May 2021, 17:40

Many thanks, Professor Clyde; I see your point now. I also note that the O.P.'s description at the start of #4 seems to differ from the results he shows in the same post. I still do not understand the mechanism behind the examples I mentioned in #10. Could you kindly explain your logic a bit more?
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#13

14 May 2021, 18:03

I think the basic idea is that O.P. wants to start with the first observation for each subject_id and examine 30 days from there. All the observations in that block, except the first, are designated as var4 = 1, and the first is var4 = 0 because it begins a 30 day block of time. Then, within that 30 day block of time, he wants to identify those observations that occur from 1 to 15 days after the initial one, setting var5 to a count of those. With the first 30 days taken care of, those observations are to be put aside, and a new 30 day period begins with the first observation following that 30 day period (if any). That new 30 day period is then to be handled just as the first one was. Once that is done, those observations are to be put aside, and a new 30 day block begins with the next observation (if any), and so on.

This marking out of the data into 30 day blocks is what the program mark_blocks does, and -runby- simply iterates it over subject_id. The code after -runby- works within those 30 day blocks using simple -by- prefixed commands to calculate var4 and var5 within each block.
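For readers who want to see the blocking rule on its own, here is a rough sketch that uses neither -rangestat- nor -runby-. It is not the code from #5: block_start, block_id and is_index are names made up for illustration, and it assumes the example data are in memory with no missing encounter dates.

```stata
* Sketch only: block_start, block_id and is_index are hypothetical names.
sort subject_id encounter_date encounter_num
gen block_start = encounter_date
* -replace- works through the observations in order, so block_start[_n-1]
* already holds the start date of the block the previous row belongs to
by subject_id: replace block_start = block_start[_n-1] ///
    if _n > 1 & encounter_date - block_start[_n-1] <= 30
format %td block_start
* a new block begins wherever the carried-forward start date changes
by subject_id: gen block_id = sum(block_start != block_start[_n-1])
* flag the first encounter of each block (the rows where var4 would be 0)
by subject_id: gen byte is_index = block_start != block_start[_n-1]
```

Carrying the start date forward row by row is what makes each new block begin at the first encounter after the previous 30-day window closes, rather than 30 days after the previous encounter.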
Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#14

14 May 2021, 21:25

Many thanks, Professor Clyde. Your logic is clear to me now, and it makes the puzzle interesting. However, I notice that it still differs from the O.P.'s output in #4, at least for obs 16 (id 33204, date 15 Jun 2010). Thus, a clarification from the O.P. is still needed.

Below is my attempt at solving the (interesting) puzzle following your description.

Code:

gen b = encounter_date
bys subject_id (encounter_date encounter_num): replace b = b[_n-1] if b < b[_n-1] + 31 & _n>1
bys subject_id b: gen v4 = _n>1
rangestat (count) v5 = encounter_num, interval(encounter_date 2 14) by(subject_id)
replace v5 = cond(v4,.,cond(v5 ==.,0,v5))
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#15

14 May 2021, 21:36

Brilliant solution!

And you are right, in #4, encounter_num 16 for id 33204 does not follow the generalization that he, in other respects, seems to want. I suspect he made a mistake when he worked that out by hand.