Generating episode numbers without having to loop

Andrew Salmon

Join Date: Dec 2014

Posts: 8
#1


12 Mar 2015, 09:17

Dear all

I have a set of data in which the same patient may feature one or more times (blood tests done over the course of a year). What I want to do is create a column of episode numbers that gives each patient episode a unique number. After much headbanging I have come up with this:

egen id =group(whatever identifies the patients)

and then

egen episodeno =group(id labno) (the laboratory code number for the sample.)

This gets you so far, except that it doesn't seem possible to go much further using the same method, and the resulting episodeno doesn't cycle back to 1 with each new patient.
The way forward would therefore appear to be to generate something like mod(episodeno, smallest value of episodeno for patient n)

Stata doesn't seem to be very keen to let you do things like this (i.e. using [1]), and the only way I could find to do it was

by id : egen episodeno1 =min(episodeno)
gen episodeno2 =mod(episodeno,episodeno1) + 1 (unlikely to cycle back to 1 for the same patient unless they spend the whole year having blood taken)

or alternatively put a -1 in the first of these two statements

Does anyone know a better way to do this? !!

Have just scrolled down egen again and found that rank might work... (except it doesn't quite do it.)
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

12 Mar 2015, 10:31

Perhaps something like the following will help.

Code:

egen id =group(whatever identifies the patients)
bysort id : egen episodeno =group(labno)

See help by for more details. Note that the episode numbers will be assigned in increasing order by labno.
Nick Cox

Join Date: Mar 2014

Posts: 35754
#3

12 Mar 2015, 10:48

The egen function group() can't be combined with by:.

I can't easily visualize what is wanted here (there are no specific examples of data or intended results), but it sounds like

Code:

bysort patientid (time) : gen episode = sum(indicator_for_episode)

Also check out tsspell (SSC)
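
To make the running-sum idea concrete, here is a minimal sketch. The rule that a new episode starts when more than 30 days have passed since the patient's previous test is purely an assumption for illustration, as are the variable name newepisode and the toy data; substitute whatever actually defines an episode in your data.

```stata
* Toy data (hypothetical): time is a day count
clear
input patientid time
1 10
1 12
1 60
2 5
2 6
end

* Flag the first record of each episode: the patient's first record,
* or a gap of more than 30 days since the previous one (assumed rule)
bysort patientid (time): gen byte newepisode = (_n == 1) | (time - time[_n-1] > 30)

* The running sum of the flag restarts at 1 for each patient
by patientid: gen episode = sum(newepisode)

list, sepby(patientid)
```

Every record in the same episode gets the same number, which is the difference from simply counting records with _n.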
Andrew Salmon

Join Date: Dec 2014

Posts: 8
#4

12 Mar 2015, 11:22

hi Nick
Thanks I will try that but am a bit past my sell by date for today

we have something like:

patient id   date      other data of interest
1            1/1/15    ******
1            2/2/15    **************
2            1/2/15    ____________
3            3/2/15    ---------------------------
3            14/2/15   ++++++++++

and we want to add a column called episode number that is the transpose of, e.g., (1 2 1 1 2) according to the above. "Unique episode number" probably isn't quite what I really meant, in that the numbers are allowed to range from 1 to n for each individual patient.

William: as Nick says, egen and by generally don't tolerate each other (it would be a heck of a sight easier sometimes if they did, because what you suggest would do the job perfectly, but I can sort of see why they don't) except for rare instances like egen min, max and rank; I used min to extract the lowest value for each patient.

Last edited by Andrew Salmon; 12 Mar 2015, 11:35. Reason: edited to assimilate all the replies so far
Robert Picard

Join Date: Mar 2014

Posts: 1536
#5

12 Mar 2015, 11:48

Looks like you want

Code:

bysort patientid (time): gen episode = _n
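
As a rough check against the example in #4, the sketch below types in those five rows and applies the same line. The variable names datestr and time are assumptions, and the dates are read as day/month/year (which 14/2/15 suggests), with two-digit years taken to fall no later than 2050.

```stata
* Example rows from #4; datestr holds the dates as typed (assumed d/m/yy)
clear
input patientid str8 datestr
1 "1/1/15"
1 "2/2/15"
2 "1/2/15"
3 "3/2/15"
3 "14/2/15"
end

* Convert to a Stata daily date so that sorting is chronological
gen time = date(datestr, "DMY", 2050)
format time %td

* Within each patient, number the records in date order
bysort patientid (time): gen episode = _n

list patientid time episode, sepby(patientid)
```

Sorting on the numeric date rather than the string matters here: as strings, "14/2/15" would sort before "3/2/15".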
Nick Cox

Join Date: Mar 2014

Posts: 35754
#6

12 Mar 2015, 13:02

I'd say that egen and by: are compatible to almost the extent that makes sense. The essential purpose of the group() function is to make distinct identifiers; they would no longer be distinct if that were done separately by something else.
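
If what is wanted is a counter of distinct labno values that restarts within each patient, one possible sketch avoids group() altogether by tagging the first record of each (id, labno) pair and taking a running sum. The variable name first and the toy data are assumptions for illustration.

```stata
* Toy data (hypothetical): some samples occupy more than one record
clear
input id labno
1 101
1 101
1 105
2 203
2 207
2 207
end

* Tag the first record of each distinct sample within a patient
bysort id labno: gen byte first = (_n == 1)

* Running sum of the tags restarts at 1 for each patient
by id: gen episodeno = sum(first)

list, sepby(id)
```

The numbers come out in increasing order of labno within each patient, and repeated records for the same sample share one number.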

The main exception is that people often want to assign quantile-based categories within groups of some variable, and you need to use user-written egen functions or other code to do that.
Andrew Salmon

Join Date: Dec 2014

Posts: 8
#7

13 Mar 2015, 03:15

Originally posted by Robert Picard

Looks like you want

Code:

bysort patientid (time): gen episode = _n

Hi Robert
Thanks for this. Bysort with patient id in this way would probably usually work, but this particular dataset has one further complication: the clinical details of the patient are sometimes spread over additional lines (which are otherwise blank; I have assumed, until I check, that it is safe to append the id number above), so _n simply assigns the not-quite-blank line a fresh number. Probably the thing to do then is to add the clinical details strings together using [_n+1] and then drop the extra lines, since only a handful of episodes run over more than 2 lines and the useful stuff will be in the first 2, if at all. If you try bysort with more than 1 variable, however, it fails, since it gives almost everything a 1.
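
A minimal sketch of that clean-up, under loud assumptions: continuation lines are identified by a missing patientid, each one belongs to the record directly above it, and only one continuation line per record is folded in (any others are simply dropped). The variable names details and cont and the toy data are hypothetical.

```stata
* Toy data (hypothetical): a missing patientid marks a continuation line
clear
input patientid str40 details
1 "anaemia"
. "query iron def"
2 "routine"
3 "fatigue"
. "weight loss"
end

* Flag continuation lines before touching anything else
gen byte cont = missing(patientid)

* Append the next line's text to a real record when that next line
* is a continuation (handles one continuation line only)
replace details = details + " " + details[_n+1] if !cont & cont[_n+1] == 1

* Drop the continuation lines so each row is one observation again
drop if cont
drop cont

list
```

This relies on the rows still being in their original file order, so it should run before any sort.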

thanks everyone for their replies
Nick Cox

Join Date: Mar 2014

Posts: 35754
#8

13 Mar 2015, 03:23

andyfish71: Please re-register with a full real name. See FAQ Advice Section 6. You can use the Contact Us button at bottom right to email the list administrators.
Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#9

13 Mar 2015, 08:36

but this particular dataset has one further complication which is that because the clinical details of the patient are sometimes spread over additional lines (which are otherwise blank I have assumed until I check that it is safe to append the id number above)

Stata is not a spreadsheet. All of Stata's operations are designed to handle a data set in which each "row" of the data set is a separate observation. While there are -wide- and -long- layouts for repetitive data, what you describe is nothing but a recipe for trouble. You need to resolve that first: until you do, it will be very difficult to impossible to work with the data in Stata.
Andrew Salmon

Join Date: Dec 2014

Posts: 8
#10

21 Apr 2015, 05:27

Yep too right!
