  • Generating episode numbers without having to loop

    Dear all

    I have a set of data in which the same patient may feature one or more times (blood tests done over the course of a year). What I want to do is create a column of episode numbers that gives each patient episode a unique number. After much headbanging I have come up with this:

    egen id =group(whatever identifies the patients)

    and then

    egen episodeno =group(id labno) (labno being the laboratory code number for the sample)

    This gets you so far, except that it doesn't seem possible to go much further using the same method, and the resulting episodeno doesn't cycle back to 1 with each new patient.
    The way forward would therefore appear to be to generate something like mod(episodeno, smallest value of episodeno for patient n)

    Stata doesn't seem to be very keen to let you do things like this (i.e. using [1]) and the only way I could find to do it was

    by id : egen episodeno1 =min(episodeno)
    gen episodeno2 =mod(episodeno,episodeno1) + 1
    (unlikely to cycle back to 1 for the same patient unless they spend the whole year having blood taken)

    or alternatively put a -1 in the first of these two statements

    Does anyone know a better way to do this? !!

    Have just scrolled down egen again and found that rank might work... (except it doesn't quite do it.)








  • #2
    Perhaps something like the following will help.
    Code:
    egen id =group(whatever identifies the patients)
    bysort id : egen episodeno =group(labno)
    See help by for more details. Note that the episode numbers will be assigned in increasing order by labno.



    • #3
      The egen function group() can't be combined with by:.

      I can't easily visualize what is wanted here (there are no specific examples of data or intended results), but it sounds like

      Code:
       
      bysort patientid (time) : gen episode = sum(indicator_for_episode)
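
      A minimal sketch of the running-sum idea, on made-up data. The rule that a gap of more than 30 days starts a new episode, and the variable name newepisode, are assumptions for illustration only; nothing in the thread defines what starts an episode.

      ```stata
      clear
      * hypothetical data: time is a day count, not taken from the thread
      input patientid time
      1 10
      1 12
      1 60
      2 5
      end

      * flag the first row of each patient, plus any row more than
      * 30 days after the previous one (assumed rule)
      bysort patientid (time): gen byte newepisode = (_n == 1) | (time - time[_n-1] > 30)

      * the running sum of the flag numbers the episodes within each patient
      by patientid: gen episode = sum(newepisode)

      list, sepby(patientid)
      ```

      Because the sum is taken under by:, the counter restarts for each patient without any explicit loop.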
      Also check out tsspell (SSC)




      • #4
        hi Nick
        Thanks I will try that but am a bit past my sell by date for today

        we have something like:

        patient id   date      other data of interest
        1            1/1/15    ******
        1            2/2/15    **************
        2            1/2/15    ____________
        3            3/2/15    ---------------------------
        3            14/2/15   ++++++++++

        and we want to add a column called episode number that is the transpose of e.g. (1 2 1 1 2) according to the above. Unique episode number probably isn't quite what I really meant, in that they are allowed to range from 1 to n for each individual patient.

        William: as Nick says, egen and by generally don't tolerate each other (it would be a heck of a sight easier sometimes if they did, because what you suggest would do the job perfectly, but I can sort of see why they don't) except for rare instances like egen min, max and rank; I used min to extract the lowest value for each patient.
        Last edited by Andrew Salmon; 12 Mar 2015, 11:35. Reason: edited to assimilate all the replies so far



        • #5
          Looks like you want

          Code:
          bysort patientid (time): gen episode = _n
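
          As a rough check, here is the same command run on the five rows posted in #4. The names date_str and time, and the reading of the dates as day/month/year in the 2000s, are assumptions; the thread does not say how the date is stored.

          ```stata
          clear
          * the example rows from #4, with the date entered as a string
          input patientid str8 date_str
          1 "1/1/15"
          1 "2/2/15"
          2 "1/2/15"
          3 "3/2/15"
          3 "14/2/15"
          end

          * convert to a Stata daily date (assumes day/month/year, 20xx)
          gen time = date(date_str, "DM20Y")
          format time %td

          * number the observations within each patient in date order
          bysort patientid (time): gen episode = _n

          list patientid time episode, sepby(patientid)
          ```

          On these rows episode comes out as 1 2 1 1 2, the pattern asked for in #4.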



          • #6
            I'd say that egen and by: are compatible to almost the extent that makes sense. The essential purpose of the group() function is to make distinct identifiers; they would no longer be distinct if that were done separately by something else.

            The main exception is that people often want to assign quantile-based categories within groups of some variable, and you need to use user-written egen functions or other code to do that.



            • #7
              Originally posted by Robert Picard
              Looks like you want

              Code:
              bysort patientid (time): gen episode = _n
              Hi Robert
              Thanks for this. Bysort with patient id in this way would probably usually work, but this particular dataset has one further complication: the clinical details of the patient are sometimes spread over additional lines (which are otherwise blank; I have assumed, until I check, that it is safe to append the id number above), so _n simply assigns the not-quite-blank line a fresh number. Probably the thing to do is to add the clinical details strings together using [_n+1] and then drop the extra lines, since only a handful of episodes run over more than 2 lines and the useful stuff will be in the first 2 if at all. If you try bysort with more than one variable, however, it fails, since it gives almost everything a 1.

              thanks everyone for their replies



              • #8
                andyfish71: Please re-register with a full real name. See FAQ Advice Section 6. You can use the Contact Us button at bottom right to email the list administrators.



                • #9
                  but this particular dataset has one further complication which is that because the clinical details of the patient are sometimes spread over additional lines (which are otherwise blank; I have assumed until I check that it is safe to append the id number above)
                  Stata is not a spreadsheet. All of Stata's operations are designed to handle a data set in which each "row" of the data set is a separate observation. While there are -wide- and -long- layouts for repetitive data, what you describe is nothing but a recipe for trouble. You need to resolve that first: until you do, it will be very difficult to impossible to work with the data in Stata.
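
                  One possible way to resolve it, sketched on made-up rows. This assumes a continuation line has an empty date and a missing id and belongs to the row directly above it, which the original poster had not yet verified. The variable names row, block and details are hypothetical.

                  ```stata
                  clear
                  * hypothetical data: the second row is a continuation line
                  input patientid str8 date_str str20 details
                  1 "1/1/15" "anaemia,"
                  . "" "on iron"
                  1 "2/2/15" "follow-up"
                  2 "1/2/15" "routine"
                  end

                  gen long row = _n                     // remember the original order
                  * carry the id down onto continuation lines (assumed safe)
                  replace patientid = patientid[_n-1] if missing(patientid)

                  * a row with a date starts a new record; number the records
                  gen long block = sum(date_str != "")

                  * join the first continuation line onto the main row, then drop extras
                  bysort block (row): replace details = details + " " + details[_n+1] if _n == 1 & _N > 1
                  by block: keep if _n == 1

                  gen time = date(date_str, "DM20Y")
                  bysort patientid (time): gen episode = _n
                  ```

                  Only the first continuation line is kept here, following the remark in #7 that the useful details are in the first two lines; a record that runs longer would lose its later lines.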



                  • #10
                    Yep too right!
