  • egen rank without skipping rankings at ties

    Hi,

    I am working with a very large data set (about 45 variables and over 342,000 observations). Each observation represents a "contact" with an agency, and each contact relates to an "event" that occurred. One event may have multiple contacts, but each contact relates to only one event. These contacts and events involved approximately 3,500 people over time. Each contact/observation carries an id number that is unique to the person involved in the contact/event. I am trying to generate an event number for each event, counted within each person. For example, I would like to create something that looks like this:
    Variables: id, pcn (personal contact no.), cyd (date of contact), doe (date of event), eno (event #)

    id  pcn  cyd        doe        eno
    1   1    1/2/2002   1/1/2002   1
    1   2    1/2/2002   1/1/2002   1
    1   3    3/6/2003   3/4/2003   2
    1   4    3/7/2003   3/4/2003   2
    1   5    4/12/2003  3/4/2003   2
    1   6    9/7/2014   8/7/2013   3
    My original data have the id, cyd and doe. I created the pcn by using the command: "bysort id: egen pcn = rank(cyd), unique"

    I am trying to generate the event number using the id and doe, but when I use "bysort id: egen eno = rank(-doe), field" I get skips in the count if there is a tie. So, instead of it being 1, 1, 2, 2, 2, 3 above, it's 1, 1, 3, 3, 3, 6. Is there a way to get it to count sequentially even if there is a tie?
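
    The skipping is what is usually called competition ranking: tied values share a rank, and the next distinct value takes the rank it would have had without the tie. The sketch below reproduces it on the posted dates, keeping only id and doe (entered as strings and converted with date()); the names rank_track and rank_unique are illustrative only. It uses rank(doe), track, which for these data gives the same ranks as rank(-doe), field, and compares it with the unique option used for pcn, which never produces ties but breaks them arbitrarily.

```stata
clear
input id str10 _doe
1 "1/1/2002"
1 "1/1/2002"
1 "3/4/2003"
1 "3/4/2003"
1 "3/4/2003"
1 "8/7/2013"
end

* Convert the string dates to Stata daily dates
gen doe = date(_doe, "MDY")
format doe %td
drop _doe

* track: 1 for the lowest value; tied values share a rank and the
* next distinct value skips ahead (1 1 3 3 3 6 here)
bysort id: egen rank_track = rank(doe), track

* unique: every observation gets a distinct rank (1 to 6),
* with ties broken arbitrarily
bysort id: egen rank_unique = rank(doe), unique

list
```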

    Thanks,
    Amira Hasenbush

  • #2
    Try this:

    Code:
    by id (doe), sort: gen eno = 1 if _n == 1
    by id: replace eno = sum(eno)
    Note: This will restart eno at 1 with each new id. I believe that's what you want, but I'm not 100% certain.


    • #3
      Hi, thanks for your response. Yes, I do want to restart eno at 1 with each new id. Unfortunately, that just gave me a variable with every observation as a "1".


      • #4
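        Sorry, my mistake. The mention of doe in the first -by- should not have parentheses. With parentheses, doe only sets the sort order within id, so _n == 1 marks just the first observation of each person; without them, each id-doe combination is its own group, so the first observation of every event date gets a 1: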

        Code:
        set more off
        clear*
        
        input id str10 _cyd str10 _doe
        1 "1/2/2002" "1/1/2002"
        1 "1/2/2002" "1/1/2002"
        1 "3/6/2003" "3/4/2003"
        1 "3/7/2003" "3/4/2003"
        1 "4/12/2003" "3/4/2003"
        1 "9/7/2014" "8/7/2013"
        end
        
        gen cyd = date(_cyd, "MDY")
        format cyd %td
        gen doe = date(_doe, "MDY")
        format doe %td
        drop _cyd _doe
        
        list // DISPLAYS THE DATA RE-CREATED FROM THE POST
        
        // CORRECTED CODE FOR eno
        by id doe, sort: gen eno = 1 if _n == 1
        by id: replace eno = sum(eno)
        
        list
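
        A commonly used alternative, not taken from this thread, builds the same count in one line. This sketch assumes doe is a Stata daily date and that two different events of the same person never share a date of event (the code above makes the same assumption). The second id is made-up data added only to show the count restarting for each person.

```stata
clear
input id str10 _doe
1 "1/1/2002"
1 "1/1/2002"
1 "3/4/2003"
1 "3/4/2003"
1 "3/4/2003"
1 "8/7/2013"
2 "5/5/2005"
2 "6/6/2006"
end

gen doe = date(_doe, "MDY")
format doe %td
drop _doe

* Within each id, sort by event date. Under -by-, doe[_n-1] is
* missing at _n == 1, so the comparison is 1 for the first
* observation; afterwards it is 1 only when the event date changes.
* The running sum() therefore counts distinct event dates per id.
bysort id (doe): gen eno = sum(doe != doe[_n-1])

list, sepby(id)
```

        Because missing values sort last, any observations with a missing doe would be counted together as one final event within their id (or get 0 if a person has no dates at all), so it is worth checking for missing doe first.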


        • #5
          That worked! Thank you!
