egen rank without skipping rankings at ties

Amira hasenbush

Join Date: Jun 2015
Posts: 3

01 Jun 2015, 18:31

Hi,

I am working with a very large data set (approximately 45 variables for over 342,000 observations). Each observation represents a "contact" with an agency, and each contact is related to an "event" that has occurred. One event may have multiple contacts, but each contact can only be related to one event. These contacts and events involve approximately 3,500 people over time. Each contact/observation is labeled with an id number that is unique to the person involved in the contact/event. I am trying to generate a unique event number for each event, counted at the person level. For example, I would like to create something that looks like this:

id	pcn (Personal Contact No.)	cyd (date of contact)	doe (date of event)	eno (Event #)
1	1	1/2/2002	1/1/2002	1
1	2	1/2/2002	1/1/2002	1
1	3	3/6/2003	3/4/2003	2
1	4	3/7/2003	3/4/2003	2
1	5	4/12/2003	3/4/2003	2
1	6	9/7/2014	8/7/2013	3

My original data have the id, cyd and doe. I created the pcn by using the command: "bysort id: egen pcn = rank(cyd), unique"

I am trying to generate the event number using the id and doe, but when I use "bysort id: egen eno = rank(-doe), field" I get skips in the count if there is a tie. So, instead of it being 1, 1, 2, 2, 2, 3 above, it's 1, 1, 3, 3, 3, 6. Is there a way to get it to count sequentially even if there is a tie?

Thanks,
Amira Hasenbush

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

01 Jun 2015, 18:42

Try this:

Code:

by id (doe), sort: gen eno = 1 if _n == 1
by id: replace eno = sum(eno)

Note: This will restart eno at 1 with each new id. I believe that's what you want, but I'm not 100% certain.
Amira hasenbush

Join Date: Jun 2015

Posts: 3
#3

01 Jun 2015, 19:27

Hi, thanks for your response. Yes, I do want to restart eno at 1 with each new id. Unfortunately, that just gave me a variable with every observation as a "1".

Clyde Schechter

Join Date: Apr 2014
Posts: 30100
#4

01 Jun 2015, 21:43

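Sorry, my mistake. The mention of doe in the first -by- should not have parentheses. Inside parentheses, doe is used only to sort the observations within each id, so _n == 1 picks out just the first observation of each id and the running sum never gets past 1. Without the parentheses, every distinct combination of id and doe is its own by-group, so the first observation of each event is flagged: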

Code:

set more off
clear*

input id str10 _cyd str10 _doe
1 "1/2/2002" "1/1/2002"
1 "1/2/2002" "1/1/2002"
1 "3/6/2003" "3/4/2003"
1 "3/7/2003" "3/4/2003"
1 "4/12/2003" "3/4/2003"
1 "9/7/2014" "8/7/2013"
end

gen cyd = date(_cyd, "MDY")
format cyd %td
gen doe = date(_doe, "MDY")
format doe %td
drop _cyd _doe

list // RE-CREATES DATA FROM POST

// CORRECTED CODE FOR eno
by id doe, sort: gen eno = 1 if _n == 1
by id: replace eno = sum(eno)

list
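
For anyone who prefers a single step, the same numbering can probably be had by counting, within each id, how many times doe changes as you move down the sorted data. This is only a sketch on a cut-down copy of the example data; the variable name eno2 is made up for illustration, and it assumes doe is already a numeric Stata date.

```stata
* Hypothetical cut-down copy of the example data: one person, three events
clear
input id str10 _doe
1 "1/1/2002"
1 "1/1/2002"
1 "3/4/2003"
1 "3/4/2003"
1 "3/4/2003"
1 "8/7/2013"
end
gen doe = date(_doe, "MDY")
format doe %td
drop _doe

* Here doe is in parentheses on purpose: the by-groups are defined by id
* alone and doe only controls the sort order within each id.
* The condition is true on the first observation of each id and whenever
* doe differs from the previous observation; sum() keeps a running count.
bysort id (doe): gen eno2 = sum(_n == 1 | doe != doe[_n-1])

list, sepby(id)
```

Because the counter only goes up when doe actually changes, tied dates share a number and nothing is skipped. Observations with missing doe sort last within each id and would all share one extra event number, so check for those first if your data may have them.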


Amira hasenbush

Join Date: Jun 2015

Posts: 3
#5

02 Jun 2015, 13:59

That worked! Thank you!