Consistently sorting data ahead of generating duplicates

Jil Moss

Join Date: Sep 2019

Posts: 4
#1

06 Nov 2019, 04:09

Dear community,

I am currently trying to identify different individuals (across several years) within a dataset who have been given the same identifier.
To do this I wanted to generate two variables identifying duplicates in terms of:
1. the ID used
and
2. the ID in combination with sex and birthday

sort person_id
quietly by person_id : gen dupIDLT = cond(_N==1,0,_n)

sort person_id birthday sex
quietly by person_id birthday sex: gen dupLT = cond(_N==1,0,_n)

However, when I generate these, a group of 3 duplicates may have dupIDLT numbered 1,2,3 while dupLT is numbered 1,3,2 for the same observations in years 2005-2007.

How can I achieve that both are numbered 1,2,3?

Best wishes,
Jil
Tags: duplicates, panel data
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

06 Nov 2019, 04:24

If I understood right, the commands - duplicates list - and - duplicates tag - can tackle this issue.
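
As a rough sketch of what that might look like (person_id, birthday and sex come from the original post; the names tagID and tagLT are made up for illustration):

```stata
* list all observations that share a person_id with another observation
duplicates list person_id

* tagID = number of other observations with the same person_id (0 if unique)
duplicates tag person_id, generate(tagID)

* same idea, but duplicates are defined by ID, birthday and sex together
duplicates tag person_id birthday sex, generate(tagLT)
```

Note that duplicates tag counts the surplus copies in each group rather than numbering them 1,2,3, so it flags duplicates but does not by itself produce a running index.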

Best regards,

Marcos
Andrew Musau

Join Date: Oct 2014

Posts: 10216
#3

06 Nov 2019, 04:25

Since you generate one of the duplicate variables before the other, you can impose the order of the first variable on the second. Using bysort makes sure the data are sorted as needed before numbering:

Code:

quietly bysort person_id birthday sex (dupIDLT): gen dupLT = cond(_N==1,0,_n)

Also, it is safer to reverse the order: generate dupLT first, since its groups are defined by more variables, and then impose its order on dupIDLT.

Code:

quietly bysort person_id birthday sex: gen dupLT = cond(_N==1,0,_n)
quietly bysort person_id (dupLT): gen dupIDLT = cond(_N==1,0,_n)
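
A possible alternative, sketched under the assumption that the dataset contains a time variable (called year here, a name not given in the thread): sort does not guarantee any particular order among observations with equal sort keys, so adding the year as a tie-breaker within each group should make both numberings follow calendar order.

```stata
* year is an assumed variable name; replace it with the actual time variable
* number observations within each ID in order of year
bysort person_id (year): gen dupIDLT = cond(_N==1, 0, _n)

* number observations within each ID/birthday/sex group in order of year
bysort person_id birthday sex (year): gen dupLT = cond(_N==1, 0, _n)
```

If several observations share the same person_id and year, ties remain and the order among them is still arbitrary.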

Last edited by Andrew Musau; 06 Nov 2019, 04:35.
Jil Moss

Join Date: Sep 2019

Posts: 4
#4

06 Nov 2019, 07:15

Dear Andrew,

Thanks for your quick reply. With your help I was able to sort out my issue.
Have a great day :D