Is it possible to tag duplicates sequentially?

Praneetha Yannam

Join Date: Sep 2022

Posts: 8
#1

19 Apr 2023, 19:40

Hi everyone! My data look like the example below. I am looking for syntax that tags duplicates sequentially, starting from 1. I have highlighted the identified duplicates in a small portion of the data. In the second listing, I added a new column, Tag, showing the sequential tags I want for the duplicates. Thank you so much in advance!

* Example generated by -dataex-. For more info, type help dataex
clear
input int(STATEID DISTID PSUID HHID2005 HHID2012 HHSPLITID2005)
2 5 4 1 11 1
2 5 4 1 12 2
2 5 4 2 21 1
2 5 4 2 21 1
2 5 4 3 31 1
2 5 4 4 41 1
2 5 4 5 51 1
2 5 4 6 61 1
2 5 4 6 61 1
2 5 4 6 62 2
2 5 4 7 71 1
2 5 4 8 81 1
2 5 4 8 82 2
2 5 4 8 83 3
2 5 4 9 91 1
2 5 4 10 101 1
2 5 4 11 111 1
2 5 4 12 121 1
2 5 4 13 131 1
2 5 4 14 141 1
2 5 4 14 141 1
2 5 4 15 151 1
2 5 4 15 151 1
2 5 4 15 151 1
2 5 4 15 151 1
2 5 4 16 161 1
2 5 4 16 161 1
2 5 4 17 171 1
2 5 4 17 172 2
2 5 4 18 181 1
2 5 4 19 191 1
2 5 4 20 201 1
2 5 4 20 202 2
2 5 4 21 211 1
2 5 4 23 231 1
2 5 4 23 232 2
2 5 5 1 11 1
2 5 5 2 21 1
2 5 5 2 22 2
2 5 5 3 31 1
2 5 5 4 41 1
2 5 5 5 51 1
2 5 5 5 52 2
2 5 5 6 61 1
2 5 5 7 71 1
2 5 5 8 81 1
2 5 5 9 91 1
end

Wanted: an additional data column, Tag

* Example generated by -dataex-. For more info, type help dataex
clear
input int(STATEID DISTID PSUID HHID2005 HHID2012 HHSPLITID2005) Tag
2 5 4 1 11 1 .
2 5 4 1 12 2 .
2 5 4 2 21 1 1
2 5 4 2 21 1 2
2 5 4 3 31 1 .
2 5 4 4 41 1 .
2 5 4 5 51 1 .
2 5 4 6 61 1 1
2 5 4 6 61 1 2
2 5 4 6 62 2 .
2 5 4 7 71 1 .
2 5 4 8 81 1 .
2 5 4 8 82 2 .
2 5 4 8 83 3 .
2 5 4 9 91 1 .
2 5 4 10 101 1 .
2 5 4 11 111 1 .
2 5 4 12 121 1 .
2 5 4 13 131 1 .
2 5 4 14 141 1 .
2 5 4 14 141 1 .
2 5 4 15 151 1 1
2 5 4 15 151 1 2
2 5 4 15 151 1 3
2 5 4 15 151 1 4
2 5 4 16 161 1 .
2 5 4 16 161 1 .
2 5 4 17 171 1 .
2 5 4 17 172 2 .
2 5 4 18 181 1 .
2 5 4 19 191 1 .
2 5 4 20 201 1 .
2 5 4 20 202 2 .
2 5 4 21 211 1 .
2 5 4 23 231 1 .
2 5 4 23 232 2 .
2 5 5 1 11 1 .
2 5 5 2 21 1 .
2 5 5 2 22 2 .
2 5 5 3 31 1 .
2 5 5 4 41 1 .
2 5 5 5 51 1 .
2 5 5 5 52 2 .
2 5 5 6 61 1 .
2 5 5 7 71 1 .
2 5 5 8 81 1 .
2 5 5 9 91 1 .
end
Mike Lacy

Join Date: Apr 2014

Posts: 2425
#2

19 Apr 2023, 20:04

You didn't say what variables might determine the *order* within sets of duplicate observations. That is a problem, because the sorting on other variables that is necessary for what you want might change the existing order among duplicates in a way you don't like. That is a solvable problem, but not without knowing which variable determines the order within sets. Ignoring that issue, this will do what you want:

Code:

bysort STATEID DISTID PSUID HHID2005 HHID2012 HHSPLITID2005: gen int OrderAmongDuplicates = _n

To learn about these commands, see: -help bysort- and -help _n-
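A hedged sketch that extends the one-liner above in two ways. First, it assumes that the current sort order of the data is the order you want to keep within each set of duplicates (the thread does not say which variable should determine that order), so it records that order in a new helper variable, OrigOrder, and uses it as the within-group sort key in parentheses after the -bysort- keys. Second, it uses `if _N > 1` so that only rows belonging to a duplicated set get a number and unique rows are left missing, closer to the wanted listing; the one-liner above gives every unique row a 1. OrigOrder and Tag are illustrative names, and the data are a small subset of the posted example.

```stata
* Small subset of the posted -dataex- example
clear
input int(STATEID DISTID PSUID HHID2005 HHID2012 HHSPLITID2005)
2 5 4 1 11 1
2 5 4 1 12 2
2 5 4 2 21 1
2 5 4 2 21 1
2 5 4 15 151 1
2 5 4 15 151 1
2 5 4 15 151 1
end

* Remember the current row order (assumed to be the order wanted within duplicates)
gen long OrigOrder = _n

* Within each set of identical keys, number rows 1, 2, ... in the original order;
* rows that are not duplicated (_N == 1) are left missing
bysort STATEID DISTID PSUID HHID2005 HHID2012 HHSPLITID2005 (OrigOrder): gen int Tag = _n if _N > 1

* Restore the original row order and drop the helper variable
sort OrigOrder
drop OrigOrder
list, sepby(HHID2005)
```

If a real variable (for example, a record or interview identifier in your full data) should determine the order within duplicates, put it in the parentheses instead of OrigOrder. The built-in `duplicates tag varlist, generate(newvar)` is related, but it stores how many other copies each row has (0 for unique rows) rather than a sequential number.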
Praneetha Yannam

Join Date: Sep 2022

Posts: 8
#3

19 Apr 2023, 20:14

Thanks a lot, Mike Lacy. That worked perfectly. I should have posted my query earlier in the day; I wasted so much time hard-coding this :D. Thanks a lot!