
  • Is it possible to tag duplicates sequentially?

    Hi everyone! My data look like the example below. I am looking for syntax that tags duplicates sequentially, starting from 1. I have highlighted the identified duplicates in a small portion of the data. In the second listing, I added a new column showing the tag I want for duplicates, in sequential order starting from 1. Thank you so much in advance!


    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(STATEID DISTID PSUID HHID2005 HHID2012 HHSPLITID2005)
    2 5 4 1 11 1
    2 5 4 1 12 2
    2 5 4 2 21 1
    2 5 4 2 21 1

    2 5 4 3 31 1
    2 5 4 4 41 1
    2 5 4 5 51 1
    2 5 4 6 61 1
    2 5 4 6 61 1

    2 5 4 6 62 2
    2 5 4 7 71 1
    2 5 4 8 81 1
    2 5 4 8 82 2
    2 5 4 8 83 3
    2 5 4 9 91 1
    2 5 4 10 101 1
    2 5 4 11 111 1
    2 5 4 12 121 1
    2 5 4 13 131 1
    2 5 4 14 141 1
    2 5 4 14 141 1
    2 5 4 15 151 1
    2 5 4 15 151 1
    2 5 4 15 151 1
    2 5 4 15 151 1

    2 5 4 16 161 1
    2 5 4 16 161 1
    2 5 4 17 171 1
    2 5 4 17 172 2
    2 5 4 18 181 1
    2 5 4 19 191 1
    2 5 4 20 201 1
    2 5 4 20 202 2
    2 5 4 21 211 1
    2 5 4 23 231 1
    2 5 4 23 232 2
    2 5 5 1 11 1
    2 5 5 2 21 1
    2 5 5 2 22 2
    2 5 5 3 31 1
    2 5 5 4 41 1
    2 5 5 5 51 1
    2 5 5 5 52 2
    2 5 5 6 61 1
    2 5 5 7 71 1
    2 5 5 8 81 1
    2 5 5 9 91 1

    end

    Wanted: an additional data column (Tag)

    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(STATEID DISTID PSUID HHID2005 HHID2012 HHSPLITID2005) Tag
    2 5 4 1 11 1
    2 5 4 1 12 2
    2 5 4 2 21 1 1
    2 5 4 2 21 1 2
    2 5 4 3 31 1
    2 5 4 4 41 1
    2 5 4 5 51 1
    2 5 4 6 61 1 1
    2 5 4 6 61 1 2

    2 5 4 6 62 2
    2 5 4 7 71 1
    2 5 4 8 81 1
    2 5 4 8 82 2
    2 5 4 8 83 3
    2 5 4 9 91 1
    2 5 4 10 101 1
    2 5 4 11 111 1
    2 5 4 12 121 1
    2 5 4 13 131 1
    2 5 4 14 141 1
    2 5 4 14 141 1
    2 5 4 15 151 1 1
    2 5 4 15 151 1 2
    2 5 4 15 151 1 3
    2 5 4 15 151 1 4

    2 5 4 16 161 1
    2 5 4 16 161 1
    2 5 4 17 171 1
    2 5 4 17 172 2
    2 5 4 18 181 1
    2 5 4 19 191 1
    2 5 4 20 201 1
    2 5 4 20 202 2
    2 5 4 21 211 1
    2 5 4 23 231 1
    2 5 4 23 232 2
    2 5 5 1 11 1
    2 5 5 2 21 1
    2 5 5 2 22 2
    2 5 5 3 31 1
    2 5 5 4 41 1
    2 5 5 5 51 1
    2 5 5 5 52 2
    2 5 5 6 61 1
    2 5 5 7 71 1
    2 5 5 8 81 1
    2 5 5 9 91 1

    end

  • #2
    You didn't say which variables determine the *order* within sets of duplicate observations. That is a problem, since the sorting on other variables that is necessary for what you want might change the existing order among duplicates in a way you don't like. That is a solvable problem, but not without knowing the variable that determines the order within sets. Ignoring that issue, this will do what you want:

    Code:
    bysort STATEID DISTID PSUID HHID2005 HHID2012 HHSPLITID2005: gen int OrderAmongDuplicates = _n
    To learn about these commands, see: -help bysort- and -help _n-
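
    If the order issue matters, here is a minimal sketch of one way to handle it. It assumes that the current order of observations in the dataset is the order you want to keep within each set of duplicates. The variable names obs_order and Tag are illustrative and not part of the original data.

    ```stata
    * Assumption: the current sort order is the order to preserve among duplicates.
    * obs_order and Tag are illustrative names, not variables from the original data.
    gen long obs_order = _n

    * Sort by the identifiers, then by original position within each group.
    * Number only the observations that belong to a group with more than one member.
    bysort STATEID DISTID PSUID HHID2005 HHID2012 HHSPLITID2005 (obs_order): gen int Tag = _n if _N > 1

    * Put the dataset back in its original order.
    sort obs_order
    ```

    With this variant, Tag is missing for observations that have no duplicate, and every set of duplicates is numbered from 1, including sets that were not highlighted in the example. Dropping the -if _N > 1- qualifier numbers every observation, as in the code above.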

    • #3
      Thanks a lot, Mike Lacy. That worked perfectly. I should have posted my query earlier in the day, since I had wasted so much time hard-coding it :D. Thanks a lot!
