
  • Consistently sorting data ahead of generating duplicates

    Dear community,

    I am currently trying to identify different individuals (across several years) within a dataset who have been given the same identifier.
    To do this, I want to generate two variables identifying duplicates in terms of:
    1. the ID alone, and
    2. the ID in combination with sex and birthday.

    sort person_id
    quietly by person_id : gen dupIDLT = cond(_N==1,0,_n)

    sort person_id birthday sex
    quietly by person_id birthday sex: gen dupLT = cond(_N==1,0,_n)

    However, there may be three duplicates in each variable, yet dupIDLT numbers the observations for the years 2005-2007 as 1, 2, 3 while dupLT numbers the same observations 1, 3, 2.

    How can I make sure that both are numbered 1, 2, 3?

    Best wishes,
    Jil

  • #2
    If I understood correctly, the commands -duplicates list- and -duplicates tag- can handle this.
    Best regards,

    Marcos
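A minimal sketch of what -duplicates tag- and -duplicates list- report, using hypothetical toy data (the values, the variable year, the new variable names dupID_tag and dupIDBS_tag, and the string storage of sex and birthday are all assumptions, not the poster's data): ID 101 is shared by a woman observed in 2005-2007 and a man observed in 2006.

```stata
* Hypothetical toy data: ID 101 is shared by two different people
clear
input long person_id int year str1 sex str10 birthday
101 2005 "f" "1980-01-01"
101 2006 "f" "1980-01-01"
101 2007 "f" "1980-01-01"
101 2006 "m" "1975-05-05"
end

* dupID_tag and dupIDBS_tag (hypothetical names) count how many OTHER
* observations share the same ID, or the same ID + birthday + sex
duplicates tag person_id, generate(dupID_tag)
duplicates tag person_id birthday sex, generate(dupIDBS_tag)

* List the duplicated combinations of ID + birthday + sex
duplicates list person_id birthday sex
```

Note that -duplicates tag- flags duplicates (every observation in a group gets the same count) but does not number them within the group, so on its own it does not resolve the 1, 2, 3 versus 1, 3, 2 ordering.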



    • #3
      Since you generate one of the duplicate variables before the other, you can impose the order of the first variable on the second:

      Code:
      quietly bysort person_id birthday sex (dupIDLT): gen dupLT = cond(_N==1,0,_n)

      Better still, reverse the order to guarantee what you want, since the second variable has more variables defining its groups.

      Code:
      quietly bysort person_id birthday sex: gen dupLT = cond(_N==1,0,_n)
      quietly bysort person_id (dupLT): gen dupIDLT = cond(_N==1,0,_n)
      Last edited by Andrew Musau; 06 Nov 2019, 04:35.
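A sketch of a related approach, assuming the data contain a year variable (called year here; the name is hypothetical): put the same tie-breaker in parentheses in both -bysort- calls. Plain -sort- orders tied observations within a group arbitrarily, which is how two separate sorts can number the same observations 1, 2, 3 and 1, 3, 2; ordering both counters by a variable that is unique within each group removes that arbitrariness. The toy data below are invented for illustration.

```stata
* Hypothetical toy data: one person observed 2005-2007, stored out of order
clear
input long person_id int year str1 sex str10 birthday
101 2007 "f" "1980-01-01"
101 2005 "f" "1980-01-01"
101 2006 "f" "1980-01-01"
end

* The variable in parentheses (year) orders observations inside each
* by-group without defining the groups, so both counters follow year
bysort person_id (year): gen dupIDLT = cond(_N==1, 0, _n)
bysort person_id birthday sex (year): gen dupLT = cond(_N==1, 0, _n)

list person_id year dupIDLT dupLT
```

If year does not uniquely identify observations within a person_id, further variables would need to go inside the parentheses; otherwise ties are again broken arbitrarily.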



      • #4
        Dear Andrew,

        Thanks for your quick reply; with your help I was able to sort out my issues.
        Have a great day :D
