  • Handling Duplicate Observations

    My data set contains duplicates. I want to save the duplicates separately (or export them) and then drop all the duplicates from my data set, because I want to work with unique observations. Please, I need help on how to achieve that.
    Thanks

  • #2
    Kindly check: https://www.stata.com/support/faqs/d...-observations/
    Code:
    clear
    sysuse auto
    save "C:\Data\auto.dta", replace
    append using "C:\Data\auto.dta"          // every observation now appears twice
    save "C:\Data\autox2.dta", replace
    * dup = 0 for a make that occurs once; otherwise 1, 2, ... numbers the copies within each make
    bysort make: gen dup = cond(_N==1, 0, _n)
    count if dup > 1
    * you can now -keep if dup > 1- to isolate the surplus copies, or -drop if dup > 1- to retain one copy of each make
    keep if dup > 1
    tab dup
    save "C:\Data\autoex.dta", replace
    help cond
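The code above saves the surplus copies but stops before removing them from the working data. A minimal sketch of the full split, assuming (as in that example) that duplicates are defined by make alone; it uses a -tempfile- instead of the C:\Data paths, and the output name surplus_copies.dta is hypothetical:

```stata
* Sketch: save the surplus copies to a file, then keep one copy of each make
sysuse auto, clear
tempfile orig
save `orig'
append using `orig'                        // every observation now appears twice
bysort make: gen dup = cond(_N==1, 0, _n)  // 0 = occurs once; 1, 2, ... = copy number
preserve
keep if dup > 1                            // surplus copies only
save "surplus_copies.dta", replace         // hypothetical output file name
restore
drop if dup > 1                            // one copy of each make remains
drop dup
```

To define duplicates on several variables, list them all after -bysort-, for example -bysort make price:-.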



    • #3
      Please note that you are expected to provide a data example using -dataex-, as per the forum rules. Please read through the Forum rules. Assuming you have some sort of unique identifier for the observations, here is another way with -egen-:

      Code:
      *Some fake data
      
       li id fake in 1/16, clean noobs
          id       fake  
           1   .2047095  
           2   .8927587  
           2   .8927587  
           3   .5844658  
           3   .5844658  
           3   .5844658  
           4   .3697791  
           4   .3697791  
           4   .3697791  
           4   .3697791  
           5   .8506309  
           5   .8506309  
           5   .8506309  
           5   .8506309  
           5   .8506309  
           6   .3913819  
      
      sort id
      egen tagid = tag(id)   // tagid = 1 for one observation per id, 0 for its remaining copies
      
      preserve
      keep if tagid==0
      save duplicates.dta, replace   // surplus copies are saved in the current directory
      restore
      keep if tagid==1
      save unique.dta, replace       // one observation per id is saved in the current directory
      
      use unique.dta, clear
      
      li in 1/10, clean noobs
      
         id       fake   tagid  
           1   .2047095       1  
           2   .8927587       1  
           3   .5844658       1  
           4   .3697791       1  
           5   .8506309       1  
           6   .3913819       1  
           7   .1196613       1  
           8   .7542434       1  
           9   .6950234       1  
          10   .6866152       1
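If the duplicates are complete rows rather than repeated ids, Stata's built-in -duplicates- command can handle both steps. A minimal sketch, assuming observations identical on all variables count as duplicates; the -expand- line only manufactures duplicates for illustration, and duplicates_all.dta is a hypothetical file name. Unlike the -egen tag()- approach above, this saves every copy of a duplicated observation, not just the surplus ones:

```stata
* Sketch using the built-in -duplicates- command on complete rows
sysuse auto, clear
expand 2 in 1/5                        // illustration only: add a second copy of observations 1-5
duplicates report                      // tabulate how many copies each observation has
duplicates tag, generate(dupcount)     // dupcount = number of other identical copies
preserve
keep if dupcount > 0                   // every observation that has an identical twin
save "duplicates_all.dta", replace     // hypothetical output file name
restore
duplicates drop                        // keep the first occurrence of each distinct row
drop dupcount
```

-duplicates drop varlist, force- restricts the comparison to the listed variables; -force- is required in that case because the dropped observations may differ on other variables.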

      Roman



      • #4
        Thank you so much.
        Apologies for not having used dataex.
        Will read rules. Have been away for too long.



        • #5
          Originally posted by Basharat Hussain
          Thank you so much.
          Apologies for not having used dataex.
          Will read rules. Have been away for too long.
          Hi Basharat, my reply #3 was aimed at the original poster (Mutanen Lau), not at you. I think the usual forum practice is that, unless someone is specifically mentioned, all replies are responses to the question posted by the original poster.
          Roman



          • #6
            Sorry for not using dataex; I will take the correction on board next time. I sincerely appreciate your help.
            Thanks

            Lau

