Handling Duplicate Observations

Mutanen Lau

Join Date: Sep 2016

Posts: 19
#1

Handling Duplicate Observations

29 Apr 2018, 15:33

My data set contains duplicates. I want to save the duplicates separately (or export them) and then drop all the duplicates from my data set, because I want to work with unique observations. Please, I need help on how to achieve that.
Thanks
Basharat Hussain

Join Date: Apr 2016

Posts: 28
#2

30 Apr 2018, 17:14

Kindly check: https://www.stata.com/support/faqs/d...-observations/
-------------------------------------------------------------------------------------------------------------------------
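clear
sysuse auto
save "C:\Data\auto.dta", replace
* Append the data to itself so that every observation appears twice
append using "C:\Data\auto.dta"
save "C:\Data\autox2.dta", replace
* dup = 0 if make occurs once; otherwise dup = 1, 2, ... numbers the copies within make
bysort make: generate dup = cond(_N==1, 0, _n)
count if dup > 1
* Surplus copies have dup > 1: 'keep if dup > 1' isolates them, 'drop if dup > 1' leaves one observation per make
keep if dup > 1
tab dup
save "C:\Data\autoex.dta", replace
help cond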
--------------------------------------------------------------------------------------------------------------------------
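For comparison, the same task can be sketched with Stata's built-in -duplicates- command. This is only a sketch under stated assumptions: it treats make as the key that defines a duplicate, and the file name duplicated_groups.dta and the variable name ncopies are hypothetical names chosen for illustration.

```stata
* Sketch only: auto data appended to itself, with make assumed to be the key
clear
sysuse auto
tempfile original
save `original'
append using `original'                  // every observation now appears twice

duplicates report make                   // how many observations share a value of make
duplicates tag make, generate(ncopies)   // ncopies = number of OTHER observations with the same make

* Save every observation that belongs to a duplicated group (hypothetical file name)
preserve
keep if ncopies > 0
save duplicated_groups.dta, replace
restore

* Keep the first observation of each group; force is required when a varlist is given
duplicates drop make, force
drop ncopies
```

Note that this saves all members of each duplicated group, including the copy that is kept, whereas the code above saves only the surplus copies. Which of the two is wanted depends on what "save the duplicates separately" is meant to cover.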

Roman Mostazir

Join Date: Apr 2014
Posts: 873
#3

30 Apr 2018, 17:54

Please note that you are expected to provide a data example using -dataex-, as per the forum rules. Please read through the Forum rules. Assuming you have some sort of unique identifier for the observations, here is another way, using -egen-:

Code:

*Some fake data

 li id fake in 1/16, clean noobs
    id       fake  
     1   .2047095  
     2   .8927587  
     2   .8927587  
     3   .5844658  
     3   .5844658  
     3   .5844658  
     4   .3697791  
     4   .3697791  
     4   .3697791  
     4   .3697791  
     5   .8506309  
     5   .8506309  
     5   .8506309  
     5   .8506309  
     5   .8506309  
     6   .3913819  

sort id
egen tagid = tag(id) // 1 for exactly one observation per id, 0 for the other copies

preserve
keep if tagid==0
save duplicates.dta, replace // surplus copies are saved in the current directory
restore
keep if tagid==1
save unique.dta, replace // one observation per id is saved in the current directory

use unique.dta, clear

li in 1/10, clean noobs

    id       fake   tagid  
     1   .2047095       1  
     2   .8927587       1  
     3   .5844658       1  
     4   .3697791       1  
     5   .8506309       1  
     6   .3913819       1  
     7   .1196613       1  
     8   .7542434       1  
     9   .6950234       1  
    10   .6866152       1
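One caveat with tagging by id alone is that observations sharing an id are treated as duplicates even if their other variables differ. A minimal sketch of how this might be checked before dropping anything, and how the result can be verified afterwards, follows. The six-observation data set is made up for illustration and is not the data listed above.

```stata
* Hypothetical data: id repeats, and fake is constant within id
clear
input id fake
1 .2
2 .9
2 .9
3 .6
3 .6
3 .6
end

* If these two reports agree, observations sharing an id are identical on all variables
duplicates report id        // copies judged by id alone
duplicates report           // copies judged by all variables

egen tagid = tag(id)
keep if tagid == 1
drop tagid
isid id                     // exits with an error unless id now uniquely identifies observations
```

If the two reports disagree, some observations share an id but differ elsewhere, and it may be worth inspecting them before deciding which copy to keep.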

Roman


Basharat Hussain

Join Date: Apr 2016

Posts: 28
#4

30 Apr 2018, 19:07

Thank you so much.
Apologies for not having used dataex.
Will read rules. Have been away for too long.
Roman Mostazir

Join Date: Apr 2014

Posts: 873
#5

01 May 2018, 05:37

Originally posted by Basharat Hussain

Thank you so much.
Apologies for not having used dataex.
Will read rules. Have been away for too long.

Hi Basharat, my reply #3 was addressed to the original poster (Mutanen Lau), not to you. I think the usual forum practice is that, unless someone is specifically mentioned in a reply, all replies are responses to the question posted by the original poster.

Roman
Mutanen Lau

Join Date: Sep 2016

Posts: 19
#6

12 May 2018, 05:14

Sorry for not using dataex; I will correct that next time. I sincerely appreciate your help.
Thanks

Lau