Hello,
I use these commands to remove duplicates (variable names simplified):
sort postal_code gender birth_year species type date
quietly by postal_code gender birth_year species type: gen dup2 = cond(_N==1,0,_n)
tabulate dup2
drop if dup2 == 2
drop if dup2 == 3
I want to drop the duplicate with the latest date, which is why I also sorted on date.
If one of these variables is missing for both observations in a duplicate pair (and all the other variables listed above are equal), are the two observations still treated as duplicates, or only when both have the same non-missing value?
If missing values do count as matching, how can I prevent this?
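To make the question concrete, here is a minimal sketch of the kind of safeguard I have in mind. It assumes that missing() accepts several arguments and returns 1 if any of them is missing. The variable names are placeholders for my actual variables, and anymiss is a helper variable I made up for illustration:

```stata
* Number the observations within each key group, ordered by date
bysort postal_code gender birth_year species type (date): gen dup2 = cond(_N == 1, 0, _n)

* Flag observations where at least one key variable is missing (hypothetical helper)
gen byte anymiss = missing(postal_code, gender, birth_year, species, type)

* Drop later-dated duplicates only when every key variable is non-missing
drop if dup2 > 1 & !anymiss
```

With this, observations that match only because a key variable is missing in both would all be kept. I am not sure this is the recommended way to do it.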
Kind regards,
Karuna Vendrik