Removing duplicates

Isabel Ruthotto

Join Date: Feb 2021

Posts: 10
#1

Removing duplicates

03 Jan 2024, 08:41

Hi,

I can't figure out what I'm doing wrong.

I have 100+ duplicates (people filled out the survey multiple times) in my dataset. The variable that indicates the duplicate is called "ID" and is a string variable. I inspected all duplicates, and I want to remove either the first or second occurrence. I worked out the following code but I keep getting the error "too few quotes".

Here is my code:

sort ID
quietly by ID: gen dup = _n if _N>1. //labels the occurrence

local ID1 "0372552" "0392169" "0414180" "0415160" "0421180" "1266329" "1447648" "1450152" "1501119"

gen to_drop=0
foreach e of local ID1 {
replace to_drop = 1 if EMPLID == " `e' " & dup == 2
}

I also tried "foreach e in ID1". That code runs but doesn't identify the strings correctly: my to_drop variable contains only zeros.

Any ideas?

Thanks!
Nick Cox

Join Date: Mar 2014

Posts: 35746
#2

03 Jan 2024, 08:55

The route you've chosen obliges you to go further.

Code:

local ID1 `" "0372552" "0392169" "0414180" "0415160" "0421180" "1266329" "1447648" "1450152" "1501119" "'

Stata Rule: The outermost quotation marks are taken to be string delimiters and are stripped on reading.

That is fine with

Code:

local beasts "frog toad newt"

whenever you want frog toad newt to be the contents of the macro and are happy that the double quotes disappear into the dust.

That is not fine whenever you want double quotes to be preserved within strings.
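A minimal sketch of the difference, not part of the original reply: the macro names beasts and ids are only illustrative, and the two IDs are taken from the list in #1. Wrapping the definition in compound double quotes keeps the inner quotes, so foreach still sees one token per ID.

```stata
* Plain double quotes: the outer pair is stripped when the macro is defined
local beasts "frog toad newt"
display `"`beasts'"'

* Compound double quotes: the inner quotes survive inside the macro
local ids `" "0372552" "0392169" "'
local n : word count `ids'
display `n'

* foreach ... of local treats each quoted ID as one element
foreach e of local ids {
    display "`e'"
}
```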

Yet the internal quotes in your case look redundant, so you can go the other way, as

Code:

local ID1 0372552 0392169 0414180 0415160 0421180 1266329 1447648 1450152 1501119

looks good.
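For completeness, here is a hedged sketch of the whole loop with the unquoted list. It rests on assumptions that are not confirmed in the thread: the toy data are invented (the ID 9999999 is hypothetical), and the comparison uses ID rather than EMPLID, since the loop in #1 tests EMPLID while the data are sorted by ID. It also removes the spaces inside the quotes, because " `e' " with padding can never equal an unpadded value such as "0372552".

```stata
* Toy data standing in for the survey; the variable name ID is an assumption
clear
input str7 ID
"0372552"
"0372552"
"0392169"
"0392169"
"9999999"
end

* Number the occurrences within each ID; dup stays missing for unique IDs
bysort ID: gen dup = _n if _N > 1

local ID1 0372552 0392169 0414180

gen to_drop = 0
foreach e of local ID1 {
    * No padding inside the quotes, so the string matches exactly
    replace to_drop = 1 if ID == "`e'" & dup == 2
}

drop if to_drop
```

Note that "foreach e in ID1" loops once over the literal word ID1 rather than over the macro's contents, which would explain a to_drop variable that is all zeros. Which observation counts as the second occurrence also depends on the order within each ID, and sort does not preserve that order unless the stable option is used.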
Inaamul Haq

Join Date: Feb 2019

Posts: 57
#3

03 Jan 2024, 08:58

I am not sure, but I wonder whether simple code like

Code:

duplicates drop ID, force

will work.
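A cautious sketch of how that might look in practice, with invented toy data (the variable score, the new variable ndup, and the ID 1111111 are hypothetical). It inspects the duplicates before dropping them, because duplicates drop keeps only the first occurrence in each group; if the second occurrence is sometimes the one to keep, the data would need to be ordered accordingly first.

```stata
* Toy data; only ID comes from the thread
clear
input str7 ID score
"0372552" 1
"0372552" 2
"0392169" 3
"1111111" 4
end

* Summarise and tag the duplicates before removing anything
duplicates report ID
duplicates tag ID, generate(ndup)
list if ndup > 0

* Keep the first observation for each ID; force is required because
* the observations may differ on variables other than ID
duplicates drop ID, force
```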
Nick Cox

Join Date: Mar 2014

Posts: 35746
#4

03 Jan 2024, 09:15

Indeed. As its putative author, I forgot to ask why duplicates isn't a solution here.