Hi,
I can't figure out what I'm doing wrong.
I have 100+ duplicates in my dataset (people filled out the survey more than once). The variable that identifies duplicates is a string variable called ID. I have inspected all of the duplicates, and for each one I want to remove either the first or the second occurrence. I wrote the following code, but I keep getting the error "too few quotes".
Here is my code:
sort ID
quietly by ID: gen dup = _n if _N>1. //labels the occurrence
local ID1 "0372552" "0392169" "0414180" "0415160" "0421180" "1266329" "1447648" "1450152" "1501119"
gen to_drop=0
foreach e of local ID1 {
replace to_drop = 1 if EMPLID == " `e' " & dup == 2
}
I also tried foreach e in ID1 { ... } instead. That version runs, but it doesn't match any of the IDs, so my to_drop variable contains only zeros.
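A quick way to see why the second attempt produces only zeros: foreach e in loops over the words typed after in, so the loop body sees the literal text ID1 rather than the IDs stored in the macro, and EMPLID == " ID1 " is never true. foreach e of local ID1 (or, equivalently, foreach e in `ID1') loops over the macro's contents instead. The minimal sketch below contrasts the two forms. It assumes the IDs never contain spaces, in which case the list can be defined with no double quotes at all; a macro stores its contents as text, so the leading zeros are kept.

```stata
* Assumption: the IDs contain no spaces, so no double quotes are needed.
* The macro stores the IDs as text, so leading zeros are kept.
local ID1 0372552 0392169

* "in" loops over the words typed after it: this displays the literal word ID1
foreach e in ID1 {
    display "`e'"
}

* "of local" loops over the macro's contents: this displays 0372552, then 0392169
foreach e of local ID1 {
    display "`e'"
}
```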
Any ideas?
Thanks!
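A minimal sketch of a corrected version, run on made-up toy data (the `input` block is purely illustrative). It assumes the identifier is the string variable ID described above. The original loop compares EMPLID instead, so substitute whichever variable actually holds the IDs. The "too few quotes" error most likely comes from the macro definition: in local ID1 "0372552" ... "1501119", Stata strips the outermost pair of double quotes, which leaves 0372552" ... "1501119 with unbalanced quotes. Wrapping the list in compound double quotes `" ... "' keeps every inner quote intact. The comparison also has to be "`e'" without padding spaces, because " 0372552 " can never equal "0372552". Finally, the sketch sorts with the stable option. A plain sort does not guarantee that tied observations keep their existing order, so without stable, dup == 2 would not reliably mean the second submission.

```stata
* Hypothetical toy data: two IDs from the post, each entered twice,
* plus one made-up unique ID.
clear
input str7 ID
"0372552"
"0372552"
"0392169"
"0392169"
"9999999"
end

* stable keeps tied observations in their current order
sort ID, stable
quietly by ID: gen dup = _n if _N > 1   // occurrence number within duplicated IDs

* Compound double quotes keep each quoted ID intact inside the macro
local ID1 `" "0372552" "0392169" "'

gen byte to_drop = 0
foreach e of local ID1 {
    * No padding spaces inside the quotes
    replace to_drop = 1 if ID == "`e'" & dup == 2
}

drop if to_drop
list
```

To use this on real data, drop the toy input block and paste the full list of IDs inside the compound quotes. IDs whose first occurrence should be dropped could be handled the same way with a second list and dup == 1.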