Dear Statalist,
I am trying to solve a problem with duplicate observations (people) in my sample. I have one column with their first name and one column with their last name. The spelling of the first and last names of the duplicate observations can differ in capitalization: for example, the first name may appear as ‘BRUCE’ in one observation and as ‘Bruce’ or ‘bruce’ in another. The same holds for the last name.
To find out how many duplicates I have for a given combination of first and last name, I used
Code:
duplicates tag Fname Lname, generate(duplicates)
Then I dropped the tagged duplicates. However, when I checked the new list of names, some duplicate observations remained because Stata had not identified them as such. Most likely this is because Stata does not treat an uppercase spelling of a name as identical to its lowercase or proper-case spelling. I have been trying to find a way to make the spelling of the names in my list consistent (first letter capitalized, the rest lowercase) but could not come up with a solution. There are string functions such as strupper(s), but as far as I can tell I would have to specify the exact string, which means doing it separately for every first and last name. Excel has the function ‘proper’, which would solve my problem, but I would like to do this in Stata if possible. I would therefore be extremely grateful for any suggestions. I am using Stata 14.1.
Thank you very much in advance for your help.
Albena
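One possible approach, offered as a sketch rather than a definitive answer: Stata's string functions operate on whole variables, so the case can be normalised for every observation at once before checking for duplicates. The example below assumes that Fname and Lname are string variables, as in the command quoted in the question. The names Fname_clean and Lname_clean are hypothetical and introduced only for illustration. It also assumes that keeping one observation per name pair is the desired outcome.

```stata
* Sketch only: normalise capitalisation and surrounding spaces in new variables.
* Fname_clean and Lname_clean are hypothetical names used for illustration.
generate Fname_clean = strproper(strtrim(Fname))
generate Lname_clean = strproper(strtrim(Lname))

* Report how many surplus copies exist for each cleaned name pair
duplicates report Fname_clean Lname_clean

* Keep the first observation of each name pair and drop the surplus copies;
* -force- is required because the check uses only a subset of the variables
duplicates drop Fname_clean Lname_clean, force
```

Note that strproper() also capitalises a letter that follows a non-letter character, so a name such as ‘o'brien’ would become ‘O'Brien’. If the names contain non-ASCII characters, the Unicode function ustrtitle(), available from Stata 14, may be the better choice. Creating new variables rather than overwriting Fname and Lname leaves the original spellings available for checking.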