Consistent spelling of names in a list - string functions

Albena Sotirova

Join Date: Jan 2015

Posts: 11
#1


11 Feb 2016, 03:07

Dear Statalist,

I am trying to solve a problem with duplicate observations (people) in my sample. I have a column with their first name and a column with their last name. The spelling of the first and last names can differ between duplicate observations: for example, the first name may be spelled ‘BRUCE’ in one observation and ‘Bruce’ or ‘bruce’ in another. The same holds for the last name.

To find out first how many duplicates I have for a given combination of first and last name, I used

Code:

duplicates tag Fname Lname, generate(duplicates)

Then I dropped the tagged duplicates. However, when I checked the new list of names, some duplicate observations remained because Stata could not identify them as such. This is most likely because Stata does not treat, for example, the uppercase spelling of a first name as the same as its lowercase or proper-case spelling.

I have been trying to find a way to make the spelling of the names in my list consistent (first letter capitalized, the rest lowercase), but could not come up with a solution. There are string functions such as strupper(s), but with those I seem to have to specify the exact string, which means that I would have to do it for every first and last name separately. In Excel there is the function ‘proper’, which would solve my problem, but I would like to do this in Stata if possible. I would be extremely grateful for any suggestions. I am using Stata 14.1.

Thank you very much in advance for your help.

Albena
Joseph Coveney

Join Date: Apr 2014

Posts: 4399
#2

11 Feb 2016, 03:13

You could try something like

Code:

generate str full_name = trim(itrim(strlower(Fname) + " " + strlower(Lname)))
duplicates tag full_name, generate(duplicates)

Is the Excel function 'proper' similar to Stata's strproper()?
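A minimal sketch of Joseph's approach on made-up data (the toy names and the tag variable dup are hypothetical). One detail matters when removing the duplicates afterwards: dropping every tagged observation (for example with drop if dup > 0) removes all copies of a repeated name, whereas duplicates drop keeps the first occurrence of each group.

```stata
* Hypothetical toy data: three spellings of one person plus one other person
clear
input str10 Fname str10 Lname
"BRUCE" "WAYNE"
"Bruce" "Wayne"
"bruce " "wayne"
"Clark" "Kent"
end

* Joseph's key: lowercase both names, join them, and remove surplus blanks
generate str full_name = trim(itrim(strlower(Fname) + " " + strlower(Lname)))

* dup counts the other copies: 2 for each Bruce Wayne row, 0 for Clark Kent
duplicates tag full_name, generate(dup)

* Keep one observation per full_name group;
* force is required because only a subset of variables is compared
duplicates drop full_name, force
```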
Nick Cox

Join Date: Mar 2014

Posts: 35651
#3

11 Feb 2016, 03:29

Just to be clear: duplicates has one and one idea only of what is a duplicate, namely exact identity of stored values. It has precisely no idea of identical meaning, import or essence or of identifying what to people are evidently different versions of the same thing. So, as Joseph's answer implies, you have to do all the work of translating to a common form.
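To illustrate Nick's point with made-up data (the variable names Fname_key, dup_raw and dup_key are hypothetical): the three spellings below count as distinct stored values until they are translated to a common form.

```stata
* Hypothetical toy data: the same first name stored three different ways
clear
input str10 Fname
"BRUCE"
"Bruce"
"bruce"
end

* duplicates compares stored values exactly, so nothing is tagged here
duplicates tag Fname, generate(dup_raw)

* Translate to a common form first; now all three observations are tagged
generate str Fname_key = strlower(strtrim(Fname))
duplicates tag Fname_key, generate(dup_key)
```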
Albena Sotirova

Join Date: Jan 2015

Posts: 11
#4

11 Feb 2016, 03:53

Thank you, Joseph and Nick! I tried your code, Joseph, and it works. The function 'proper' in Excel is similar to strproper() in Stata. However, in Excel you would just type, for example,

Code:

=PROPER(B2)

and then the formula can be filled down to apply it to the other cells in the same column.
Nick Cox

Join Date: Mar 2014

Posts: 35651
#5

11 Feb 2016, 05:41

In Stata too, putting proper() around a variable name (or indeed an expression) applies it to every observation.
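A small sketch of the Stata counterpart to applying PROPER() down an Excel column, on made-up data (Fname_proper is a hypothetical name). strproper() is the Stata 14 name of the function; the older name proper(), as used in Nick's reply, is still accepted.

```stata
* Hypothetical toy data
clear
input str10 Fname
"BRUCE"
"bruce"
"Bruce"
end

* One command transforms every observation, much like filling a PROPER()
* formula down a column in Excel; the original variable is kept for comparison
generate str Fname_proper = strproper(Fname)
list Fname Fname_proper

* Or overwrite the variable in place instead:
* replace Fname = strproper(Fname)
```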