chimchar available from SSC: thoroughly clean string variables

Tommy Morgan

Join Date: Aug 2022

Posts: 4
#1

15 Mar 2023, 11:34

Shoutouts to Kit Baum!

chimchar arose as a solution to problems in several separate projects I've worked on as a research assistant and as a student. It cleans string variables by converting accented or otherwise non-standard letters to their base letter, e.g. "š" becomes "s". It then removes a set of special characters and/or numbers, depending on the option you select.
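For anyone curious what that first step looks like outside Stata, here is a rough Python sketch of the general idea. It is not chimchar's actual code; the function name fold_accents is made up for illustration, and it only handles letters that Unicode can decompose into a base letter plus an accent mark (so characters like "ø" or "ß" would pass through unchanged).

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Map accented letters to their base letter (hypothetical helper)."""
    # NFKD splits "š" into "s" followed by a combining caron.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_accents("Dvořák, Müller, š"))  # prints: Dvorak, Muller, s
```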

First, in the Record Linking Lab, we use reclink a lot to link people across censuses, but it has a lot of trouble with the special characters that sometimes sneak into indexed names, and especially with grave accents and parentheses. So chimchar can take a string variable meant to contain nothing but letters and turn it into just letters with its numremove option.

Then, I have a project dealing with school names across the country, so I wanted to clean that data of its special characters and spaces without dropping the numbers in the school names. The numokay option does that.

Lastly, if you just want to clean a string variable that's meant to be a numeric variable as prep for a destring, then you can run this command with its numonly option.
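To illustrate how the three options differ, here is a small Python sketch. The option names come from the description above, but the exact character sets are my assumptions for illustration: numremove keeps only unaccented letters, numokay keeps letters and digits, and numonly keeps digits plus the decimal point. The real command may treat spaces, signs, or other characters differently, so check its help file.

```python
import re

# Assumed "characters to drop" for each option; not taken from chimchar itself.
DROP_PATTERNS = {
    "numremove": re.compile(r"[^A-Za-z]"),     # keep letters only
    "numokay": re.compile(r"[^A-Za-z0-9]"),    # keep letters and digits
    "numonly": re.compile(r"[^0-9.]"),         # keep digits and decimal point
}

def clean(text: str, mode: str) -> str:
    """Strip every character the chosen mode does not keep (hypothetical helper)."""
    return DROP_PATTERNS[mode].sub("", text)

print(clean("O`Brien (Jr.) 3", "numremove"))  # prints: OBrienJr
print(clean("P.S. 118 - Annex", "numokay"))   # prints: PS118Annex
print(clean("$1,234.50", "numonly"))          # prints: 1234.50
```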

For datasets that use the decimal comma, there's also an option that swaps decimal commas and decimal points before whichever cleaning option you run.
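The swap itself is simple to picture. Here is a minimal Python sketch of exchanging the two marks in one pass; the function name swap_decimal_marks is hypothetical and this is only an illustration of the idea, not how the command implements it.

```python
def swap_decimal_marks(text: str) -> str:
    """Exchange commas and periods in a single pass (hypothetical helper)."""
    # A translation table swaps both characters at once, so a comma turned
    # into a period is not turned back again by a second replacement.
    return text.translate(str.maketrans(",.", ".,"))

print(swap_decimal_marks("1.234,56"))  # prints: 1,234.56
```

Doing the swap before removing commas matters: if "1.234,56" were cleaned first under a rule that keeps digits and periods, the decimal comma would be dropped and the value would come out as 1.23456.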

Thanks!
Chen Samulsion

Join Date: Jan 2018

Posts: 927
#2

16 Mar 2023, 07:58

Thank you for the useful program. Those interested in string cleaning may also want to look at -cleanchars-, written by Lars Ängquist.