Shoutouts to Kit Baum!
chimchar arose as a solution to several separate projects I've worked on as a research assistant and as a student. It cleans string variables by converting accented or otherwise non-standard letters to their base letter, e.g. "š" becomes "s". It then removes a set of special characters and numbers that depends on the option you select.
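To make the accent step concrete, here is a rough sketch in Python of one common way to reduce accented letters to their base form. This is not chimchar's actual code (the command is written for Stata and its internals aren't shown here); the function name fold_accents is made up for illustration, and the approach assumes Unicode decomposition is good enough for the letters involved.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Hypothetical helper: drop diacritics by Unicode decomposition."""
    # NFKD splits "š" into the base letter "s" plus a combining caron.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep everything except the combining marks themselves.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("š"))        # s
print(fold_accents("Šimůnek"))  # Simunek
```

One caveat with this approach: letters such as "ø" or "ł" have no Unicode decomposition, so a decomposition-based pass leaves them alone and they would need an explicit lookup table.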
First, in the Record Linking Lab, we use reclink a lot to link people across censuses, but it has a lot of trouble with the special characters that sometimes sneak into indexed names, especially grave accents and parentheses. With its numremove option, chimchar takes a string variable that is meant to contain nothing but letters and reduces it to just letters.
Second, I have a project dealing with school names across the country, and I wanted to strip the special characters and spaces from that data without dropping the numbers in the school names. The numokay option does that.
Lastly, if you just want to clean a string variable that is meant to be numeric as prep for a destring, you can run the command with its numonly option.
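For anyone who wants to see the difference between the three options side by side, here is a small Python sketch of the behavior described above. It is only an approximation under my own assumptions, not the command's real implementation: the function name clean and the exact character sets are hypothetical, and in particular keeping the decimal point under numonly is a guess.

```python
import re


def clean(text: str, mode: str) -> str:
    """Hypothetical stand-in for the three cleaning modes described above."""
    if mode == "numremove":
        # Letters only: drop digits, spaces, and special characters.
        return re.sub(r"[^A-Za-z]", "", text)
    if mode == "numokay":
        # Letters and digits: drop spaces and special characters.
        return re.sub(r"[^A-Za-z0-9]", "", text)
    if mode == "numonly":
        # Digits only (assumption: a decimal point is kept for destring).
        return re.sub(r"[^0-9.]", "", text)
    raise ValueError(f"unknown mode: {mode}")


print(clean("O`Brien (Jr)", "numremove"))       # OBrienJr
print(clean("P.S. 118 Elementary", "numokay"))  # PS118Elementary
print(clean("$1,234.50", "numonly"))            # 1234.50
```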
There's also an option for datasets that use the decimal comma: it switches decimal commas to decimal points and vice versa before whichever of the options above is applied.
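As an illustration of the comma/point switch, a minimal Python sketch follows. The function name swap_decimal_marks is hypothetical, and this only shows the general idea of swapping the two characters in one pass, not how the command itself does it.

```python
def swap_decimal_marks(text: str) -> str:
    """Hypothetical helper: swap every comma with a period and vice versa."""
    # A translation table swaps both characters in a single pass, so a
    # comma turned into a period is not turned straight back again.
    table = str.maketrans({",": ".", ".": ","})
    return text.translate(table)


print(swap_decimal_marks("1.234,56"))  # 1,234.56
print(swap_decimal_marks("3,14"))      # 3.14
```

Doing the swap first means a later numeric clean-up sees the decimal point where a destring would expect it.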
Thanks!