Remove accent marks from a string variable in Stata

Klaudia Erhardt

Join Date: Mar 2015

Posts: 74
#16

08 Jun 2015, 07:46

Ok, I see.

Since Unicode has so many code points, in Stata 14 I would tackle problems like the one Eric described in #1 by using the code from my post #10 with two alterations:
- uchar instead of char
- as a first attempt, forvalues i=1(1)1000, to see whether the characters I want to replace fall in that range.
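
A minimal sketch of that kind of scan, assuming Stata 14 or later (where uchar() is available). The range 192 to 255 is only an illustrative choice covering the accented Latin 1 letters; it is not the code from post #10, and the range can be widened as needed.

```stata
* Sketch, assuming Stata 14+: list code points with the characters they map to
* The range 192/255 is an assumption chosen for illustration
forvalues i = 192/255 {
    display %4.0f `i' "  " uchar(`i')
}
```

Each line of output pairs a code point with its character, so it is easy to check whether the characters to be replaced are in the scanned range.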

So Stata 14 no longer has a str() function for capturing the character codes of the current code page?
Robert Picard

Join Date: Mar 2014

Posts: 1536
#17

08 Jun 2015, 07:52

Klaudia, I don't understand why you think you need the actual code to address the OP's problem. If the accented character displays correctly in Stata, then all you need is something like:

Code:

. dis subinstr("éleàtr","é","e",.)
eleàtr

If you don't know how to type the character on the keyboard, just use cut and paste.
Svend Juul

Join Date: Apr 2014

Posts: 515
#18

08 Jun 2015, 08:03

Eric, Klaudia,

I understand that you don't use Stata 14 (yet). As long as you can generate the characters to be replaced with your keyboard, don't worry; Klaudia's suggestion in post #2 can be simplified to

Code:

clear
set obs 1
generate str8 test = "àáâãä"
replace test = subinstr(test, "à", "a",.)
replace test = subinstr(test, "á", "a",.)
replace test = subinstr(test, "â", "a",.)
replace test = subinstr(test, "ã", "a",.)
replace test = subinstr(test, "ä", "a",.)
list, clean

You need the codes only when you have characters that you cannot type directly on your keyboard. A handy command is asciiplot. It does not (yet) work correctly with Stata 14, but the Unicode code points for the extended ASCII range are in fact the same as in the Latin 1 encoding; just use the uchar() function rather than char() in Stata 14.
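
As a rough illustration of the code-based route, assuming Stata 14 or later: the sketch below replaces à, á, â, ã and ä by looping over their code points (224 to 228, the same in Latin 1 and Unicode) instead of typing the characters. The variable name test is only an example.

```stata
* Sketch, assuming Stata 14+ (UTF-8 strings, uchar() available)
clear
set obs 1
generate test = "àáâãä"
* Code points 224-228 are à á â ã ä
forvalues i = 224/228 {
    replace test = subinstr(test, uchar(`i'), "a", .)
}
list, clean
```

In Stata 13 and earlier, the same loop would use char() in place of uchar(), provided the data are in a Latin 1 compatible encoding.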
Klaudia Erhardt

Join Date: Mar 2015

Posts: 74
#19

08 Jun 2015, 08:05

@ Robert and Svend:

Yes, you are right. My approach stems from my experience of having to concatenate strings containing quotes and compound quotes for my user-written ado-file combival. It worked well when I used the extended ASCII codes.

But this thread shows me that I have to pay attention to the issue: the results of the str() function might be platform-dependent, and combival might fail with an error in Stata 14.
It's a pity that I still cannot try it with Stata 14.

Last edited by Klaudia Erhardt; 08 Jun 2015, 08:11.
Clyde Schechter

Join Date: Apr 2014

Posts: 30095
#20

08 Jun 2015, 08:29

As an aside, for those using Stata versions 8.2 through 13.1, an easy way to see all the special characters and their ASCII codes is the -asciiplot- command, authored by Michael Blasnik, Svend Juul, and Nick Cox, and available on SSC. It produces a "graph" which actually is a table arraying the visible versions of all the ASCII characters, laid out by their high and low order digits. (It does not work in version 14 due to the conversion to Unicode.)
Klaudia Erhardt

Join Date: Mar 2015

Posts: 74
#21

09 Jun 2015, 08:54

@ #20, asciiplot: Oh, what a nice little gadget! A pity that progress overruns it!
Svend Juul

Join Date: Apr 2014

Posts: 515
#22

09 Jun 2015, 09:37

A revised version of asciiplot is on its way - it works both in Stata <14 and Stata 14+. And actually the Unicode code points are identical to the ASCII codes in the Latin 1 encoding, so it is still useful.

Svend