Unicode to ASCII translation - and more

This replaces a previous post about the non-functioning trans_unicode package.

Thanks to Kit Baum, the unicode2ascii package is now available from SSC. It includes three commands that analyze or translate single files or groups of files in the current directory:

whichencoding examines the occurrence of Unicode and extended ASCII characters in Stata datasets and text files like do-files, ado-files, help files and log files. This is useful to determine the need for translation when sharing Stata files between users or computers with different versions of Stata installed. The official unicode analyze command serves the same purpose, but the output from whichencoding is more compact and transparent.
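To illustrate what such an examination involves, here is a minimal Python sketch that sorts a file's bytes into three groups: plain ASCII, Unicode (UTF-8), or extended ASCII. It is not how whichencoding is implemented; the function name classify_encoding and the three labels are hypothetical, chosen only for this example.

```python
def classify_encoding(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8', or 'extended-ascii'.

    Hypothetical helper for illustration only; not part of the
    unicode2ascii package.
    """
    # Pure ASCII uses only byte values below 128.
    if all(b < 0x80 for b in data):
        return "ascii"
    # Bytes above 127 that form valid UTF-8 sequences indicate Unicode.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Otherwise assume a single-byte extended ASCII code page.
        return "extended-ascii"
```

A file classified as "ascii" reads the same everywhere, so only the other two groups may need translation. Note that a short extended ASCII text can occasionally happen to be valid UTF-8, so a check like this is a heuristic.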

ascii2unicode translates datasets and text files with extended ASCII characters to Unicode encoding. Destination files take the names of the source files, and a suffix is added to the source file names. The official unicode translate command serves the same purpose, but the output from ascii2unicode is more compact and transparent, and you have access to both the Unicode and the ASCII versions of datasets and text files at the same time.
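The core of this kind of translation can be sketched in a few lines of Python. This is an illustration of the idea only, not the ascii2unicode implementation; the function name to_utf8 is hypothetical, and the default source code page (Windows-1252) is an assumption that must match how the file was actually written.

```python
def to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode extended ASCII bytes as UTF-8.

    Hypothetical helper; source_encoding is an assumption and must
    match the code page the file was originally saved in.
    """
    # Decode with the assumed single-byte code page, then encode as UTF-8.
    return data.decode(source_encoding).encode("utf-8")
```

Choosing the wrong source code page does not necessarily raise an error; it can silently produce the wrong characters, which is one reason to keep the original file alongside the translated one.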

unicode2ascii translates datasets and text files with Unicode characters to ASCII encoding and saves datasets in Stata 13 or 12 format. Variable names, label names and contents (including labels in different languages), string variable contents, and notes are translated. The source files keep their names, and a suffix is added to the destination file names. Currently (September 2015), no official Stata command serves the same purpose.
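The reverse direction can lose information, because a single-byte code page cannot represent every Unicode character. The Python sketch below illustrates this under two assumptions that are not taken from the package: that the target is the Windows-1252 code page, and that unrepresentable characters are replaced with a question mark. The function name from_unicode is hypothetical, and unicode2ascii may handle such characters differently.

```python
def from_unicode(text: str, target_encoding: str = "cp1252") -> bytes:
    """Encode Unicode text in a single-byte code page.

    Hypothetical helper; the target code page and the replacement
    policy are assumptions made for this illustration.
    """
    # errors="replace" substitutes '?' for characters the code page lacks.
    return text.encode(target_encoding, errors="replace")
```

For example, "Søren" survives the round trip in Windows-1252, while a Greek capital omega does not and comes back as a question mark.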

Recently, Daniel Bela published two related commands at SSC: saveascii, which in Stata 14 translates the dataset in memory to ASCII encoding and saves it in Stata 13 or 12 format, and useold, which translates an ASCII encoded dataset to Unicode before opening it in Stata 14.

Svend Juul and Morten Frydenberg
