Proper display of foreign language (traditional Chinese) using Stata 14

Man Yang

Join Date: Mar 2016

Posts: 183
#1


12 Oct 2017, 12:29

Hi folks, I am working on a dataset that contains traditional Chinese characters. Below are the commands I used to get Stata to display the data properly:

Code:

unicode encoding set "GB18030"

Code:

unicode translate data.dta, invalid(mark) transutf8

However, the data still contain garbled Chinese characters, and I don't know why. Other people seem to have gotten Chinese to display properly with these commands, but that is not the case for me. Any clues? Thanks.
Jason ZY Liu

Join Date: Oct 2018

Posts: 4
#2

08 Oct 2018, 14:41

Code:

clear
* check whether the file needs translation
unicode analyze ##.dta
unicode encoding set gb18030
* redo a translation that was already done once
unicode retranslate ##.dta, invalid(mark) transutf8
use ##, clear
Hua Peng (StataCorp)

StataCorp Employee

Join Date: Jun 2014

Posts: 346
#3

08 Oct 2018, 19:23

GB18030 is an encoding for simplified Chinese; for traditional Chinese, try "windows-950-2000". Because the dataset was already translated with the wrong encoding, you must first restore it to its original form:

Code:

clear
unicode restore data.dta

Then you can run:

Code:

clear
unicode encoding set windows-950-2000
unicode translate data.dta

Reading -help unicode_translate- carefully will help.
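One way to check whether a new translation worked is to look for leftover replacement characters. With invalid(mark), bytes that cannot be converted are replaced by the Unicode replacement character U+FFFD, so a minimal sketch (assuming the translated file is still named data.dta and that the text of interest is in string variables) counts, for each string variable, how many values still contain it:

```stata
* Sketch: after -unicode translate-, count values that still contain U+FFFD,
* the character that invalid(mark) substitutes for unconvertible bytes.
* Assumes the translated file is data.dta in the current directory.
use data.dta, clear

* collect the names of all string variables (str# and strL)
quietly ds, has(type string)
local strvars `r(varlist)'

* report, per variable, how many observations contain U+FFFD (uchar(65533))
foreach v of local strvars {
    quietly count if ustrpos(`v', uchar(65533)) > 0
    display as text "`v': " as result r(N) as text " value(s) with U+FFFD"
}
```

A count of 0 for every variable suggests the chosen encoding matched the data; nonzero counts suggest some bytes were not valid in that encoding, and the source encoding may need another look.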
