My original dataset contains a Chinese-character string variable in which some “exotic characters” exist, which means that ordinary trimming cannot eliminate some of the blank spaces around the strings. Following http://www.stata.com/statalist/archi.../msg00891.html on Statalist, I have managed to identify and remove those invisible exotic characters (though I do not fully understand the underlying mechanism).
However, when I convert this dataset (in Stata 13 format) to Stata 14 format using the unicode command, the contents of the string variable are displayed as little squares. Even if I keep the original variable without removing the exotic characters, I still end up with the same result, so I'm not 100% sure whether the problem is due to the encoding or to the exotic characters. For a sample of the dataset, see the attachment (in Stata 13 and earlier format).
HTML Code:
. charlist city
 &'().01?ABCDEGHIJKLMNPQSTUWXYZabcdeghijklnopqrstuwxyz��������������������������
> ���������������������������������������������������������������������

. ret li

macros:
       r(chars) : " &'().01?ABCDEGHIJKLMNPQSTUWXYZabcdeghijklnopqrs.."
    r(sepchars) : " & ' ( ) . 0 1 ? A B C D E G H I J K L M N P .."
       r(ascii) : "10 13 32 38 39 40 41 46 48 49 63 65 66 67 68 69 71.."
HTML Code:
replace city = subinstr(city, "`=char(10)'", "",.)
replace city = subinstr(city, "`=char(32)'", "",.)
replace city = subinstr(city, "`=char(161)'`=char(161)'", "",.)
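On the mechanism behind these replacements: in GBK/GB18030 the byte pair 161 161 (0xA1 0xA1) encodes U+3000, the full-width "ideographic space", which is not an ordinary blank (byte 32) and so survives trim(). The following is only a minimal sketch of a more conservative cleanup, assuming city holds GB18030 bytes in a Stata 13 dataset; the variable name city_clean is hypothetical. It also strips byte 13 (carriage return), which appears in r(ascii) above but is not handled by the commands above.

```stata
* Sketch only: assumes city holds GB18030/GBK bytes (Stata 13 or earlier).
* Work on a copy so the original variable stays untouched.
generate city_clean = city

* Byte pair 161 161 is the full-width (ideographic) space in GBK.
* Caution: a byte-level match could in principle straddle two adjacent
* double-byte characters, so compare city and city_clean afterwards.
replace city_clean = subinstr(city_clean, char(161) + char(161), "", .)

* Line feed and carriage return, both listed in r(ascii).
replace city_clean = subinstr(city_clean, char(10), "", .)
replace city_clean = subinstr(city_clean, char(13), "", .)

* Remove only leading and trailing ordinary blanks, keeping internal ones.
replace city_clean = trim(city_clean)
```

Using trim() instead of deleting every char(32) keeps spaces inside names such as those written in Latin letters; whether that is wanted depends on the data.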
Thank you.
The unicode translation is performed like this:
HTML Code:
cd E:\Land_Supply\Data\土地交易微观数据
clear
*unicode encoding set gb18030 // city names are in chinese
unicode analyze trans_citypanel2013.dta
unicode translate trans_citypanel2013.dta,invalid
u trans_citypanel2013,clear
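Note that the unicode encoding set line above begins with *, so as written it is a comment and is not executed. For comparison, here is a hedged sketch of the same workflow with the encoding set explicitly. It assumes the strings really are GB18030-encoded, that the file is in the current directory, and that a backup copy exists; invalid(escape) is swapped in for invalid purely as a diagnostic choice, and the final usubinstr() line is an optional extra that is not part of the original commands.

```stata
* Sketch only: assumes GB18030 data and a backed-up file in the current directory.
clear
unicode encoding set gb18030              // encoding used to read the extended-ASCII bytes
unicode analyze trans_citypanel2013.dta   // report whether translation is needed
unicode translate trans_citypanel2013.dta, invalid(escape)  // escape, rather than mark, bytes that do not decode
use trans_citypanel2013, clear

* After translation the full-width space is the single character U+3000.
replace city = usubinstr(city, uchar(12288), "", .)

* To undo the translation and try a different encoding:
* unicode restore trans_citypanel2013.dta
```

If escaped bytes show up in city after this, that would point to byte sequences that are not valid GB18030 rather than to a display problem, which may help separate the encoding question from the exotic-character question.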