Problem when importing excel to stata

Yao Zhao

Join Date: Feb 2017

Posts: 226
#1


24 Sep 2017, 20:17

I have a problem. In my Excel file, the first column contains people's names in Chinese. When I try to import this file into Stata, Stata shows ??? in the first column. However, if I copy and paste from Excel into the Data Editor, I can see the Chinese names. My professor needs me to upload a do-file, so I want to know how to fix this problem.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

25 Sep 2017, 09:31

I am hoping you are running Stata 15. If so, I believe the solution to your problem lies in using the locale() option of import excel. The output of the command

Code:

unicode locale list

will provide a list of all possible locales, presumably some of which will be helpful in your work. Do see

Code:

help unicode locale

for further discussion of this.
Yao Zhao

Join Date: Feb 2017

Posts: 226
#3

26 Sep 2017, 11:23

Oops, sorry, I am running Stata 14.
Friedrich Huebler

Join Date: Apr 2014

Posts: 1053
#4

26 Sep 2017, 13:44

unicode locale and the locale() option for import excel are also available in Stata 14.
Yao Zhao

Join Date: Feb 2017

Posts: 226
#5

26 Sep 2017, 17:54

Sorry, I'm getting stuck. I tried help unicode locale and help unicode translate. I can't understand what unicode locale means, but I think unicode translate is the command I need. I don't know how to fix this problem.
To be specific, the rest of my file is okay. Only the first variable, "name", is in Chinese, so Stata shows ??? for all of its observations.
Yao Zhao

Join Date: Feb 2017

Posts: 226
#6

26 Sep 2017, 18:18

Something very strange is happening. Someone else faced the same problem as I did. His code is as follows:

Code:

clear
cd whereever
unicode analyze filename.dta
unicode encoding set gb18030
unicode translate filename.dta

I copied this code, but Stata says:

0 file(s) to be examined ...
(nothing to do)

So this code doesn't work for me.
Joseph Coveney

Join Date: Apr 2014

Posts: 4420
#7

26 Sep 2017, 18:29

Which menu command are you using to import your Excel file? Is it File → Import → Excel spreadsheet (.xls or .xlsx)? Or is it File → Import → Text data (delimited,.csv)?

If it is an Excel workbook (.xls or .xlsx), then the Chinese names will already be stored in Unicode, and Stata doesn't need to do anything (in Windows) when importing using File → Import → Excel spreadsheet.

But if it is a .csv file (whose file icon in Windows resembles that of an Excel workbook and which, when double-clicked, will be opened automatically by Excel), then you need to change the "Text encoding" from "Latin 1" to something else. Hopefully, specifying "UTF-8" or "UTF-16" will work.
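In a do-file, the .csv case can be handled with the encoding() option of import delimited. The following is only a sketch: the file name names.csv is hypothetical, and UTF-8 is an assumption about how the file was saved (a file written by a Chinese-language edition of Excel may need a different encoding, such as gb18030).

```stata
* Hypothetical file name; replace it with the actual path to the .csv file.
* encoding() tells Stata how the bytes in the text file are encoded.
* varnames(1) treats the first row as variable names.
import delimited using "names.csv", encoding("utf-8") varnames(1) clear

* Look at the first few rows to check that the names display correctly.
* This assumes the file has at least five observations.
list in 1/5
```

If the names still appear as question marks, the guessed encoding is probably wrong, and unicode encoding list shows the other names that Stata accepts.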
Yao Zhao

Join Date: Feb 2017

Posts: 226
#8

26 Sep 2017, 18:31

unicode analyze sample.dta

File summary (before starting):
1 file(s) specified
1 file(s) already known to be ASCII in previous runs
0 file(s) to be examined ...
(nothing to do)
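
The summary above reports the file as already known to be ASCII. That suggests the saved strings hold literal question marks and no multi-byte characters, in which case unicode translate would have nothing to convert. A sketch of one way to check this, assuming the string variable is called name as described in post #5:

```stata
use sample, clear

* Observations whose name contains a literal "?" character
count if strpos(name, "?") > 0

* Observations containing multi-byte (non-ASCII) characters:
* strlen() counts bytes, ustrlen() counts Unicode characters,
* so the two differ only when a string is not pure ASCII
count if strlen(name) > ustrlen(name)
```

A second count of 0 would suggest that the Chinese characters were already lost when the workbook was imported, so the fix would have to happen at the import step and not in the saved .dta file.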
Friedrich Huebler

Join Date: Apr 2014

Posts: 1053
#9

26 Sep 2017, 18:33

Probably your file is not called filename.dta but I am guessing because it is not clear which commands you used. Please follow this advice from section 12 of the FAQ:

Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!

Please copy and paste the Stata commands and output from the Results window of Stata, do not retype them here. Commands and output are easier to read when you use CODE tags. This is also explained in the FAQ.
Yao Zhao

Join Date: Feb 2017

Posts: 226
#10

26 Sep 2017, 18:34

The code from my do-file:

Code:

import excel "/Users/zhaoyao/Documents/summary report/import this data to stata.xlsx", sheet("???1") firstrow
save sample, replace

Hi, Joseph! It's File → Import → Excel spreadsheet (.xls or .xlsx). And you are right, Stata doesn't need to do anything. However, I still can't see the Chinese names in the Data Browser.
Yao Zhao

Join Date: Feb 2017

Posts: 226
#11

26 Sep 2017, 18:38

. unicode analyze sample.dta

File summary (before starting):
1 file(s) specified
1 file(s) already known to be ASCII in previous runs
0 file(s) to be examined ...
(nothing to do)

William Lisowski

Join Date: Dec 2014
Posts: 10150

#12

26 Sep 2017, 18:40

[added in edit: While I was writing this, posts 6-11 arrived which I did not see before posting this]

Based on Friedrich's post #4, I am going to describe what I suggest, based on my Stata 15 documentation, and you can see if indeed it works in Stata 14.

In Stata, I run the unicode locale list command and get the following output (with many lines removed)

Code:

. unicode locale list

   #      Locale                      Language                         Country
-------------------------------------------------------------------------------
   1          af                     Afrikaans                                
   2       af_NA                     Afrikaans                         Namibia
   3       af_ZA                     Afrikaans                    South Africa
   4         agq                         Aghem                                
   5      agq_CM                         Aghem                        Cameroon
    [lines removed]
 673          zh                       Chinese                                
 674     zh_Hans                       Chinese                                
 675  zh_Hans_CN                       Chinese                           China
 676  zh_Hans_HK                       Chinese             Hong Kong SAR China
 677  zh_Hans_MO                       Chinese                 Macau SAR China
 678  zh_Hans_SG                       Chinese                       Singapore
 679     zh_Hant                       Chinese                                
 680  zh_Hant_HK                       Chinese             Hong Kong SAR China
 681  zh_Hant_MO                       Chinese                 Macau SAR China
 682  zh_Hant_TW                       Chinese                          Taiwan
    [lines removed]
-------------------------------------------------------------------------------

What you see in 673-682 are the possible locale specifications for the Chinese language ("zh"). There are two "scripts" ("Hans" and "Hant") and several country specifications. My belief is that you want either zh_Hans or zh_Hant, depending on the script. (You should run the command in your Stata 14 installation and see which locales are shown in its output; perhaps some that I show are new in Stata 15.)

Then, you add to your import excel command the appropriate locale option, for example

Code:

import excel yourworkbook.xlsx, locale("zh_Hans")

as documented in

Code:

help import excel


Yao Zhao

Join Date: Feb 2017

Posts: 226
#13

26 Sep 2017, 18:41

This is my sample.dta. The ??? entries stand for the Chinese names in the .xlsx file.
Yao Zhao

Join Date: Feb 2017

Posts: 226
#14

26 Sep 2017, 18:54

I understand what William Lisowski means now. I checked unicode locale list in Stata 14:

 674     zh_Hans                       Chinese
 675  zh_Hans_CN                       Chinese                           China

These are exactly the same as in Stata 15.

However:

. import excel "/Users/zhaoyao/Documents/summary report/import this data to stata.xlsx", sheet("???1") firstrow locale("zh_Hans_CN")
locale zh_Hans_CN not found

And the problem still exists. All of my Chinese names are ?? or ???

Last edited by Yao Zhao; 26 Sep 2017, 19:30.

Yao Zhao

Join Date: Feb 2017
Posts: 226

#15

26 Sep 2017, 19:37

Code:

import excel "/Users/zhaoyao/Documents/summary report/import this data to stata.xlsx", sheet("???1") firstrow locale("zh_Hans_CN")

locale zh_Hans_CN not found
