
  • Problem when importing Excel to Stata

    I have a problem. In my Excel file, the first column contains people's names in Chinese. When I import this file into Stata, Stata shows ??? in the first column. However, if I just copy and paste from Excel into the Data Editor, I can see the Chinese names. My professor needs me to submit a do-file, so I want to know how to fix this problem.

  • #2
    I hope you are running Stata 15. If so, I believe the solution to your problem lies in using the locale() option of import excel. The output of the command
    Code:
    unicode locale list
    will list all available locales, some of which will presumably be helpful in your work. See
    Code:
    help unicode locale
    for further discussion of this.



    • #3
      Oops, sorry, I am running Stata 14.



      • #4
        unicode locale and the locale() option for import excel are also available in Stata 14.



        • #5
          Sorry, I'm stuck. I tried help unicode locale and help unicode translate. I can't understand what unicode locale means, but I think unicode translate is the command I need. I still don't know how to fix this problem.
          To be specific, the rest of my file is fine. Only the first variable, "name", is in Chinese, and Stata shows ??? for all of its observations.



          • #6
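            Something very strange is happening. Someone else faced the same problem as me, and his code is as follows:
            Code:
            clear
            cd wherever                    // the folder that contains filename.dta
            unicode analyze filename.dta   // check whether the file needs translating
            unicode encoding set gb18030   // declare the original encoding (GB18030)
            unicode translate filename.dta // convert the file's strings to UTF-8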

            I tried running the same code, but Stata says:
            0 file(s) to be examined ...
            (nothing to do)
            So this code doesn't work for me.



            • #7
              Which menu command are you using to import your Excel file? Is it File → Import → Excel spreadsheet (.xls or .xlsx)? Or is it File → Import → Text data (delimited,.csv)?

              If it is an Excel workbook (.xls or .xlsx), then the Chinese names will already be stored in Unicode, and Stata doesn't need to do anything (in Windows) when importing using File → Import → Excel spreadsheet.

              But if it is a .csv file (whose file icon in Windows resembles that of an Excel workbook and which, when double-clicked, is automatically opened by Excel), then you need to change the "Text encoding" setting from "Latin 1" to something else. Hopefully, specifying "UTF-8" or "UTF-16" will work.
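              Since the original poster needs a do-file rather than menu clicks, here is a rough sketch of the command-line equivalents of the two menu routes described above. The file names are hypothetical, and "gb18030" is only a guess for a CSV saved by a Chinese-language version of Excel; unicode encoding list shows the encoding names Stata accepts.

```stata
* Hypothetical file names; replace them with your own paths.

* Route 1: Excel workbook (.xls or .xlsx); the text is already stored as Unicode
import excel using "names.xlsx", firstrow clear

* Route 2: CSV file; override the Latin 1 default with the file's real encoding
import delimited using "names.csv", encoding("utf-8") clear

* If UTF-8 still gives garbled names, another guess for Chinese text (assumption):
* import delimited using "names.csv", encoding("gb18030") clear
```

              Only one of the two routes applies to a given file; the second import simply replaces the data loaded by the first.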



              • #8
                unicode analyze sample.dta

                File summary (before starting):
                1 file(s) specified
                1 file(s) already known to be ASCII in previous runs
                0 file(s) to be examined ...
                (nothing to do)



                • #9
                  Probably your file is not called filename.dta, but I am guessing because it is not clear which commands you used. Please follow this advice from section 12 of the FAQ:

                  Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!
                  Please copy and paste the Stata commands and output from the Results window of Stata, do not retype them here. Commands and output are easier to read when you use CODE tags. This is also explained in the FAQ.



                  • #10
                    The code from my do-file:
                    Code:
                    import excel "/Users/zhaoyao/Documents/summary report/import this data to stata.xlsx", sheet("???1") firstrow
                    save sample, replace

                    Hi, Joseph! It's File → Import → Excel spreadsheet (.xls or .xlsx). And you are right, Stata shouldn't need to do anything. However, I still can't see the Chinese names in the Data Browser.



                    • #11
                      . unicode analyze sample.dta

                      File summary (before starting):
                      1 file(s) specified
                      1 file(s) already known to be ASCII in previous runs
                      0 file(s) to be examined ...
                      (nothing to do)
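                      The "already known to be ASCII" line suggests that the names in sample.dta may already be stored as literal question marks. If so, the characters were lost during import excel, and unicode translate has nothing left to convert. A quick check, assuming sample.dta is in the working directory and the names are in the string variable name (post #5); name_len is a hypothetical helper variable:

```stata
* Assumes sample.dta is in the current working directory and the
* Chinese names are in a string variable called name (see post #5)
use sample, clear

* Count observations whose name contains a literal "?" character
count if strpos(name, "?") > 0

* ustrlen() counts Unicode characters rather than bytes
generate name_len = ustrlen(name)

* Show the first few values alongside their character counts
list name name_len if _n <= 5
```

                      If the count equals the number of observations and every value looks like "??" or "???", the fix has to happen in the import step, not afterwards.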



                      • #12
                        [added in edit: While I was writing this, posts 6-11 arrived which I did not see before posting this]

                        Based on Friedrich's post #4, I am going to describe what I suggest, based on my Stata 15 documentation, and you can see if indeed it works in Stata 14.

                        In Stata, I run the unicode locale list command and get the following output (with many lines removed)
                        Code:
                        . unicode locale list
                        
                           #      Locale                      Language                         Country
                        -------------------------------------------------------------------------------
                           1          af                     Afrikaans                                
                           2       af_NA                     Afrikaans                         Namibia
                           3       af_ZA                     Afrikaans                    South Africa
                           4         agq                         Aghem                                
                           5      agq_CM                         Aghem                        Cameroon
                            [lines removed]
                         673          zh                       Chinese                                
                         674     zh_Hans                       Chinese                                
                         675  zh_Hans_CN                       Chinese                           China
                         676  zh_Hans_HK                       Chinese             Hong Kong SAR China
                         677  zh_Hans_MO                       Chinese                 Macau SAR China
                         678  zh_Hans_SG                       Chinese                       Singapore
                         679     zh_Hant                       Chinese                                
                         680  zh_Hant_HK                       Chinese             Hong Kong SAR China
                         681  zh_Hant_MO                       Chinese                 Macau SAR China
                         682  zh_Hant_TW                       Chinese                          Taiwan
                            [lines removed]
                        -------------------------------------------------------------------------------
                        What you see in 673-682 are the possible locale specifications for the Chinese language ("zh"). There are two "scripts" ("Hans" and "Hant") and several country specifications. My belief is that you want either zh_Hans or zh_Hant, depending on the script. (You should run the command in your Stata 14 installation and see the locales shown in its output; perhaps some that I show are new in Stata 15.)

                        Then, you add to your import excel command the appropriate locale option, for example
                        Code:
                        import excel yourworkbook.xlsx, locale("zh_Hans")
                        as documented in
                        Code:
                        help import excel



                        • #13
                          [Screenshot: WechatIMG3.jpeg]
                          My sample.dta. Each ??? represents a Chinese name from the .xlsx file.



                          • #14
                            I understand what William Lisowski means now. I checked unicode locale list in Stata 14:
                            674 zh_Hans Chinese
                            675 zh_Hans_CN Chinese China
                            These are exactly the same as in Stata 15.

                            However:
                            . import excel "/Users/zhaoyao/Documents/summary report/import this data to stata.xlsx", sheet("???1") firstrow locale("zh_Hans_CN")
                            locale zh_Hans_CN not found

                            The problem still exists. All of my Chinese names show as ?? or ???.
                            Last edited by Yao Zhao; 26 Sep 2017, 19:30.



                            • #15
                              Code:
                              import excel "/Users/zhaoyao/Documents/summary report/import this data to stata.xlsx", sheet("???1") firstrow locale("zh_Hans_CN")
                              locale zh_Hans_CN not found

