  • Unintelligible code in Stata 16

    Dear Stata users,
    I use the community-contributed command -tabout- (SSC, written by Ian Watson) to export two-way tables to Excel (.xls). In Stata 13 it handles Chinese characters well and gives the desired results, as shown in the first panel of the picture posted below. In Stata 16, which uses Unicode to encode Chinese characters, -tabout- produces output full of garbled characters, as shown in the second panel. As an example, I use auto.dta, define value labels in Chinese, and run -tabout-. Just as software can stop working after an operating system update, it seems that many older community-contributed commands (whose authors may have no plans to update them) work poorly in Stata 15/16. Is there any way to solve this problem?
    Code:
    sysuse auto
    label define rep78 1 "一个" 2 "两个" 3 "三个" 4 "四个" 5 "五个及以上", modify
    label values rep78 rep78
    label define foreign 0 "国产车" 1 "进口车", modify
    label values foreign foreign
    tabout rep78 foreign using auto.xls, cells(row) stats(chi2) h3(nil) append
    [Image: _20190723205054.jpg (first panel: Stata 13 output; second panel: Stata 16 output)]


  • #2
    One possibility is that the do-file that creates the labels is encoded in extended ASCII, and you need to translate the do-file to Unicode. See: help unicode_advice
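
    A minimal sketch of the translation workflow that help unicode_advice describes, assuming a hypothetical do-file named labels.do in the current working directory that was saved by an older Stata in a Chinese code page. The file name and the gb18030 encoding are assumptions for illustration only; substitute the encoding the file was actually written in.

    ```stata
    * Assumption: labels.do is a hypothetical do-file saved in a Chinese code page
    unicode analyze labels.do        // report whether the file needs translation
    unicode encoding set gb18030     // assumed source encoding of the file
    unicode translate labels.do      // convert to UTF-8; Stata keeps a backup of the original
    ```

    If unicode analyze reports that the file does not need translation, the do-file is not the source of the problem.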
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      many older community-contributed commands (whose authors may have no plans to update them) work poorly in Stata 15/16
      I see in this thread one command being mentioned -- tabout -- and it's not yet clear where the problem lies. So, that statement strikes me as an exaggeration.


      • #4
        If you use tabout to write a .txt file and then import that file into Excel, it works well enough. I believe that is also the functionality that tabout promises.


        • #5
          To expand a bit on Jorrit Gosens' answer: tabout exports text, not xls. The .xls file you save with your command is not an Excel file at all; it is really a text file with the wrong extension (try opening it with Notepad).

          You should never, ever save a text file with an xls extension and pretend it's an Excel file.

          If you open it anyway (by double-clicking), Excel will display a warning, and if you insist, it will try to import the file and guess the encoding. Guessing a text encoding often fails. However, when you import the file with the correct extension, Excel gives you the opportunity to choose the correct encoding, here UTF-8, and the file is then imported correctly.
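
          One way to check from within Stata that the tabout output is plain text is sketched below. It assumes tabout is installed from SSC and writes to auto.txt in the current working directory; the options are the ones used in #1, with replace instead of append.

          ```stata
          sysuse auto, clear
          * write the table with an honest .txt extension
          tabout rep78 foreign using auto.txt, cells(row) stats(chi2) h3(nil) replace
          type auto.txt                  // a text file displays directly in the Results window
          hexdump auto.txt, analyze      // summarize the kinds of bytes found in the file
          ```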


          • #6
            Maarten Buis, thank you very much. I ran the commands in #1 in the Command window rather than from a do-file, so I don't think the problem results from an invalid encoding in a do-file.
            Nick Cox, thank you as always. I admit that the statement you quoted is a bit exaggerated. What I mean is that Stata does not support Chinese characters very well, even though it introduced Unicode in Stata 14. What's more, Stata still does not do very well at exporting summarize/tabulate/table/regress... results to Word, Excel, or similar documents. Results window options such as "Copy", "Copy Table", "Copy Table as HTML", and "Copy as Picture" perform poorly when we copy and paste results into Word or Excel.


            • #7
              Jorrit Gosens, Jean-Claude Arbaut, thank you both. You are right that -tabout- is designed for .txt output. I can export results to .txt and then copy and paste them into Excel. But for convenience, I can export results "directly" to .xls, and that works well. Jean-Claude Arbaut is right in his explanation of the principles and problems of this little trick. In fact, my problem is NOT whether we should -tabout- results to .txt or .xls; my problem is that -tabout- supports Chinese characters well in Stata 13 but poorly in Stata 16.


              • #8
                Jean-Claude Arbaut is right.

                You simply should NOT save it as an xls file and expect Excel to open it correctly: xls is a binary file format, and Excel will not read the file as UTF-8 (the default encoding in Stata since Stata 14) unless instructed to do so. Instead, save the file from -tabout- as auto.txt and then open it with Excel. Excel will then recognize it as a text file in UTF-8 encoding and perform an import operation.


                • #9
                  Dear Hua Peng (StataCorp), it seems it is not that simple. Following the advice above, I ran -tabout- and saved the result as auto.txt. The problem is that Stata 13 gave me correct results (encoding) and Stata 16 unfortunately did not. See the picture below. My OS is Windows 7, 64-bit.

                  [Image: _20190724122208.jpg]


                  • #10
                    Dear Hua Peng (StataCorp), I think I have found the cause of the problem. I first ran the script in Stata 13 and then ran the same script in Stata 16, saving both sets of results to one .txt file with the -append- option. The encodings in my .txt file (first panel for the Stata 13 results, second panel for the Stata 16 results) must have been mixed. If I run the script in Stata 16 alone, the problem disappears, and yes, Stata 16 works well. Thanks for your attention and explanation!
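
                    A hedged sketch of how the mixed-encoding file can be avoided and checked in Stata 16, assuming auto.txt is in the current working directory and that unicode analyze treats the .txt file as text. It starts from a fresh file instead of appending to one written by Stata 13.

                    ```stata
                    sysuse auto, clear
                    * discard any file left over from an earlier Stata 13 run
                    capture erase auto.txt
                    * replace (not append) so the file holds output from this Stata version only
                    tabout rep78 foreign using auto.txt, cells(row) stats(chi2) h3(nil) replace
                    * report whether the file contains bytes that are not valid UTF-8
                    unicode analyze auto.txt
                    ```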


                    • #11
                      To confirm, the issue Chen raises is not a Stata problem (Stata handles the Unicode fine in the example), or a -tabout- problem, but an Excel problem--that Excel is not automatically reading the table file in UTF-8 format. As Jean-Claude Arbaut and Hua Peng (StataCorp) point out, the practice of outputting a text format file with the ".xls" extension to open it in Excel is really nonstandard. However, it can be done successfully.

                      The key for Excel to interpret text input as Unicode (without going through the manual import process within the app) is that the file needs to start with a byte order mark (BOM), as discussed at https://superuser.com/a/1437785 and https://en.wikipedia.org/wiki/Byte_order_mark. The UTF-8 BOM consists of the three bytes 0xEF 0xBB 0xBF. So if we make a little binary file containing only the byte order mark, -tabout- can insert this file at the beginning of the table file, and the resulting table will then open in Excel with Unicode preserved (as desired).

                      Code:
                      *step 1: create the BOM file
                      file open f using bom, write replace binary
                      file write f %1bu (239) %1bu (187) %1bu (191)
                      // corresponds to: 0XEF 0XBB 0XBF
                      file close f
                      hexdump bom //confirms hex codes are correct
                      
                      *step 2: run tabout on Chen's example, appending the BOM file at the start of the table file
                      sysuse auto, clear
                      label define rep78 1 "一个" 2 "两个" 3 "三个" 4 "四个" 5 "五个及以上", modify
                      label values rep78 rep78
                      label define foreign 0 "国产车" 1 "进口车", modify
                      label values foreign foreign
                      tabout rep78 foreign using auto.csv, cells(row) stats(chi2) h3(nil) replace style(csv) topf(bom)
                      The resulting auto.csv file opens for me in Excel (version 16 for Mac) with all Unicode preserved.

                      Note that this process does not seem to work right when the table is left as tab-separated (the tabout default), so I made it comma-separated instead (and changed the extension from .xls to .csv--it works with .xls also, but Excel gives a warning message, as Jean-Claude mentioned). I originally wrote "bom" to be a tempfile, but if you leave it as a little binary file on disk, you can do the substitution in any future tables without rerunning the file command.


                      • #12
                        Thank you, Kye Lippold. Great and amazing! Debugging (this word must bug Nick Cox) Stata is a lot of fun.


                        • #13
                          Also, you can give MS Excel UTF-16:
                          Code:
                          unicode convertfile auto.xls auto_utf16.xls, replace srcencoding(UTF-8) dstencoding(UTF-16)
                          Last edited by Bjarte Aagnes; 24 Jul 2019, 10:51.
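
                          The same conversion can be applied to an honestly named text file, sketched here under the assumption that tabout is installed and that auto.txt and auto_utf16.txt (a hypothetical output name) live in the current working directory. The final hexdump is only a way to inspect the first bytes of the converted file.

                          ```stata
                          sysuse auto, clear
                          * write the tab-separated table as UTF-8 text
                          tabout rep78 foreign using auto.txt, cells(row) stats(chi2) h3(nil) replace
                          * convert the text file from UTF-8 to UTF-16
                          unicode convertfile auto.txt auto_utf16.txt, replace srcencoding(UTF-8) dstencoding(UTF-16)
                          * look at the first 16 bytes of the converted file
                          hexdump auto_utf16.txt, to(16)
                          ```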


                          • #14
                            Originally posted by Bjarte Aagnes
                            Also, you can give MS Excel UTF-16:
                            Code:
                            unicode convertfile auto.xls auto_utf16.xls, replace srcencoding(UTF-8) dstencoding(UTF-16)
                            I think this solution works better for me. PS: I use Stata 15.0 and Office 2016.


                            • #15
                              Thank you very much, Wang. I had forgotten that I posted this question here.
