Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to export strings with accented characters to CSV? (Stata 14)

    I use Stata 14 and am trying to create a CSV file containing strings with accented characters that can be opened easily by users of Excel. Take the data below.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str53 NOTE
    "Encuesta de Caracterización Socioeconómica Nacional"
    "Encuesta de Hogares de Propósitos Múltiples"        
    end
    
    export delimited using "test.csv", delim(",") replace
    If this CSV file is opened in Excel from File - Open or by double-clicking on the file, the accented characters are not displayed correctly but look like this:
    Code:
    Encuesta de Caracterización Socioeconómica Nacional
    Encuesta de Hogares de Propósitos Múltiples
    To display the characters correctly in Excel, it is necessary to import the data as a delimited text file and to specify UTF-8 encoding. I tried converting the CSV file with the commands below.
    Code:
    clear
    unicode analyze "test.csv"
    unicode encoding set "latin1"
    unicode translate "test.csv"
    Stata tells me that the file is already in UTF8 format and does nothing.
    Code:
    . clear
    
    . unicode analyze "test.csv"
      (Directory ./bak.stunicode created; please do not delete)
    
      File summary (before starting):
            1  file(s) specified
            1  file(s) to be examined ...
    
      File test.csv (text file)
                     3 lines in file
                     1 lines ASCII
                     2 lines UTF-8
              File does not need translation, except ...
              The file appears to be UTF-8 already.  Sometimes files that still need translating
              can look like UTF-8.  See lines 2 and 3.  A total of 2 lines out of 3 appear to be
              UTF8.
              --------------------------------------------------------------------------------------
    
      File summary:
          all files okay
    
    . unicode encoding set "latin1"
      (default encoding now latin1)
    
    . unicode translate "test.csv"
      (using latin1 encoding)
    
      File summary (before starting):
            1  file(s) specified
            1  file(s) already known to be UTF8  in previous runs
            0  file(s) to be examined ...
      (nothing to do)
    What am I doing wrong and how can I create a CSV file that can be opened in Excel without going through the file import dialog?

  • #2
    Since your goal is Excel, why not use the "export excel" command?

    Comment


    • #3
      The file is needed for import into an SQL Server database and should be in CSV format but some people will look at the file in Excel.

      Comment


      • #4
        Interesting that SQL Server can't import Excel files. But your question makes more sense now. Unfortunately, I have Stata 13 at home, which doesn't support unicode, so I don't have all options available tonight. Depending on demands at work tomorrow I may have a crack at at your problem with Stata 14.

        Comment


        • #5
          Originally posted by Friedrich Huebler View Post
          I use Stata 14 and am trying to create a CSV file containing strings with accented characters that can be opened easily by users of Excel. Take the data below.
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str53 NOTE
          "Encuesta de Caracterización Socioeconómica Nacional"
          "Encuesta de Hogares de Propósitos Múltiples"
          end
          
          export delimited using "test.csv", delim(",") replace
          If this CSV file is opened in Excel from File - Open or by double-clicking on the file, the accented characters are not displayed correctly but look like this:
          Code:
          Encuesta de Caracterización Socioeconómica Nacional
          Encuesta de Hogares de Propósitos Múltiples
          To display the characters correctly in Excel, it is necessary to import the data as a delimited text file and to specify UTF-8 encoding.

          ...

          What am I doing wrong and how can I create a CSV file that can be opened in Excel without going through the file import dialog?
          If Excel wants an extended ASCII CSV file with latin1 encoding, you can convert the UTF-8 CSV file with Stata's unicode convertfile command:

          Code:
          unicode convertfile testorig.csv testconverted.csv, dstencoding(latin1)

          Comment


          • #6
            In addition to what Alan suggested, to open a UTF-8 encoded CSV file in Excel, you need "import" it instead of "open" it in Excel.

            For example, using Excel 2010,.
            1. Click on the Data menu bar option.
            2. Click on the From Text icon.
            3. Navigate to the location of the file that you want to import. Click on the filename and then click on the Import button.
            4. Choose the file type that best describes your data - Delimited or Fixed Width.
            5. Choose 65001: Unicode (UTF-8) from the drop-down list that appears next to File origin
            Excel 2007 is similar to 2010, you may need google the particular step and/or GUI items to use for your version of Excel.

            Comment


            • #7
              Thank you, Alan and Hua. I tried Alan's code but it can only convert the CSV file up to a certain point and then stops. I identified the problematic string and added it to the example dataset.
              Code:
              clear
              input str62 NOTE
              "Encuesta de Caracterización Socioeconómica Nacional"         
              "Encuesta de Hogares de Propósitos Múltiples"                 
              "Encuesta Nacional de Hogares – Condiciones de Vida y Pobreza"
              end
              
              export delimited using "test.csv", delim(",") replace
              When I run unicode convertfile I get an error message.
              Code:
              . unicode convertfile "test.csv" "test2.csv", dstencoding(latin1) replace
              Unicode character invalid for the target encoding found
                  Invalid character starts at byte position 137.  Invalid character as bytes are 2013
              file "test.csv" partially converted to file "test2.csv"
              r(198);
              It seems that the dash is the problematic character because the file is only converted up the word "Hogares" in the third observation. The file "test2.csv" contains these lines:
              Code:
              NOTE
              Encuesta de Caracterización Socioeconómica Nacional
              Encuesta de Hogares de Propósitos Múltiples
              Encuesta Nacional de Hogares

              Comment


              • #8
                The problem also exists in Stata 15. Please post any follow-up messages in this thread.

                Comment

                Working...
                X