How to export strings with accented characters to CSV? (Stata 14)

Friedrich Huebler

Join Date: Apr 2014
Posts: 1053

How to export strings with accented characters to CSV? (Stata 14)

22 Jun 2017, 18:51

I use Stata 14 and am trying to create a CSV file containing strings with accented characters that can be opened easily by users of Excel. Take the data below.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str53 NOTE
"Encuesta de Caracterización Socioeconómica Nacional"
"Encuesta de Hogares de Propósitos Múltiples"        
end

export delimited using "test.csv", delim(",") replace

If this CSV file is opened in Excel from File - Open or by double-clicking on the file, the accented characters are not displayed correctly but look like this:

Code:

Encuesta de CaracterizaciÃ³n SocioeconÃ³mica Nacional
Encuesta de Hogares de PropÃ³sitos MÃºltiples

To display the characters correctly in Excel, it is necessary to import the data as a delimited text file and to specify UTF-8 encoding. I tried converting the CSV file with the commands below.

Code:

clear
unicode analyze "test.csv"
unicode encoding set "latin1"
unicode translate "test.csv"

Stata tells me that the file is already in UTF8 format and does nothing.

Code:

. clear

. unicode analyze "test.csv"
  (Directory ./bak.stunicode created; please do not delete)

  File summary (before starting):
        1  file(s) specified
        1  file(s) to be examined ...

  File test.csv (text file)
                 3 lines in file
                 1 lines ASCII
                 2 lines UTF-8
          File does not need translation, except ...
          The file appears to be UTF-8 already.  Sometimes files that still need translating
          can look like UTF-8.  See lines 2 and 3.  A total of 2 lines out of 3 appear to be
          UTF8.
          --------------------------------------------------------------------------------------

  File summary:
      all files okay

. unicode encoding set "latin1"
  (default encoding now latin1)

. unicode translate "test.csv"
  (using latin1 encoding)

  File summary (before starting):
        1  file(s) specified
        1  file(s) already known to be UTF8  in previous runs
        0  file(s) to be examined ...
  (nothing to do)

What am I doing wrong and how can I create a CSV file that can be opened in Excel without going through the file import dialog?

Tags: None

ben earnhart

Join Date: May 2014

Posts: 1027
#2

22 Jun 2017, 19:07

Since your goal is Excel, why not use the "export excel" command?
Comment
Friedrich Huebler

Join Date: Apr 2014

Posts: 1053
#3

22 Jun 2017, 19:29

The file is needed for import into an SQL Server database and should be in CSV format but some people will look at the file in Excel.
Comment
ben earnhart

Join Date: May 2014

Posts: 1027
#4

22 Jun 2017, 20:19

Interesting that SQL Server can't import Excel files. But your question makes more sense now. Unfortunately, I have Stata 13 at home, which doesn't support unicode, so I don't have all options available tonight. Depending on demands at work tomorrow I may have a crack at at your problem with Stata 14.
Comment
Alan Riley (StataCorp)

StataCorp Employee

Join Date: Mar 2014

Posts: 170
#5

23 Jun 2017, 02:29

Originally posted by Friedrich Huebler View Post

I use Stata 14 and am trying to create a CSV file containing strings with accented characters that can be opened easily by users of Excel. Take the data below.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str53 NOTE "Encuesta de Caracterización Socioeconómica Nacional" "Encuesta de Hogares de Propósitos Múltiples" end export delimited using "test.csv", delim(",") replace

If this CSV file is opened in Excel from File - Open or by double-clicking on the file, the accented characters are not displayed correctly but look like this:

Code:

Encuesta de CaracterizaciÃ³n SocioeconÃ³mica Nacional Encuesta de Hogares de PropÃ³sitos MÃºltiples

To display the characters correctly in Excel, it is necessary to import the data as a delimited text file and to specify UTF-8 encoding.

...

What am I doing wrong and how can I create a CSV file that can be opened in Excel without going through the file import dialog?

If Excel wants an extended ASCII CSV file with latin1 encoding, you can convert the UTF-8 CSV file with Stata's unicode convertfile command:

Code:

unicode convertfile testorig.csv testconverted.csv, dstencoding(latin1)
1 like
Comment
Hua Peng (StataCorp)

StataCorp Employee

Join Date: Jun 2014

Posts: 367
#6

23 Jun 2017, 05:16

In addition to what Alan suggested, to open a UTF-8 encoded CSV file in Excel, you need "import" it instead of "open" it in Excel.

For example, using Excel 2010,.
Click on the Data menu bar option.

Click on the From Text icon.

Navigate to the location of the file that you want to import. Click on the filename and then click on the Import button.

Choose the file type that best describes your data - Delimited or Fixed Width.

Choose 65001: Unicode (UTF-8) from the drop-down list that appears next to File origin

Excel 2007 is similar to 2010, you may need google the particular step and/or GUI items to use for your version of Excel.
1 like
Comment

Friedrich Huebler

Join Date: Apr 2014
Posts: 1053

23 Jun 2017, 07:22

Thank you, Alan and Hua. I tried Alan's code but it can only convert the CSV file up to a certain point and then stops. I identified the problematic string and added it to the example dataset.

Code:

clear
input str62 NOTE
"Encuesta de Caracterización Socioeconómica Nacional"         
"Encuesta de Hogares de Propósitos Múltiples"                 
"Encuesta Nacional de Hogares – Condiciones de Vida y Pobreza"
end

export delimited using "test.csv", delim(",") replace

When I run unicode convertfile I get an error message.

Code:

. unicode convertfile "test.csv" "test2.csv", dstencoding(latin1) replace
Unicode character invalid for the target encoding found
    Invalid character starts at byte position 137.  Invalid character as bytes are 2013
file "test.csv" partially converted to file "test2.csv"
r(198);

It seems that the dash is the problematic character because the file is only converted up the word "Hogares" in the third observation. The file "test2.csv" contains these lines:

Code:

NOTE
Encuesta de Caracterización Socioeconómica Nacional
Encuesta de Hogares de Propósitos Múltiples
Encuesta Nacional de Hogares

Comment

Friedrich Huebler

Join Date: Apr 2014

Posts: 1053
#8

28 Jun 2017, 11:20

The problem also exists in Stata 15. Please post any follow-up messages in this thread.
Comment

Announcement

How to export strings with accented characters to CSV? (Stata 14)

Comment

Comment

Comment

Comment

Comment

Comment

Comment