Importing ODBC data with the correct encoding

Kim Lyngby Mikkelsen

Join Date: May 2014

Posts: 13
#1

Importing ODBC data with the correct encoding

02 Jul 2015, 06:32

Dear Statalist

In Stata 13 I could import all string variables correctly (i.e. Danish special characters) with the one-line code:
odbc load, table("dbo.V_Kim_Statistik") clear lowercase noquote dsn(PEBLData)

In Stata 14 I found this solution to deal with the unicode awareness of Stata 14:
odbc load, table("dbo.V_Kim_Statistik") clear lowercase noquote dsn(PEBLData)
save "my_file.dta"
clear
unicode encoding set "latin1"
unicode translate "my_file.dta"
use "my_file.dta"
However, this is tedious and time consuming.

Is it possible to import my data with the correct encoding using a one line code is Stata 14?
(That is, without saving, clearing, translating and using the data afterwards).

Rgards Kim

My SQL info:
Microsoft SQL Server Management Studio 10.50.2500.0
Microsoft Analysis Services Client Tools 10.50.2500.0
Microsoft Data Access Components (MDAC) 6.1.7601.17514
Microsoft MSXML 3.0 4.0 6.0
Microsoft Internet Explorer 8.0.7601.17514
Microsoft .NET Framework 2.0.50727.5485
Operating System 6.1.7601
Tags: None
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#2

02 Jul 2015, 11:33

I would guess that this can be done with:

Code:

set odbcdriver unicode

However, I cannot test this conjecture since I don't have this type of DB.
Comment
Bjarte Aagnes

Join Date: Apr 2014

Posts: 783
#3

02 Jul 2015, 12:25

Stata expect UTF-8. Specify UTF-8 encoding in your DSN specification if possible,

or try

Code:

set odbcdriver ansi

http://www.stata.com/manuals14/dodbc.pdf

http://www.statalist.org/forums/foru...h-gui-and-odbc

Regards
Bjarte Aagnes
Comment
Joseph Coveney

Join Date: Apr 2014

Posts: 4410
#4

02 Jul 2015, 17:43

You can wrap your multi-line code in a short Stata program and call the program in one line, having Stata do all of the repetitive, tedious stuff by itself behind the scenes and delivering you the converted dataset at program exit.

You can also use the SQL function CAST(. . ., AS NVARCHAR(. .. .)) in a odbc load, exec() Stata command. See here for details.
Comment
Kim Lyngby Mikkelsen

Join Date: May 2014

Posts: 13
#5

06 Jul 2015, 01:09

Thank you for your suggestions

Monday morning back at work I tried:

Code:

set odbcdriver ansi, permanently exit, clear

After restarting Stata, the codes above had had no effect, that is, the special Danish letters in string variables were still not read properly.

I also tried to use the Unicode option:

Code:

set odbcdriver unicode, permanently exit, clear

Still, the special Danish letters in string variables are not read properly.

I will try to figure out how to use the suggestion proposed by Joseph Coveney, issuing a SQL statements through an exec() command. Doing that it seems to me I will have to write a very long SQL code converting each and every string variable in my dataset?

I hope StataCorp can help me find a simple seamless solution to make Stata 14 behave exactly likeStata 13 when doing cdbc import.
Comment

Kim Lyngby Mikkelsen

Join Date: May 2014
Posts: 13

06 Jul 2015, 02:28

Until StataCorp hopefully find a seamless solution, as a remedy I have included the folloving codes in my do-file (codes sugested by Alan Riley in his post http://www.statalist.org/forums/forum/general-stata-discussion/general/1290766-stata-14-unicode-and-extended-ascii:

Code:

foreach var of varlist _all {
   capture confirm string variable `var'
   if _rc==0 {
       replace `var' = ustrfrom(`var', "latin1", 1) if !_strisbinary(`var')
   }
   local thelab : variable label `var'
   local thelab = ustrfrom(`"`thelab'"', "latin1", 1)
   label variable `var' `"`thelab'"'
   local newname = ustrfrom("`var'", "latin1", 1)
   rename `var' `newname'
}

Regards Kim

Comment

Jesse Wursten

Join Date: Jan 2016
Posts: 915

31 Mar 2017, 09:30

Just FYI, you can also use the unicode commands

Code:

cap mkdir "$path_data/UnicodeTranslationFolder"                                // The translation is done in an different folder to ensure that only the
cd "$path_data/UnicodeTranslationFolder"                                    // backup for the current file is deleted before the new one is created.
save "name.dta", replace    
clear                                                                        // unicode command needs a clear memory
cap unicode erasebackups, badidea                                            // Erase previous backups
unicode encoding set ISO-8859-1                                                // Set encoding language
unicode translate "name.dta"    
use "name.dta", clear

Announcement

Importing ODBC data with the correct encoding

Comment

Comment

Comment

Comment

Comment

Comment