Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Importing ODBC data with the correct encoding

    Dear Statalist

    In Stata 13 I could import all string variables correctly (i.e. Danish special characters) with the one-line code:
    odbc load, table("dbo.V_Kim_Statistik") clear lowercase noquote dsn(PEBLData)

    In Stata 14 I found this solution to deal with the unicode awareness of Stata 14:
    odbc load, table("dbo.V_Kim_Statistik") clear lowercase noquote dsn(PEBLData)
    save "my_file.dta"
    clear
    unicode encoding set "latin1"
    unicode translate "my_file.dta"
    use "my_file.dta"
    However, this is tedious and time consuming.

    Is it possible to import my data with the correct encoding using a one line code is Stata 14?
    (That is, without saving, clearing, translating and using the data afterwards).

    Rgards Kim

    My SQL info:
    Microsoft SQL Server Management Studio 10.50.2500.0
    Microsoft Analysis Services Client Tools 10.50.2500.0
    Microsoft Data Access Components (MDAC) 6.1.7601.17514
    Microsoft MSXML 3.0 4.0 6.0
    Microsoft Internet Explorer 8.0.7601.17514
    Microsoft .NET Framework 2.0.50727.5485
    Operating System 6.1.7601

  • #2
    I would guess that this can be done with:

    Code:
    set odbcdriver unicode
    However, I cannot test this conjecture since I don't have this type of DB.

    Comment


    • #3
      Stata expect UTF-8. Specify UTF-8 encoding in your DSN specification if possible,

      or try

      Code:
      set odbcdriver ansi
      http://www.stata.com/manuals14/dodbc.pdf

      http://www.statalist.org/forums/foru...h-gui-and-odbc

      Regards
      Bjarte Aagnes

      Comment


      • #4
        You can wrap your multi-line code in a short Stata program and call the program in one line, having Stata do all of the repetitive, tedious stuff by itself behind the scenes and delivering you the converted dataset at program exit.

        You can also use the SQL function CAST(. . ., AS NVARCHAR(. .. .)) in a odbc load, exec() Stata command. See here for details.

        Comment


        • #5
          Thank you for your suggestions

          Monday morning back at work I tried:
          Code:
          set odbcdriver ansi, permanently
          exit, clear
          After restarting Stata, the codes above had had no effect, that is, the special Danish letters in string variables were still not read properly.

          I also tried to use the Unicode option:
          Code:
          set odbcdriver unicode, permanently
          exit, clear
          Still, the special Danish letters in string variables are not read properly.

          I will try to figure out how to use the suggestion proposed by Joseph Coveney, issuing a SQL statements through an exec() command. Doing that it seems to me I will have to write a very long SQL code converting each and every string variable in my dataset?

          I hope StataCorp can help me find a simple seamless solution to make Stata 14 behave exactly likeStata 13 when doing cdbc import.

          Comment


          • #6
            Until StataCorp hopefully find a seamless solution, as a remedy I have included the folloving codes in my do-file (codes sugested by Alan Riley in his post http://www.statalist.org/forums/forum/general-stata-discussion/general/1290766-stata-14-unicode-and-extended-ascii:

            Code:
            foreach var of varlist _all {
               capture confirm string variable `var'
               if _rc==0 {
                   replace `var' = ustrfrom(`var', "latin1", 1) if !_strisbinary(`var')
               }
               local thelab : variable label `var'
               local thelab = ustrfrom(`"`thelab'"', "latin1", 1)
               label variable `var' `"`thelab'"'
               local newname = ustrfrom("`var'", "latin1", 1)
               rename `var' `newname'
            }
            Regards Kim

            Comment


            • #7
              Just FYI, you can also use the unicode commands

              Code:
              cap mkdir "$path_data/UnicodeTranslationFolder"                                // The translation is done in an different folder to ensure that only the
              cd "$path_data/UnicodeTranslationFolder"                                    // backup for the current file is deleted before the new one is created.
              save "name.dta", replace    
              clear                                                                        // unicode command needs a clear memory
              cap unicode erasebackups, badidea                                            // Erase previous backups
              unicode encoding set ISO-8859-1                                                // Set encoding language
              unicode translate "name.dta"    
              use "name.dta", clear

              Comment

              Working...
              X