  • Do file encoding/REDCap do files

    Dear all,

    REDCap is a database environment gaining popularity in medical research. It also has the handy functionality that it returns a do file for use with the csv containing the data when you create an export.

    I am currently working on a project where the labels defined in this do file are in French. Unfortunately, Stata doesn't seem able to interpret characters such as é, even though the characters look fine when the do file is viewed in Notepad, Notepad++, etc. In Stata, é appears as Ã©.

    It appears to me that Stata is trying to read the file as ASCII or something similar rather than as UTF-8, in which it is encoded. (Incidentally, replacing the garbled characters in Stata with the appropriate French characters works, but there are a lot of different characters.) I imagine that the same problem would arise with German and many other character sets...

    I was wondering whether anyone has had any experience with such encoding issues.


    Many thanks in advance

    Alan Haynes




  • #2
    Originally posted by Alan Haynes
    ...It appears to me that stata is trying to read the data using ASCII or something rather than the UTF-8 in which it is encoded....
    Yes, Alan. You have just discovered that Stata doesn't work with Unicode (UTF-8 is one particular encoding of Unicode). You are lucky that your output file is in French, for which a corresponding ANSI code page exists and allows some processing. However, once you send your do-files to a colleague in Greece, Israel, or Ukraine, they will see totally different content than you do, because their code pages may be set up for their local scripts.
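To make the code page point concrete, here is a small Python sketch (Python is only used for illustration; nothing in this thread depends on it). It shows how the UTF-8 bytes for é look when read as Windows-1252, and how the single Windows-1252 byte for é reads under the Greek, Hebrew, and Cyrillic Windows code pages. The choice of cp1253, cp1255, and cp1251 as the "local" code pages is an assumption made for the example.

```python
# Illustration only: the same bytes read under different code pages.
text = "é"

utf8_bytes = text.encode("utf-8")      # two bytes: 0xC3 0xA9
cp1252_bytes = text.encode("cp1252")   # one byte: 0xE9

# UTF-8 bytes misread as Windows-1252 (the garbling reported above)
misread = utf8_bytes.decode("cp1252")        # 'Ã©'

# The Windows-1252 byte for é interpreted under other Windows code pages
# (assumed here as typical local settings; actual setups may differ)
as_greek = cp1252_bytes.decode("cp1253")     # Greek small letter iota
as_hebrew = cp1252_bytes.decode("cp1255")    # Hebrew letter yod
as_cyrillic = cp1252_bytes.decode("cp1251")  # Cyrillic small letter short i
```

So a do-file saved in Windows-1252 is only readable as French on a machine that also interprets it as Windows-1252.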

    Next time, at the user group's wishes and grumbles session, raise your hand for Unicode support in Stata.
    It is essentially the same advice as given here: https://www.stattransfer.com/faq/encoding.html

    One of the practical solutions would be using the tab2dta converter that I wrote a while ago:
    http://radyakin.org/transfer/tab2dta/tab2dta.htm



    Note that it takes BOTH the tab-delimited data file and the Stata do-file as inputs. It defaults to code page 1252, which should be sufficient for your French database. However, it has its own expectations regarding the formatting of the do-file and does not support long strings. It was developed for processing the output of Survey Solutions, which is essentially the same task you are doing, but with a different data producer.
    Obviously it requires a tab-delimited file, not a CSV file, but it's not difficult to change that.
    The page linked above contains the description, program download and examples. Stata is not required.
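If "change that" is read as converting the export itself, a minimal Python sketch of turning a CSV file into a tab-delimited one could look like the following. The function name `csv_to_tab` and the file names in the usage note are hypothetical, and how tab2dta treats quoted fields is not something this sketch knows about: fields that themselves contain tabs or line breaks will come out quoted and may need separate handling.

```python
import csv


def csv_to_tab(src_path, dst_path, encoding="utf-8"):
    """Rewrite a comma-separated file as a tab-delimited file.

    Hypothetical helper for illustration; only the delimiter changes,
    the text encoding is kept as given.
    """
    with open(src_path, newline="", encoding=encoding) as src, \
         open(dst_path, "w", newline="", encoding=encoding) as dst:
        reader = csv.reader(src)  # parses quoted fields with embedded commas
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in reader:
            writer.writerow(row)
```

For example, `csv_to_tab("export.csv", "export.tab")` would write the tab-delimited copy next to the original.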

    Since you do have Stata, perhaps just get a text editor with code page support and configure it to save the file in the Western European code page (Windows-1252). Then do the same for Stata.
    See last slide here:
    http://www.stata.com/meeting/uk13/ab...3_radyakin.pdf
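If re-saving by hand in an editor is impractical, the same re-encoding can be scripted. Below is a minimal Python sketch, assuming the exported do-file really is UTF-8 (with or without a byte-order mark); the function name `utf8_to_cp1252` and the file names in the usage note are hypothetical. It fails loudly rather than silently substituting characters that Windows-1252 cannot represent.

```python
from pathlib import Path


def utf8_to_cp1252(src_path, dst_path):
    """Re-encode a UTF-8 text file as Windows-1252 (hypothetical helper)."""
    raw = Path(src_path).read_bytes()
    # "utf-8-sig" decodes UTF-8 and drops a leading byte-order mark if present
    text = raw.decode("utf-8-sig")
    # Strict encoding: raises UnicodeEncodeError for any character that has
    # no Windows-1252 equivalent, before anything is written
    converted = text.encode("cp1252")
    Path(dst_path).write_bytes(converted)
```

For example, `utf8_to_cp1252("export.do", "export_cp1252.do")`. Working on bytes keeps the original line endings untouched.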

    You could also write to the author of REDCap and give him/her/them a hint that what they produce can't be run properly in Stata. Then let them deal with it.

    Best, Sergiy Radyakin
