
  • Unicode characters in a dataset that needs to be shared with a user of Stata 13 or earlier

    Hi

    The Unicode feature in Stata 14 is a very nice feature!

    However, for those of us that need to share a data file with a user of Stata 13 or earlier, we need a simple way of doing so!!

    If one tries to use the command "saveold", Stata 14 warns us that, for example, "variables labels contains unicode and thus may not display well in Stata 13".

    Is there an easy way of removing Unicode from variable labels or translating them to extended ASCII so as to facilitate our data sharing experience?

    The manual mentions that "Before you use saveold, you can convert your string variables from the UTF-8 encoding to an extended ASCII encoding by using ustrto()". However, it would be useful to have a process that is a bit more automatic.

    Thanks for the feedback!

    Javier

  • #2
    Javier, a few users have already requested -use14- privately. Obviously nothing can be done until the weekend, as I have to work my hours, and it is not going to be fast working without Stata 14 either.
    The task doesn't look impossible, though. I am glad the oddest thing in the new format was the introduction of 6-byte integers. But hey, it's not as odd as 7-byte integers! The rest follows pretty logically from the previous format change. The format itself closely resembles the older 117, but the Unicode translations would likely limit the transfer from 14 to 13, not to earlier versions of Stata.

    I haven't heard anything on the maintenance of Stata 13, though. It was the case earlier that Stata 11 was taught to understand the Stata 12 format in its last update, and I wonder if the same will be the case with Stata 13. If so, it would be very discouraging to reinvent the wheel, although it might still benefit users of Stata 10, 11, and 12.

    Saving from Stata 14 to an earlier version of Stata should be trivial. You just have to consent to losing the unsupported new features, and the rest goes into the file. Developing something like this by definition requires Stata 14, so it will take some time. Try the head-on approach: start with label save, then translate the whole resulting file from Unicode to an ANSI code page, then reapply the labels and save the data.
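
    The head-on approach could be sketched roughly as follows. This is an untested outline, not a finished tool: it assumes Stata 14 or later, it assumes Latin-1 ("latin1") is the code page the recipient needs, and the filenames are hypothetical. Note that label save writes value labels only, so variable labels would need separate handling.

    ```stata
    * Sketch of the "label save, translate, reapply" route (Stata 14+).
    * Filenames are hypothetical; "latin1" is an assumed target encoding.

    * 1. Write all value labels to a do-file (UTF-8 text)
    label save using labels_utf8.do, replace

    * 2. Re-encode that file line by line with Mata's ustrto();
    *    mode 1 substitutes characters the target encoding cannot represent
    mata:
    unlink("labels_latin1.do")
    fin  = fopen("labels_utf8.do", "r")
    fout = fopen("labels_latin1.do", "w")
    while ((line = fget(fin)) != J(0, 0, "")) {
        fput(fout, ustrto(line, "latin1", 1))
    }
    fclose(fin)
    fclose(fout)
    end

    * 3. Reapply the converted labels, then save in the older format
    do labels_latin1.do
    saveold mydata_v13, version(13)
    ```

    Because label save emits label define commands with the modify option, running the converted do-file should overwrite the existing label text in place rather than require the labels to be dropped first.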

    Note that Stat/Transfer was also simultaneously updated to support the Unicode conversions in version 13, and is always the shortest route for transferring data from package X to package Y.

    Best, Sergiy Radyakin (author of use13)



    • #3
      Javier Escobal asked if there is an automated way to convert a Stata 14 dataset with Unicode to a Stata 13 dataset with extended ASCII. There is not, although as Javier pointed out, individual strings and names can be converted to extended ASCII with the ustrto() function.
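
      For illustration, a minimal sketch of that string-by-string conversion, assuming Stata 14 or later, a dataset in memory, and Latin-1 ("latin1") as the target encoding. The filename is hypothetical. It covers variable labels and string variable contents only. Value labels, notes, and characteristics are not handled, and labels containing macro characters such as ` or $ may need extra care.

      ```stata
      * Sketch: convert Unicode text to Latin-1 before calling saveold.
      * Assumptions: Stata 14+, data in memory, "latin1" suits the recipient.
      * Mode 1 makes ustrto() substitute characters Latin-1 cannot represent.

      * 1. Variable labels
      foreach v of varlist _all {
          local lbl : variable label `v'
          local lbl = ustrto(`"`lbl'"', "latin1", 1)
          label variable `v' `"`lbl'"'
      }

      * 2. Contents of string variables
      ds, has(type string)
      foreach v in `r(varlist)' {
          replace `v' = ustrto(`v', "latin1", 1)
      }

      * 3. Save in the older format (hypothetical filename)
      saveold mydata_v13, version(13)
      ```

      Running this on a copy of the data is advisable, since the conversion is lossy wherever a character has no Latin-1 equivalent.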

      Sergiy Radyakin pointed out that shortly after Stata 12 started shipping, we updated Stata 11 to be able to understand the Stata 12 format. We will not be making such an update to Stata 13, mainly because of Unicode, but also because of Stata 14's support for more than 2 billion observations. That said, I suspect Sergiy will soon have use14 available.

      Now might be a good time to point out a little option we added to the saveold command -- version(). This option was added primarily because of feedback from Statalist. You can now save datasets in the format of the three most recent releases, not just Stata 13:

      Code:
      saveold filename, version(13)
      saveold filename, version(12)
      saveold filename, version(11)



      • #4
        Nice! This would be a great addition to save10!
        Save10's idea is to save into the Stata 10 format regardless of the version of Stata, whether it's v14 or v24, because it relies on knowledge of the internal structure of the dataset. However, it has only been tested in Stata 13 and is considered beta. Note that, in a funny twist of fate, it will create Stata 10 datasets even from Stata 9, but to read them back you would need to install -use10-.

        Unfortunately, the -save9- name was already taken on SSC. So the commands save9, save8, saveto7, etc., built on the same logic, never made it out, although they were also written.
        The sad part is that the command by Marco Ercolani relies on Stata itself being able to save to the earlier format, so, based on Alan Riley's (StataCorp) description of the "moving wall" of supported earlier formats, it is bound to stop working. (save9 already doesn't work when run in Stata 13.)

        My command, on the other hand, relies on an understanding of the internal dataset structure and should continue to work in future versions of Stata as long as backward code compatibility is preserved. (There was no testing for Unicode, as I don't have access to Stata 14 yet.)

        Also, the dataset specifications for versions prior to 6 are nowhere to be found, but could be reverse-engineered from existing files.
        If anyone has authentic Stata 1.0 data files, please share, as I couldn't find anything prior to "g" files (from Stata 2). I suspect Stata 1 files were different, as reportedly the format didn't allow for strings. Please, don't share the program files unless you want to also transfer the license for it (and only if that license allowed for it).

        While -use14- sounds like an interesting idea, I think there are still not many datasets floating around in Stata 14's format, so the command will be most useful once there is a critical mass of data producers equipped with Stata 14. But there is currently a huge number of SPSS datasets using Unicode, which were previously unavailable to Stata users and can be exploited immediately. This makes a -usespss- update more important, in my view. What do you think?

        Best, Sergiy Radyakin



        • #5
          Note that there is a very, very big difference between save9 (Marco Ercolani, SSC) and Sergiy's programs.

          save9 was at best just a wrapper for save, old and otherwise just could not possibly do anything else. Hence it was a puzzle to some of us why it was ever made public. See the thread up and down from http://hsphsun3.harvard.edu/cgi-bin/...icle-1145.html in which this point was aired.



          • #6
            Hi there, I need to open a Stata 14 .dta file in Stata 12. Is there a use14 ado-file available yet?



            • #7
              Beryl: The answer appears to be No. Ask your provider to save in a format you can read, as explained above.
