  • How to read PISA DATA in STATA by writing a do-file?

    Dear all,
    I have run into a problem reading PISA data. I am very new to this data set and, more specifically, this is my first time working with ASCII data.
    As far as I know, the data set is used together with a control file, written in SPSS, that assigns the variable names and labels.
    I would like to ask the following questions:

    Most important: 1. How do I read the data into Stata?
    2. How do I identify which positions in each line of data belong to which variable?
    3. How do I assign each piece of data its respective name?
    4. How do I import only the few variables I care about rather than the whole data set?

    I found a similar question (included below) in which it was mentioned that there is a tedious way to translate the syntax from SPSS to Stata. I would appreciate it if someone could tell me what it is.

    Many Thanks!
    Colleagues, out of personal interest, I wanted to download PISA 2012 data (http://pisa2012.acer.edu.au/downloads.php). The data is disseminated in TXT format

  • #2
    Chris, a while ago Richard Williams made a great effort to convert the PISA files to Stata format and make them available to users. Ironically, it is in the same thread you are referring to, but the files appeared a few pages later, so read thoroughly. The details are in this post, but read the whole thread for context.

    That should probably answer your #1, #2, and #4. Your #3 is not clear to me.

    The tedious step is adapting the SPSS or SAS syntax into Stata syntax for reading in ASCII files (that is, transcribing the fixed column positions in the ASCII file for thousands of variables). If the data is already in Stata format (.dta), you don't need to worry about it.

    Best, Sergiy Radyakin


    • #3
      Sergiy, thanks for your answer. I did read Richard's post, but I need to know how to write the do-file that reads the PISA data myself rather than download Richard's ready-made data set, because my supervisor won't be pleased to see me using an unofficial file. Therefore, I need to know which positions in each line of the .txt file belong to which variable.


      • #4
        Personally I would try to get a new supervisor! Making someone spend hours, maybe days, writing a program possibly filled with errors, is not the best idea in the world, at least in my opinion.

        But if you must do it: if you Google around, you can find some guides for translating SPSS syntax into Stata syntax, for example

        http://www.ats.ucla.edu/stat/stata/f...d_to_stata.htm
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 19.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam


        • #5
          Chris, I am sure your supervisor will not be pleased if you just retype the complete SPSS syntax into a Stata do-file. Scripts this large are commonly generated mechanically, perhaps by a program like Stat/Transfer, and a human operator will likely make dozens of mistakes if he/she tries to process one manually. This is not mission impossible, but it will probably take you a significant amount of time, so confirm with your supervisor that this is really what is intended. Finally, ask yourself whether your supervisor will ever actually read the thousands of lines of boring mechanical code. Perhaps he (she) is more interested in your research question, estimation strategy and interpretation of results.

          There are a few professors following this forum. They can weigh in and give a proper comment on the educational value of such an undertaking.

          You can open one of the SPSS syntax files, like here, and convert it to a Stata dictionary file line by line; e.g. variable SCHOOLID occupies positions 25 through 31 (both inclusive) and is of string type. See the reference for Stata's infix command, which lets you use a dictionary to read the fixed-format file.
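
          A rough sketch of how that line-by-line conversion could be scripted rather than typed. It is written in Python only because it is convenient for text processing, and it is not the official PISA tooling. It assumes each SPSS column spec sits on its own line in the form "NAME start-end (A)"; the function name spss_to_infix, the file name student.txt and the variables VAR_A and VAR_B with their positions are made up for illustration. Only SCHOOLID 25-31 (string) comes from the example above. Passing keep reads just the variables you care about.

```python
import re

# One SPSS DATA LIST column spec per line, e.g. "SCHOOLID 25-31 (A)" or "VAR_B 18 - 18".
# Assumption: specs are not split across lines and use explicit start-end columns.
SPEC = re.compile(r"^\s*(\w+)\s+(\d+)\s*-\s*(\d+)\s*(?:\(([^)]*)\))?\s*$")


def spss_to_infix(spec_lines, data_file, keep=None):
    """Return the text of a Stata infix dictionary built from SPSS column specs.

    keep: optional set of variable names to include; all others are skipped.
    """
    rows = []
    for line in spec_lines:
        m = SPEC.match(line)
        if m is None:
            continue  # not a column spec (command keyword, blank line, ...)
        name, start, end, fmt = m.groups()
        if keep is not None and name not in keep:
            continue
        # (A) marks a string in SPSS. Anything else is read as a plain number;
        # other SPSS format details are ignored in this sketch.
        is_string = fmt is not None and fmt.strip().upper() == "A"
        stata_type = "str" if is_string else "double"
        cols = start if start == end else f"{start}-{end}"
        rows.append(f"    {stata_type} {name} {cols}")
    header = f'infix dictionary using "{data_file}" {{'
    return "\n".join([header, *rows, "}"]) + "\n"


# Hypothetical input: only the SCHOOLID line reflects the thread's example.
example = [
    "DATA LIST FILE='student.txt' /",
    "VAR_A 1-3 (A)",
    "VAR_B 18 - 18",
    "SCHOOLID 25-31 (A)",
]
dictionary_text = spss_to_infix(example, "student.txt", keep={"VAR_A", "SCHOOLID"})
print(dictionary_text)
```

          Saving the printed text as, say, student.dct and running "infix using student.dct" in Stata should then read only the listed variables. Numeric fields are typed double here to avoid losing digits in long identifiers; that is a cautious choice, not a requirement.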

          After successfully reading the data file, you are halfway through. The next step is the labels for variables and values. Stata does not support value labels for strings. (The lack of value labels for string variables is one of the last limitations of Stata's dataset format now that long strings were added in Stata 13 and Unicode in Stata 14, the other being the lack of forward compatibility of file formats.) So that eliminates the huge lists of countries/cities/languages, like "CZE1132", unless you want to be completely fanatical about them and encode the strings into (your own) codes, then label them with the original values + original labels combined. Add a few formatting decorations and you are done. Note that encoding the string variables into numerical codes will make them incompatible with some programs already developed to process PISA data in other packages, so if you plan to port those to Stata as well, you will need to account for the differences in variable types.
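
          For the "fanatical" route, a minimal sketch of generating the Stata commands instead of typing them. The function name encode_commands, the variable REGION, the code "XXX0001" and both label texts are placeholders; only the value "CZE1132" appears above, and its real label is not given here. The numeric codes are simply assigned in sorted order, which is one arbitrary choice of "your own" codes.

```python
def encode_commands(var, labels):
    """Return Stata commands that recode string variable `var` into numeric codes.

    labels maps each original string value to its original label text.
    Each numeric code is labelled "<original value>: <original label>".
    """
    new_var = f"{var}_num"
    lbl = f"{var}_lbl"
    lines = [f"generate long {new_var} = ."]
    for code, (value, text) in enumerate(sorted(labels.items()), start=1):
        if '"' in value or '"' in text:
            # Embedded double quotes would need Stata compound quotes; not handled here.
            raise ValueError(f"double quote in value or label for {value!r}")
        lines.append(f'replace {new_var} = {code} if {var} == "{value}"')
        lines.append(f'label define {lbl} {code} "{value}: {text}", add')
    lines.append(f"label values {new_var} {lbl}")
    return "\n".join(lines) + "\n"


# Placeholder label texts: the real ones would come from the SPSS VALUE LABELS section.
commands = encode_commands(
    "REGION",
    {"CZE1132": "placeholder label A", "XXX0001": "placeholder label B"},
)
print(commands)
```

          The original string variable is left in place, so programs that expect the string version can still use it.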

          If you actually go through with this, you will definitely have something unique and valuable, like a handwritten copy of the Encyclopaedia Britannica. On the other hand, if you do it right, and two or three people review your code and find it accurate, it could be a valuable addition to the PISA collection of scripts for loading the data into statistical packages. A few documents on PISA mention the existence of web-based tools that generate such scripts, including for Stata, but I haven't seen them; only the SPSS and SAS ones are available. If anyone knows better, please comment.

          A while ago I wrote a program to automate this process, but [unfortunately for you] it works in the opposite direction, converting Stata labels into SPSS labels (with some assumptions and limitations, of course): http://radyakin.org/transfer/stataspss/stataspss.htm

          If you expect any repetition (e.g. if an update to the PISA data/scripts identifies a problem with an earlier data release), automating the process is crucial.
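
          One small piece of that automation could be a sanity check that is rerun on every release. The sketch below flags the kinds of slips a manual transcription tends to produce. The function name check_layout and the sample fields are hypothetical, and treating overlapping columns as a problem is an assumption, since a layout may overlap fields on purpose.

```python
def check_layout(fields, record_width):
    """Return a list of problems found in (name, start, end) column specs.

    Columns are 1-based and inclusive, as in the SPSS and Stata syntax.
    """
    problems = []
    seen = set()
    widest = None  # earlier field that reaches furthest to the right
    for name, start, end in sorted(fields, key=lambda f: (f[1], f[2])):
        if name in seen:
            problems.append(f"{name}: duplicate variable name")
        seen.add(name)
        if start < 1 or end < start:
            problems.append(f"{name}: invalid range {start}-{end}")
        if end > record_width:
            problems.append(f"{name}: ends at {end}, past record width {record_width}")
        if widest is not None and start <= widest[2]:
            problems.append(f"{name}: overlaps {widest[0]}")
        if widest is None or end > widest[2]:
            widest = (name, start, end)
    return problems


# Hypothetical layout; only SCHOOLID 25-31 comes from the thread.
for problem in check_layout([("A", 1, 3), ("B", 3, 5), ("SCHOOLID", 25, 31)], 30):
    print(problem)
```

          An empty list means none of these particular problems were found; it does not prove the positions match the official syntax file.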

          Hope this helps, Sergiy Radyakin.


          • #6
            Thanks for everyone's concern. I have talked with my supervisor, and it seems he agrees to the 'SPSS -> Stat/Transfer -> Stata' route!
