  • Error in dictionary file

    I have a dataset in a .txt file that I need to read into Stata using a dictionary file. I wrote the dictionary file following the instructions from the owners of the dataset. However, there seems to be a problem with one variable (that I know of, but there might be other problems too): STDIND has missing values for about 80% of all observations, even though it should not. When I run the equivalent code in R, it works without producing missing values, so the problem must be in my Stata code (and I have to use Stata). This is how I wrote the dictionary file:
    Code:
    dictionary {
    _column(1) int    ANNO    %4f "ANNO"
    _column(5) int    TRIM    %1f "TRIM"
    _column(6) int    REG    %2f "REG"
    _column(8) int    numcff    %2f "SG4"
    (...)
    _column(587) int    STDFAM    %6f "STDFAM"
    _column(593) int    STDIND    %6f "STDIND"
    _column(599) int    NN2    %1f "NN2"
    _column(600) int    RPN2    %1f "RPN2"
    _column(601) int    TF    %2f "TF"
    _column(603) int    TN2    %1f "TN2"
    _column(604) int    F0_14    %1f "F0_14"
    _column(605) int    CP0_7    %1f "CP0_7"
    _column(606) int    CITTAD    %1f "CITTAD"
    _column(607) int    WAVQUA    %1f "WAVQUA"
    _column(608) int    nasita    %1f "SG13"
    _column(609) int    citita    %1f "SG16"
    _column(610) int    annres    %3f "SG18"
    _column(613) int    NASSES    %3f "NASSES"
    _column(616) int    CITSES    %3f "CITSES"
    _column(619) int    RAPSES    %3f "RAPSES"
    }
    This is the code I used to apply the dictionary file (2005_Q2_dict.dct is the dictionary file and sta_2005_2.txt is the raw text dataset):

    Code:
    clear
    infile using "$PathDict/2005_Q2_dict.dct", using("$Path05Q2/sta_2005_2.txt")
    Is there a problem with the code? Am I doing something wrong, or missing something?

  • #2
    To start on solving this problem, try changing
    Code:
    _column(593) int    STDIND    %6f "STDIND"
    to
    Code:
    _column(593) str6   STDIND    %6s "STDIND"
    to read STDIND as a string. Then, look at the values of STDIND to see what does not look like a number. Perhaps
    Code:
    tab STDIND if real(STDIND)==.
    will help you find values of STDIND that Stata translates into missing values when converting text to numeric values.
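    A minimal, self-contained sketch of the check suggested above. The values are made up and stand in for the real contents of columns 593-598, which are not shown in this thread. It relies on real() returning missing for any string it cannot interpret as a number, so the listed observations are the kind of raw values Stata cannot convert to a number.

    ```stata
    * Hypothetical raw values standing in for columns 593-598 of the text file
    clear
    set obs 5
    generate str6 STDIND = ""
    replace STDIND = "001234" in 1
    replace STDIND = "005678" in 2
    replace STDIND = "12,5" in 3
    replace STDIND = "******" in 4
    * observation 5 is left blank, like an empty field in the raw file

    * real() returns the number a string represents, or missing (.) if it cannot
    generate double STDIND_num = real(STDIND)

    * Show only the raw strings that fail to convert
    list STDIND if missing(STDIND_num)
    ```

    Here observations 3, 4 and 5 are listed. In the real data, whatever pattern those values share is what tells you how to clean them.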

    • #3
      Thank you so much!! This works.

      • #4
        William Lisowski, could you tell me how to decide whether to use int or str when I have numeric values (if there are any criteria for this)?

        • #5
          Quoting post #3: "This works."
          I don't think you understood what is suggested in post #2.

          Of course, reading STDIND as a string "works" in the sense that it doesn't create any missing values when columns 593-598 contain something that Stata cannot interpret as a number. But if STDIND is supposed to have numeric values, then having it as a string does not "work" if you need to use it as a variable in your analysis.

          If STDIND "has numeric values" then you must get it to be a numeric variable. It is most convenient to read it as a number, but if that is not possible, you will have to read it as a string and then perhaps change certain values before converting the string to a number. How you do that depends on what values it has. What were the results when you ran
          Code:
          tab STDIND if real(STDIND)==.
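          If the non-numeric values turn out to follow a fixable pattern, the read-as-string-then-convert route might look like the sketch below. The two problems it fixes (a comma used as the decimal separator, and a run of asterisks used as a placeholder) are purely hypothetical stand-ins; the actual cleanup depends on what the tabulation reveals.

          ```stata
          * Hypothetical raw values read with str6, as in post #2
          clear
          set obs 4
          generate str6 STDIND = ""
          replace STDIND = "001234" in 1
          replace STDIND = "12,5" in 2
          replace STDIND = "******" in 3
          * observation 4 is left blank

          * Hypothetical fix 1: a comma used as the decimal separator
          replace STDIND = subinstr(STDIND, ",", ".", .)

          * Hypothetical fix 2: a placeholder that should simply be missing
          replace STDIND = "" if STDIND == "******"

          * Stop with an error if any non-blank value still cannot be converted
          assert !missing(real(STDIND)) if STDIND != ""

          * Convert the cleaned string to a numeric variable, keeping the same name
          destring STDIND, replace
          ```

          The assert line makes the do-file stop if some value still cannot be converted; without it, destring would only print a note and leave STDIND as a string.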
          You also wrote in post #1:
          "there seems to be a problem with one variable (that I know of, but there might be other problems too)"
          You can find all such potential problems by running
          Code:
          misstable summarize, all
          after your data have been read. It reports which numeric variables have missing values, along with a few details about the values each one takes.

          • #6
            Yes, I understood what you meant, but thank you for the suggested command!
