  • Importing fixed format data

    I cannot seem to import a fixed-format data file. The data are from the CDC National Vital Statistics System (downloadable as zip files under the Mortality Multiple Cause Files heading). Part of the challenge is that the data file lacks a true (usable) dictionary, but it does have a PDF that can be translated into a dictionary (see documentation stored here, such as this specific pdf). I have tried guidance posted on the Stata FAQ, as well as several posts on Statalist (e.g., 1, 2), but cannot seem to make anything work.

    I've tried many permutations of code like this:
    Code:
     infix dictionary  {
        _column(19)    int        record    %1f
        _column(20)    int        resstat    %1f
        _column(63)    int        edu        %1f
        _column(64)    int        eduflag    %1f
        _column(65)    int        month    %2f
        _column(69)    int        sex        %1f
        _column(70)    int        age        %4f
        _column(74)    int        ageflag    %1f
        _column(75)    int        ager1    %2f
        _column(77)    int        ager2    %2f
        _column(79)    int        ager3    %2f        
    }
    
    infix using dictionary , using(mort2021us.txt)
    I've also tried a simpler version like this:
    Code:
    infix using mort2021us.txt
    infix dictionary {
        record    19
        resstat    20
        edu        63
        eduflag    64
    I have also tried switching commands around, changing from infix to infile to import, and using help files (e.g., this one). I've gotten numerous errors, including "file does not contain dictionary" and "dictionary is unrecognized." I've tried embedding the dictionary code in my do-file, as well as saving it as its own txt file in the same directory.

    In all honesty, I'm clearly missing some key component and I'm sure it's my (user) error. Any help would be greatly appreciated!

    Sarah

  • #2
    I rarely use -infix- or -infile-, and I have always found the documentation for them to be distinctly opaque, so I feel your pain. Here's something that worked for me using your dictionary file:

    Make your dictionary file as you have, but without the word "infix," and save it as (say) "mydictionary.dct"
    Code:
    dictionary using "mort2021us.txt"  {
        _column(19)    int        record     %1f
        _column(20)    int        resstat    %1f
        _column(63)    int        edu        %1f
        _column(64)    int        eduflag    %1f
        _column(65)    int        month      %2f
        _column(69)    int        sex        %1f
        _column(70)    int        age        %4f
        _column(74)    int        ageflag    %1f
        _column(75)    int        ager1      %2f
        _column(77)    int        ager2      %2f
        _column(79)    int        ager3      %2f        
    }
    Tell Stata to read your raw data, employing that dictionary file by name, with the following -infile- command:
    Code:
    infile using "mydictionary.dct"
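    For a layout this small, the column positions can also be given inline with -infix-, with no dictionary file at all. This is only a minimal sketch, not tested against the full CDC file. The column ranges are an assumption inferred from the formats above (%2f starting at column 65 is taken as columns 65-66, %4f starting at column 70 as columns 70-73), and the /// line continuations work in a do-file but not when typed interactively.
    Code:
    * Column ranges are inferred from the dictionary's formats (an assumption)
    infix record 19 resstat 20 edu 63 eduflag 64 month 65-66 sex 69 ///
          age 70-73 ageflag 74 ager1 75-76 ager2 77-78 ager3 79-80  ///
          using "mort2021us.txt", clear
    Either route should give the same eleven variables; the dictionary file is easier to maintain once the variable list grows.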


    • #3
      Thank you Mike! This is a new one for me too. When running the code you provided, I get an error that <command dictionary is unrecognized>. If instead I use
      Code:
      infix dictionary {
      or
      Code:
      infix dictionary using "mort2021us.txt"
      I get errors (<invalid name> and <using invalid varname>, respectively).

      If I just save the dictionary do file with a .dct extension, it doesn't exist when I try to use it with the
      Code:
      infile using dictionary.dct using mort2021us.txt
      .. this returns the error <file does not contain dictionary>

      At this point, I'm loath to say, I might end up using Python for the data import, saving the result as a single file, and then importing that into Stata. I guess after 10 years of using Stata, I was bound to run into at least one flaw.


      • #4
        Two thoughts: I'm using Stata 15.1, but I can't imagine that would be an issue here, since I believe that all this infix/infile stuff has been the same for many versions. Instead, I'd wonder if there's some small confusion between us. Let me be more explicit with an example that just worked for me. Can you try this and report back?

        Save this example data file to mort2021us.txt in YourDirectory

        Code:
        01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
        01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
        01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
        Save this to mydictionary.dct in YourDirectory

        Code:
        dictionary using mort2021us.txt  {
            _column(19)    int        record    %1f
            _column(20)    int        resstat   %1f
            _column(63)    int        edu       %1f
            _column(64)    int        eduflag   %1f
            _column(65)    int        month     %2f
            _column(69)    int        sex       %1f
            _column(70)    int        age       %4f
            _column(74)    int        ageflag   %1f
            _column(75)    int        ager1     %2f
            _column(77)    int        ager2     %2f
            _column(79)    int        ager3     %2f          
        }
        Run these commands.
        Code:
        cd YourDirectory
        infile using mydictionary.dct


        • #5
          Thank you Mike! This worked! I think the issue was my txt files were actually saving as rtf files and your second example drew my attention to that. Thank you again!
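          For anyone who suspects the same problem, one quick check (a sketch, assuming the dictionary is saved as mydictionary.dct in the working directory) is to print the file's raw contents with -type-. A plain-text dictionary should begin with the word dictionary, whereas a file saved as rich text typically begins with {\rtf1 followed by formatting codes.
          Code:
          * Show the raw contents of the dictionary file in the Results window
          type mydictionary.dct
          If the formatting codes show up, re-saving the file from the editor as plain text should fix it.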
