
  • Reading fixed-format text files into STATA

    Hi everyone,

    I want to read multiple fixed-format text files, one per year, into Stata. I have already read this message, but it was too complicated for me to follow. Can anyone guide me on how to read 18 text files named as follows: state05, state06, state07, state08, ..., state22? The data in each text file are arranged as follows:

    city name 1-2 (leftmost two digits)
    blanks 3-14
    item code 15-17
    amount 18-29
    survey year 30-31
    year of data 32-33
    origin 34-35

    Any advice would be highly appreciated.

    Ijaz

  • #2
    It looks like the OP of the other thread already has a working solution. I don't think you've given us enough information to provide an answer (what you've provided is clearly not well-formatted example data), but in general you should loop over the path to each file that you want to load and use the -infile- command to read the data. Next, write the data to disk with the -save- command, and clear the data from memory to prepare for the next iteration of the loop. This will give you a set of .dta files, which you will presumably want to combine later, either with the -merge- command or with the -append- command, depending on how exactly you expect to combine the files.
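
    For your file names, that pattern might look like the sketch below. It is untested against your data and rests on assumptions: that the files carry a .txt extension and sit in the current working directory, that the column positions in your post are right, and that the variable names and the output name state_all.dta (all my own choices) suit you. It uses -infix- rather than -infile- so that no separate dictionary file is needed, and it must be run from a do-file because of the /// line continuations.

    Code:
    * Sketch only: file extension, variable names, and state_all.dta are assumptions
    forvalues y = 5/22 {
        * Build the two-digit suffix: 05, 06, ..., 22
        local yy : display %02.0f `y'
        * Read one fixed-format file by column position (columns 3-14 are skipped)
        infix byte state_code 1-2 str item_code 15-17 double amount 18-29 ///
            byte survey_year 30-31 byte year_of_data 32-33 str origin 34-35 ///
            using "state`yy'.txt", clear
        save "state`yy'.dta", replace
    }

    * Stack the yearly .dta files into one dataset
    clear
    forvalues y = 5/22 {
        local yy : display %02.0f `y'
        append using "state`yy'.dta"
    }
    save "state_all.dta", replace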

    Here is how the OP of the other thread does this:

    Code:
    local path "/Users/taylorwright/Dropbox/My RA Work/Abel's 2017 R&R/CBP Files/Dictionary 1"
    local folderList : dir "`path'" dirs "19*"
    local yr=74

    * This loops through each of the folders (with the title being the year) in the directory
    foreach folder of local folderList {
        cd "`path'/19`yr'"
        local fileList : dir "`path'/19`yr'" files "RG029.CBP`yr'.T2*.txt"
        * This loops through each file in each folder (this is needed because the data is split into geographical divisions)
        foreach file of local fileList {
            infile using "/Users/taylorwright/Dropbox/My RA Work/Abel's 2017 R&R/Dictionaries/CBP_dict_1974-1986.dct", using(`file')
            save `file'.dta, replace
            clear
        }
        * This appends the data files from each geographical division into one data file for the entire year
        use RG029.CBP`yr'.T2I1.txt.dta
        foreach num of numlist 2/9 {
            append using RG029.CBP`yr'.T2I`num'.txt.dta
        }
        cd "/Users/taylorwright/Dropbox/My RA Work/Abel's 2017 R&R/Cleaned Data/"
        save 19`yr'.dta, replace
        local yr=`yr'+1
        clear
    }
    The problem for the OP of the other thread was that some of the values could not be read into memory as a Stata int because the numbers were too large to store in that type. I strongly recommend taking another look at this code and trying to understand what is happening.
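
    To make the storage-type point concrete: a Stata int holds integers from -32,767 to 32,740, and a long goes up to 2,147,483,620, so a field 12 digits wide, such as the amount in columns 18-29 of your layout, needs a double to be stored exactly. A minimal sketch, where state05.txt is only a placeholder for one of your raw files:

    Code:
    * Sketch only: state05.txt is a placeholder file name
    * Declare the storage type before the variable name so large values fit
    infix double amount 18-29 using "state05.txt", clear
    * Confirm the storage type and check the range of values that were read
    describe amount
    summarize amount, format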

    • #3
      If these files are public, post the URL where we can get them, just as I did. Only when we have the exact files can we give the best service.

      • #4
        Thank you, Mr. Schaefer and Mr. Greathouse, for your replies. Here is a link to the dataset: https://www.census.gov/programs-surv.../datasets.html
        I need to process data for the years 2005 to 2021. I have just checked, and 2022 is not available.

        The dataset's layout is given at the following link: https://www.census.gov/programs-surv...e-layouts.html

        I am writing the layout/dictionary here:

        Field               Variable  Columns  Type
        state_code          v1        1-2      Numeric
        blanks              v2        3-14     zero-filled
        item_code           v3        15-17    Alpha Numeric
        amount (in $1,000)  v4        18-29    Numeric
        survey_year         v5        30-31    Numeric
        year_of_data        v6        32-33    Numeric
        origin              v7        34-35    Alpha Numeric

        I am not sure if I have provided you with sufficient information.

        Thanks again,




        Last edited by Ijaz Ahmad; 06 Apr 2023, 00:20.

        • #5
          You can also literally just do
          Code:
          import delim "https://www2.census.gov/programs-surveys/state/technical-documentation/file-layouts/public-use-file-layout.csv", clear
          That way, we can literally have the file in our current dataframe. As a very general rule, directly importing your files from the URL is the best possible dataex you can have. I think it was Leonardo Guizzetti or Andrew Musau who also commented on my question, the one I linked to. Perhaps they have views on this. I do, but I gotta drive to Georgia Tech now.

          • #6
            Thank you Jared Greathouse

            • #7
              problem solved, thank you Jared Greathouse and Daniel Schaefer

              • #8
                Please do post the solution Ijaz Ahmad
