Reading fixed-format text files into STATA

Ijaz Ahmad

Join Date: Jan 2023

Posts: 17
#1

05 Apr 2023, 15:37

Hi everyone,

I want to read multiple fixed-format text files, one per year, into Stata. I have already read this message, but it was too complicated for me to follow. Can anyone show me how to read 18 text files named state05, state06, state07, state08, ..., state22? The data in each text file are arranged as follows:

city name 1-2 (leftmost two digits)
blanks 3-14
item code 15-17
amount 18-29
survey year 30-31
year of data 32-33
origin 34-35

Any advice would be highly appreciated.

Ijaz
Daniel Schaefer

Join Date: Mar 2020

Posts: 814
#2

05 Apr 2023, 16:17

It looks like the OP of the other thread actually has a working solution already. I don't think you've given us enough information to provide an answer (what you've provided is clearly not well-formatted example data), but in general you should loop over the path to each file that you want to load and use the -infile- command to load the data. Next, write the data to disk with the -save- command, and clear the data from memory to prepare for the next iteration of the loop. This will give you a set of .dta files, which you will presumably want to combine later with either the -merge- command or the -append- command, depending on how exactly you expect to combine the files.

Here is how the OP of the other thread does this:

Code:

local path "/Users/taylorwright/Dropbox/My RA Work/Abel's 2017 R&R/CBP Files/Dictionary 1"
local folderList : dir "`path'" dirs "19*"
local yr=74
* This loops through each of the folders (with the title being the year) in the directory
foreach folder of local folderList {
    cd "`path'/19`yr'"
    local fileList : dir "`path'/19`yr'" files "RG029.CBP`yr'.T2*.txt"
    * This loops through each file in each folder (this is needed because data is split into geographical divisions)
    foreach file of local fileList {
        infile using "/Users/taylorwright/Dropbox/My RA Work/Abel's 2017 R&R/Dictionaries/CBP_dict_1974-1986.dct", using(`file')
        save `file'.dta, replace
        clear
    }
    * This appends the data files from each geographical division into one data file for the entire year
    use RG029.CBP`yr'.T2I1.txt.dta
    foreach num of numlist 2/9 {
        append using RG029.CBP`yr'.T2I`num'.txt.dta
    }
    cd "/Users/taylorwright/Dropbox/My RA Work/Abel's 2017 R&R/Cleaned Data/"
    save 19`yr'.dta, replace
    local yr=`yr'+1
    clear
}

The problem for the OP of the other thread was that some of the values could not be read into memory as a Stata int because the numbers were too large for that type. I strongly recommend taking another look at this code and trying to understand what is happening.
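
For the files described in #1, the loop, read, and append steps outlined above might look roughly like the sketch below. It is untested against the real data and rests on several assumptions: that the files carry a .txt extension (state05.txt through state22.txt), that they sit in the current working directory, and that the column positions in #1 are correct. The variable names are made up for illustration. It also uses -infix- with column ranges instead of -infile- with a dictionary, and stacks the years in a temporary file rather than saving one .dta per year.

```stata
* Sketch only: file names, extension, and variable names are assumptions.
clear
tempfile combined
local first = 1

forvalues y = 5/22 {
    * Build the two-digit suffix: 05, 06, ..., 22
    local yy : display %02.0f `y'

    * Skip a year whose file is missing instead of stopping with an error
    capture confirm file "state`yy'.txt"
    if _rc {
        display as text "state`yy'.txt not found, skipping"
        continue
    }

    * Column ranges from the layout in #1; columns 3-14 are skipped.
    * amount is read as double because 12 digits do not fit in int or long.
    infix str city 1-2 str item_code 15-17 double amount 18-29 ///
          int survey_year 30-31 int year_of_data 32-33 str origin 34-35 ///
          using "state`yy'.txt", clear

    * Stack this year on top of the years already read
    if !`first' {
        append using `combined'
    }
    save `combined', replace
    local first = 0
}

* Load the stacked data, if at least one file was found
if !`first' {
    use `combined', clear
}
```

Whether -append- is the right way to combine the years depends on what the final dataset should look like; if each year should become its own set of columns, -merge- or -reshape- would be needed instead.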
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#3

05 Apr 2023, 16:36

If these files are public, post the URL where we can get them, just as I did. Only when we have the exact files can we give the best service.
Ijaz Ahmad

Join Date: Jan 2023

Posts: 17
#4

06 Apr 2023, 00:15

Thank you, Mr. Schaefer and Mr. Greathouse, for your replies. Here is a link to the dataset: https://www.census.gov/programs-surv.../datasets.html
I need to process data for the years 2005 to 2021. I have just checked, and 2022 is not available.

The dataset's layout is given at the following link: https://www.census.gov/programs-surv...e-layouts.html

I am writing the layout/dictionary here:

Field              Variable  Columns  Type
state_code         v1        1-2      Numeric
blanks             v2        3-14     zero-filled
item_code          v3        15-17    Alphanumeric
amount (in $1000)  v4        18-29    Numeric
survey_year        v5        30-31    Numeric
year_of_data       v6        32-33    Numeric
origin             v7        34-35    Alphanumeric
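
Written as a Stata dictionary file for -infix-, the layout above could look something like the sketch below. The file name state_layout.dct is hypothetical, and the storage types are my own choices rather than anything stated in the Census documentation: state_code is kept as a string so that leading zeros survive, and amount is a double because a 12-digit value does not fit in an int or a long.

```stata
infix dictionary {
* Sketch of a dictionary for the layout above; storage types are assumptions.
* Columns 3-14 (zero-filled) are not read.
    str    state_code    1-2
    str    item_code     15-17
    double amount        18-29
    int    survey_year   30-31
    int    year_of_data  32-33
    str    origin        34-35
}
```

With that saved next to the data, a single file could then be read with something like `infix using state_layout.dct, using(state05.txt) clear`, where the .txt extension is again an assumption about how the files are named.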

I am not sure if I have provided you with sufficient information.

Thanks again,

Last edited by Ijaz Ahmad; 06 Apr 2023, 00:20.
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#5

06 Apr 2023, 06:06

You can also literally just do

Code:

import delim "https://www2.census.gov/programs-surveys/state/technical-documentation/file-layouts/public-use-file-layout.csv", clear

That way, we can literally have the file in our current dataframe. As a very general rule, directly importing your files from the URL is the best possible dataex you can have. I think it was Leonardo Guizzetti or Andrew Musau who also commented on my question, the one I linked to. Perhaps they have views on this. I do, but I gotta drive to Georgia Tech now.
Ijaz Ahmad

Join Date: Jan 2023

Posts: 17
#6

06 Apr 2023, 15:15

Thank you Jared Greathouse
Ijaz Ahmad

Join Date: Jan 2023

Posts: 17
#7

06 Apr 2023, 23:09

Problem solved. Thank you, Jared Greathouse and Daniel Schaefer.
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#8

07 Apr 2023, 14:25

Please do post the solution, Ijaz Ahmad.