Converting dataset into a standard text file with 100 positions per case

Daniela Kaiser

Join Date: Jul 2020

Posts: 21
#1

Converting dataset into a standard text file with 100 positions per case

08 Aug 2022, 10:50

Hello,

I have a dataset (with 18 string and numeric variables) which I need to turn into a standard text file, with 100 positions per record (row), with no tabs, commas, delimiters, or other punctuation. I wanted to know if I can use Stata to turn my dataset into a text file and to position each data element in exact record positions (described below):

(1-20) LAST NAME
(21-35) FIRST NAME
(36) MIDDLE INITIAL
(37-45) SOCIAL SECURITY NUMBER
(46-47) MONTH OF BIRTH
(48-49) DAY OF BIRTH
(50-53) YEAR OF BIRTH
(54-71) FATHER’S SURNAME
(72) AGE UNIT (at death)
(73-74) NUMBER OF AGE UNITS (at death)
(75) SEX
(76) RACE
(77) MARITAL STATUS
(78-79) STATE OF RESIDENCE
(80-81) STATE OF BIRTH
(82-91) CONTROL/ID NUMBER
(92-97) OPTIONAL USER DATA
(98-100) BLANK FIELD

Here are two examples of what a record with 100 positions following the position of each variable would look like:

WASHINGTON GEORGE T28516085604081731WASHINGTON 075121013059A536a58 060183

ROOSEVELT ELEANOR 26270217111231926ARIAS 029211NVCA62345 122296

Thank you!!
Tags: reshape, standard text file, syntax
Andrew Musau

Join Date: Oct 2014

Posts: 10219
#2

08 Aug 2022, 11:15

Seems like you are interested in string concatenation, although I do not know exactly what your reference to "positions" implies beyond the desired order. Extra spaces in case some field is missing or is not of maximum length? See the -concat()- function of egen.

Code:

help egen

Last edited by Andrew Musau; 08 Aug 2022, 11:19.
1 like
Comment
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#3

08 Aug 2022, 12:56

I think it is string concatenation AND formatting. For example, DAY OF BIRTH must occupy positions (character offsets from beginning) 48-49 in the resulting string. Hence if the day is 1, it is to be included as 01 to fill the two allocated positions and maintain the original numeric value.

What to do with missings (as Andrew already asked) or with too long string content is entirely dependent on the context, which I fail to see here, but is, perhaps, obvious to Daniela Kaiser .

It looks to me like an attempt to generate data input to some legacy system (from the punch cards era). In which case I believe the following conventions are applicable:
- string content that is too long is truncated at capacity;
- string content that is too short is padded at the right with spaces (spaces AFTER the content)
- numeric content that is too short is padded at the left with zeroes (zeroes BEFORE the content, aka leading zeroes)
- floating numbers are rounded to number of digits permitted for storage (e.g. 2 or other number of decimals) and padded with zeroes from both sides: e.g. "12.3989" will become "0012.40" if there are two max digits for decimals and field total width is 7 positions.
- etc

Concatenate the resulting formatted fields.
It makes sense to generate the 18 formatted string vars separately before concatenation to inspect and assert the formatting was done properly.

Best, Sergiy
1 like
Comment
Daniela Kaiser

Join Date: Jul 2020

Posts: 21
#4

08 Aug 2022, 15:45

Thank you both so much, concatenation definitely makes sense for this!

What I'm not sure about now is, before concatenating, how to impose the character limitations on each variable. For instance, I want to force the variables last name to be 20 characters long, so if a last name is 8 letters long I need it to have 12 spaces after it. Do you happen to know how I can do this?

Thanks!
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10219

08 Aug 2022, 17:20

Code:

clear
input str29(firstname middlename lastname SSN)
"Florence" "Griffith" "Joyner" "566-02-XXXX"
"Usain" "" "Bolt" "589-64-XXXX"
end

*(1-20) LAST NAME
*(21-35) FIRST NAME
*(36-36) MIDDLE INITIAL
*(37-45) SOCIAL SECURITY NUMBER

gen wanted= substr(lastname +"                    ", 1, `=20-1+1')+ ///
                substr(firstname +"                    ", 1, `=35-21+1')+ ///
                    substr(middlename +"                    ", 1, `=36-36+1')+ ///
                        substr(SSN +"                    ", 1, `=45-37+1')

Res.:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str45 wanted
"Joyner              Florence       G566-02-XX"
"Bolt                Usain           589-64-XX"
end

Last edited by Andrew Musau; 08 Aug 2022, 17:24.

Announcement

Converting dataset into a standard text file with 100 positions per case

Comment

Comment

Comment

Comment