Error while unzipping and then converting .txt and .csv file format to .dta format

Tariq Abdullah

Join Date: Apr 2021
Posts: 366

04 Jul 2022, 11:06

Code:

* specify locations of data sets;
*universe;
global abcd    "Users/Dropbox/BDD/data/source_a" 
global abcd09 "/Users/Dropbox/BDD/data/source_b" 
global generated        "/Users/Dropbox/BDD/data/generated" 
global tempdir             "temp"


*****************************************************************************;
*unzip universe data;
if 1==1{; //unzip allabcd and coverto to stata
    forvalues year=1980(1)2009{;
        display "shell 7z e $abcd/ABCD`year'.zip  -y -otemp";
        shell 7z e $abcd/ABCD`year'.zip  -y -otemp *.CSV *.TXT;
        };


forvalues year=94(1)99{;
        shell st  temp/ABCD`year'.TXT temp/abcd_univ_19`year'.dta /y;
        erase temp/ABCD`year'.TXT;
        };
    shell st  temp/ABCD2000.TXT temp/abcd_univ_2000.dta /y;
    erase temp/ABCD2000.TXT;
    shell st  temp/ABCD2001.TXT temp/abcd_univ_2001.dta /y;
    erase temp/ABCD2001.TXT;
    shell st  temp/ABCD2002.CSV temp/abcd_univ_2002.dta /y;
    erase temp/ABCD2002.CSV;
    forvalues year=2003(1)2009{;
        shell st temp/UNIVERSE`year'.CSV temp/abcd_univ_`year'.dta /y;
        erase temp/UNIVERSE`year'.CSV;
        };

After running the above command, I'm getting the following error. Can anyone kindly guide me on why it's happening? Would really appreciate it!

Code:

 if 1==1{; //unzip allabcd and coverto to stata
.         forvalues year=1980(1)2009{;
program error:  code follows on the same line as open brace
r(198);
.                 display "shell 7z e $abcd/ABCD`year'.zip  -y -otemp";
.                 shell 7z e $abcd/ABCD`year'.zip  -y -otemp *.CSV *.TXT;
.                 };
program error:  code follows on the same line as close brace
r(198);

end of do-file

r(198);


Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#2

04 Jul 2022, 11:28

It appears that you have abruptly switched to using ; as your command terminator without telling Stata. So when it sees the ; after the {, it gets confused by that. To use ; as the command terminator you must first give the command

Code:

#delimit ;

When you are done with that section of the code and there are no more ; terminators, you must then switch back with the command

Code:

#delimit cr

That said, I don't see why that block of code is written with ; terminators. All of the commands are short and fit comfortably on one line, so this seems entirely unnecessary. It may be simpler to remove all of the ; characters from the code there and not have any -#delimit- commands.
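
For illustration, here is a minimal sketch of the semicolon-free version of the unzip loop from the first post. It assumes the 7z executable is on the system path and that the global abcd is defined as in that post. The if 1==1 wrapper is dropped because it is always true.

```stata
* Unzip each year's archive into the temp folder, one command per line,
* so no #delimit command is needed.
forvalues year = 1980(1)2009 {
    display "shell 7z e $abcd/ABCD`year'.zip -y -otemp"
    shell 7z e $abcd/ABCD`year'.zip -y -otemp *.CSV *.TXT
}
```

If the ; terminators are kept instead, the loop would need to sit between #delimit ; and #delimit cr.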

Tariq Abdullah

Join Date: Apr 2021
Posts: 366

04 Jul 2022, 14:02

I've decided to do the files one by one instead of unzipping and importing all of them at once, since that feels a little cumbersome. I'm using import delimited to load the .txt file into Stata and will save it as a .dta file.

Code:

 import delimited using abcd96.TXT, delimiters("^") varnames(1) rowrange(3)

Using the above command, I get the following data. I'm posting a sample of the data with dataex, as suggested in this thread ( https://www.statalist.org/forums/for...ile-with-stata ). When I use import delimited, the variable names get lost and the categorical variable shows up in red, and I'm not sure whether that is right or wrong. I've attached an image of how my data look after import delimited using abcd96.TXT.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str167 v1
`"1996,1,0,1,"C04000026200",0,0,0,0,"000000000000",0.000,0.000,2,1,0,0,0,7,4,0,0,"00000",5,0,"   C0400",2,0,2,4,1,16.280,0,1,50,0,2,0,0,0,0.0,0.00,0.0,"'         
`"1996,1,0,1,"C05000003170",0,0,0,0,"000000000000",0.000,0.000,2,1,0,0,0,7,4,0,0,"00000",7,0,"   C0500",4,0,2,4,1,1.970,0,1,1100,0,2,0,0,0,0.0,0.00,0.0,"'        
`"1996,1,0,1,"C05000003400",0,0,0,0,"000000000000",0.000,0.000,2,1,0,0,0,7,4,0,0,"00000",7,0,"   C0500",4,0,2,4,1,2.113,0,1,1100,0,2,0,0,0,0.0,0.00,0.0,"'        
`"1996,1,0,1,"C05000015550",0,0,0,0,"000000000000",0.000,0.000,2,1,0,0,0,7,4,0,0,"00000",5,0,"   C0500",2,0,2,4,1,9.662,0,1,350,0,2,0,0,0,0.0,0.00,0.0,"'         
`"1996,1,0,1,"C05000019960",0,0,0,0,"000000000000",0.000,0.000,2,4,0,115,0,17,5,0,0,"00000",5,0,"   C0500",4,0,2,4,1,12.403,0,2,1100,0,2,0,0,0,0.0,0.00,0.0,"'
end


Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#4

04 Jul 2022, 14:37

Where are these zip files originally from?

EDIT: if they're public (that is, if we can directly copy them into our working directory, not from Dropbox but from a normal website), post the link.

Last edited by Jared Greathouse; 04 Jul 2022, 14:43.
Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#5

04 Jul 2022, 14:43

https://www.nber.org/research/data/t...3s65oy4wvpr8NU

4. Highway Performance Monitoring System, Dept. of Transportation

They are available here. I know how to load the data for post-2011. But the data after 1996, which are what I'm after, are provided in .txt format, and I don't know how to convert that. The data from 1980-2008 are uploaded in shapefiles on the NBER website; the size is 13 GB. If you need any other information, please let me know. I appreciate the kind response.
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#6

04 Jul 2022, 14:47

Wait a minute, let's start back from the beginning. Let's start with the 1990s data. Which link (specifically) on link two am I meant to click? There's a lot here.

In this instance it may be possible to use your real data, but we must facilitate this correctly, especially since problems like this can be tricky. So which specific link from page 2, please?

EDIT: Me personally, I'm a big fan of using the REAL dataset I'm working with whenever possible, so that everyone can see the exact dataset I have, but you have to know how to copy it into your working directory, or be able to. It's a little more taxing than using dataex, but by all means, if these files are public, we can likely just work with them directly from the source.

Last edited by Jared Greathouse; 04 Jul 2022, 14:55.
Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#7

04 Jul 2022, 15:03

https://www.nber.org/research/data/t...3s65oy4wvpr8NU

From the NBER link above, item number 4 is the data I'm working with, which is provided by Matthew Turner. My concern is the 1980-2008 part; I need 1996-2008. But the data are given in .txt format, which I can't convert to .dta format. All of the files are public.

If you click the following link, the files will start downloading from the NBER website as a zip file:

Data (1980-2008)
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#8

04 Jul 2022, 15:09

Okay so before we can get started, use the copy command to copy it into a fresh new working directory. Then use the unzipfile command to unzip what downloads, and then we can go from there. Look at the help files for them if you've never used these.

And please, put all of the code for this in code delimiters so we can follow precisely what you did. I'll likely take a look after I get done eating this barbecue! I'll get back to you in a minute.
Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#9

04 Jul 2022, 15:21

Please ignore the first post with which I started this thread. That code unzips all of the files at once and converts them in the same pass, which is too cumbersome for me to follow. I'm doing the following now.

After downloading the Data (1980-2008) dataset, I unzipped the whole thing manually using 7-Zip (Windows) or The Unarchiver/Archive Utility (Mac). Then I went to the subfolder Universe_80_08 and unzipped HPMS1996. Finally, I tried to import the HPMS96.TXT file (which is inside the unzipped HPMS1996 folder) using the following command:

Code:

import delimited using HPMS96.TXT
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#10

04 Jul 2022, 15:43

Okay so this'll take a while. While we wait, either way, I'm confused about the point of all this. You write:

But the data is given in .txt format which I can't convert to .dta format

I follow that this big zip file has files you need. But why do you need to use shell to convert them to stata data? Why not just import them as a normal stata dataset, and save it accordingly?

It doesn't make sense. For example, let's say this file couldn't just be imported, and that we had to get it into our directory for some reason.

Code:

copy "https://raw.githubusercontent.com/jehangiramjad/tslib/master/tests/testdata/basque.csv" "basque.csv", replace

If I wanted to convert this to a Stata dataset, I would just do

Code:

import delim basque.csv, clear
sa basquedata, replace

I wouldn't have any reason to use shell or anything more complicated than that. So I guess the real question is: aren't we making this a little harder than it should be? Isn't it feasible to just import the text files and save them as Stata data instead of fooling with shell?
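
As a rough sketch of that idea applied to several text files at once, the loop below imports each yearly file and saves it as a .dta file. The file names (HPMS96.TXT through HPMS99.TXT and the hpms_19YY.dta outputs) are assumptions for illustration, as is the absence of a header row; adjust both to the real files.

```stata
* Hypothetical file names: HPMS96.TXT ... HPMS99.TXT in the working directory
forvalues yy = 96/99 {
    * Skip any year whose text file is not present
    capture confirm file "HPMS`yy'.TXT"
    if _rc != 0 {
        continue
    }
    * Assumption: no header row, so variables are named v1, v2, ...
    import delimited using "HPMS`yy'.TXT", varnames(nonames) clear
    save "hpms_19`yy'.dta", replace
}
```

No shell call or external conversion tool is involved; each file is read and saved from within Stata.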
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#11

04 Jul 2022, 15:48

Duplicate.
Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#12

04 Jul 2022, 16:24

The issue is that there is no csv file. The data for each year from 1980-2008 are stored in .txt format, and I don't know how to convert .txt format to Stata format. There is no csv file inside the original zipped data; otherwise I would have loaded the csv file into Stata and done what you kindly suggested.
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#13

04 Jul 2022, 16:48

the issue is there is no csv file

So? I don't care if it's a csv or text file (and neither does Stata, for that matter). Consider the following example I had yesterday:

Code:

copy "http://qed.econ.queensu.ca/jae/datasets/ke001/kh-data.zip" "kh-data.zip", replace
qui unzipfile kh-data.zip, replace
erase kh-data.zip
cls
import delimited "kh-data.txt", clear
// We can now save it as anything we'd like

This one takes about 5 seconds, so play with it if you wish. The point I'm trying to illustrate here is that I don't know why you're trying to convert anything at all. CSVs, txt files, xlsx files: they all can easily be imported into Stata. If we have a csv file or a text file or whatever, there's 0 need for me to use shell or any similar command; I'll just import the file and move on with whatever I'm doing.

I guess that's my question: why are we discussing file conversion? I don't see why it's needed when Stata can handle both happily.
Tariq Abdullah

Join Date: Apr 2021

Posts: 366
#14

04 Jul 2022, 17:05

When I used import delimited, the variable names got lost and the categorical variable showed up in red, and I'm not sure whether that is right or wrong. I've attached an image of how my data look after import delimited using abcd96.TXT.

The variable names don't show up and the categorical variables take weird forms; that's why I was confused about using import delimited.
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#15

04 Jul 2022, 17:30

Are you importing using the first row's names as variable names? It makes sense that the variable is red: it has letters in it, so Stata reads it as a string. Don't attach screenshots; always use dataex to show your Stata data, never screenshots or images.
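
A sketch of what such an import might look like, assuming the file is comma-delimited (as the dataex excerpt earlier in this thread suggests, since every field landed in the single variable v1) and has no header row. The file name abcd96.TXT comes from that earlier post; the variable name year is hypothetical, since the thread shows no codebook.

```stata
* Assumption: fields are separated by commas, not "^", and there is no header row
import delimited using "abcd96.TXT", delimiters(",") varnames(nonames) clear

* Variables come in as v1, v2, ...; string variables are shown in red in the
* Data Editor, which is expected for values such as "C04000026200"
describe

* Hypothetical name for the first column, which holds 1996 in the excerpt
rename v1 year

save "abcd96.dta", replace
```

If the real file does carry variable names in its first row, varnames(1) would replace varnames(nonames).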