  • Error while unzipping and then converting .txt and .csv file format to .dta format

    Code:
    * specify locations of data sets;
    *universe;
    global abcd    "/Users/Dropbox/BDD/data/source_a" 
    global abcd09 "/Users/Dropbox/BDD/data/source_b" 
    global generated        "/Users/Dropbox/BDD/data/generated" 
    global tempdir             "temp"
    
    
    *****************************************************************************;
    *unzip universe data;
    if 1==1{; //unzip allabcd and coverto to stata
        forvalues year=1980(1)2009{;
            display "shell 7z e $abcd/ABCD`year'.zip  -y -otemp";
            shell 7z e $abcd/ABCD`year'.zip  -y -otemp *.CSV *.TXT;
            };
    
    
    forvalues year=94(1)99{;
            shell st  temp/ABCD`year'.TXT temp/abcd_univ_19`year'.dta /y;
            erase temp/ABCD`year'.TXT;
            };
        shell st  temp/ABCD2000.TXT temp/abcd_univ_2000.dta /y;
        erase temp/ABCD2000.TXT;
        shell st  temp/ABCD2001.TXT temp/abcd_univ_2001.dta /y;
        erase temp/ABCD2001.TXT;
        shell st  temp/ABCD2002.CSV temp/abcd_univ_2002.dta /y;
        erase temp/ABCD2002.CSV;
        forvalues year=2003(1)2009{;
            shell st temp/UNIVERSE`year'.CSV temp/abcd_univ_`year'.dta /y;
            erase temp/UNIVERSE`year'.CSV;
            };
    After running the above command, I'm getting the following error. Can anyone kindly guide me on why it's happening? Would really appreciate it!

    Code:
     if 1==1{; //unzip allabcd and coverto to stata
    .         forvalues year=1980(1)2009{;
    program error:  code follows on the same line as open brace
    r(198);
    .                 display "shell 7z e $abcd/ABCD`year'.zip  -y -otemp";
    .                 shell 7z e $abcd/ABCD`year'.zip  -y -otemp *.CSV *.TXT;
    .                 };
    program error:  code follows on the same line as close brace
    r(198);
    
    end of do-file
    
    r(198);

  • #2
    It appears that you have abruptly switched to using ; as your command terminator without telling Stata. So when it sees the ; after the {, it gets confused by that. To use ; as the command terminator you must first give the command
    Code:
    #delimit ;
    When you are done with that section of the code and there are no more ; terminators, you must then switch back with the command
    Code:
    #delimit cr
    That said, I don't see why that block of code is written with ; terminators. All of the commands are short and fit comfortably on one line, so this seems entirely unnecessary. It may be simpler to remove all of the ; characters from the code there and not use any -#delimit- commands at all.
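To illustrate the two options in #2, here is a minimal sketch. The three-year range and the counter local `n` are placeholders made up for this illustration, not part of the original do-file, and the `shell 7z` calls are left out so the snippet runs without 7-Zip installed. Note that `#delimit` only takes effect inside a do-file, not when typed interactively.

```stata
* Option 1: declare ; as the command terminator before using it
local n = 0
#delimit ;
forvalues year = 1980(1)1982 {;
    display "would unzip ABCD`year'.zip";
    local n = `n' + 1;
};
#delimit cr

* Option 2: drop the ; characters and keep the default line-end terminator
forvalues year = 1980(1)1982 {
    display "would unzip ABCD`year'.zip"
    local n = `n' + 1
}
display "loop bodies executed: `n'"
```

Either form avoids the "code follows on the same line as open brace" error, because Stata now knows where each command ends.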

    • #3

      I've decided to work through the files one by one instead of unzipping and importing all of them at once, since that feels a little cumbersome. I'm using import delimited to load the .txt file into Stata and will then save it as a .dta file.

      Code:
       import delimited using abcd96.TXT, delimiters("^") varnames(1) rowrange(3)
      Using the above command I'm getting the following data. I'm using this as a sample of data as it was suggested in this thread ( https://www.statalist.org/forums/for...ile-with-stata ). When I use import delimited the name of the variable gets lost and the categorical variable shows up as red - which I'm not sure is right or wrong. I've attached the image of how my data looks when I'm using import delimited using abcd96.TXT

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str167 v1
      `"1996,1,0,1,"C04000026200",0,0,0,0,"000000000000",0.000,0.000,2,1,0,0,0,7,4,0,0,"00000",5,0,"   C0400",2,0,2,4,1,16.280,0,1,50,0,2,0,0,0,0.0,0.00,0.0,"'         
      `"1996,1,0,1,"C05000003170",0,0,0,0,"000000000000",0.000,0.000,2,1,0,0,0,7,4,0,0,"00000",7,0,"   C0500",4,0,2,4,1,1.970,0,1,1100,0,2,0,0,0,0.0,0.00,0.0,"'        
      `"1996,1,0,1,"C05000003400",0,0,0,0,"000000000000",0.000,0.000,2,1,0,0,0,7,4,0,0,"00000",7,0,"   C0500",4,0,2,4,1,2.113,0,1,1100,0,2,0,0,0,0.0,0.00,0.0,"'        
      `"1996,1,0,1,"C05000015550",0,0,0,0,"000000000000",0.000,0.000,2,1,0,0,0,7,4,0,0,"00000",5,0,"   C0500",2,0,2,4,1,9.662,0,1,350,0,2,0,0,0,0.0,0.00,0.0,"'         
      `"1996,1,0,1,"C05000019960",0,0,0,0,"000000000000",0.000,0.000,2,4,0,115,0,17,5,0,0,"00000",5,0,"   C0500",4,0,2,4,1,12.403,0,2,1100,0,2,0,0,0,0.0,0.00,0.0,"'
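Each record in the -dataex- listing above lands in a single string variable v1, and the values inside it are separated by commas. That suggests (an inference from the listing, not something confirmed in the thread) that the file is comma-delimited rather than ^-delimited and has no header row. A minimal sketch under that assumption, using a shortened copy of the first two records written to a scratch file; the file name sample96.txt is made up for this illustration:

```stata
* Write a shortened two-record sample in the same layout as the rows above
tempname fh
file open `fh' using "sample96.txt", write text replace
file write `fh' `"1996,1,0,1,"C04000026200",0,0"' _n
file write `fh' `"1996,1,0,1,"C05000003170",0,0"' _n
file close `fh'

* Assumption: comma is the delimiter and there is no header row,
* so Stata names the columns v1, v2, ... instead of reading names from row 1
import delimited using "sample96.txt", delimiters(",") varnames(nonames) clear

* Check how many variables were created and which ones are strings
describe
```

If the real file does carry a header row, `varnames(1)` would be the option to use instead; with `varnames(nonames)` the columns have to be renamed by hand from the data documentation.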

      • #4
        Where are these zip files originally from?

        EDIT: if they're public (that is, if we can directly copy them into our working directory, not from Dropbox but from a normal website), post the link.
        Last edited by Jared Greathouse; 04 Jul 2022, 14:43.

        • #5
          https://www.nber.org/research/data/t...3s65oy4wvpr8NU

          4. Highway Performance Monitoring System, Dept. of Transportation

          They are available here. I know how to load the post-2011 data. But the data after 1996, which is what I'm aiming for, are provided in .txt format, and I don't know how to convert that. The data from 1980-2008 are uploaded in shapefiles on the NBER website. The size is 13 GB. If you need any other information, please let me know. I appreciate the kind response.

          • #6
            Wait a minute, let's start back from the beginning. Let's start with the 1990s data. Which link (specifically) on link two am I meant to click? There's a lot here.


            In this instance it may be possible to use your real data, but we must facilitate this correctly, especially since problems like this can be tricky. So which specific link from page 2, please?


            EDIT: Personally, I'm a big fan of using the REAL dataset I'm working with whenever possible, so that everyone can see the exact dataset I have, but you have to know how to copy it into your working directory. It's a little more taxing than using dataex, but by all means, if these files are public, we can likely just work with them directly from the source.
            Last edited by Jared Greathouse; 04 Jul 2022, 14:55.

            • #7
              https://www.nber.org/research/data/t...3s65oy4wvpr8NU

              From the NBER link above, I'm working with dataset number 4, which is provided by Matthew Turner. My concern is the 1980-2008 part; I need the years 1996-2008. But the data are given in .txt format, which I can't convert to .dta format. All of the files are public. If you click the following link, they'll start downloading from the NBER website as a zip file.

              Data


              • #8
                Okay so before we can get started, use the copy command to copy it into a fresh new working directory. Then use the unzipfile command to unzip what downloads, and then we can go from there. Look at the help files for them if you've never used these.


                And please, put all of the code for this in code delimiters so we can follow precisely what you did. I'll likely take a look after I get done eating this barbecue! I'll get back to you in a minute.
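For readers who have not used these commands, the sketch below walks through the same sequence on a tiny archive built locally, so it runs without any download. The file names demo.txt and demo.zip are made up for this illustration; with real data, a `copy` of the download URL to a local .zip file would replace the part that builds the archive.

```stata
* Build a small text file and zip it (stand-in for the downloaded archive)
tempname fh
file open `fh' using "demo.txt", write text replace
file write `fh' "id,value" _n "1,10" _n "2,20" _n
file close `fh'
zipfile demo.txt, saving(demo.zip, replace)
erase demo.txt

* Unzip into the working directory, then import and save as a Stata dataset
unzipfile demo.zip, replace
import delimited using "demo.txt", varnames(1) clear
save demo, replace
```

Running this in a fresh working directory keeps the extracted files separate from anything else on disk.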

                • #9
                  Please, ignore the first comment with which I started this thread. That command is actually unzipping the whole file at once and converting them simultaneously - that's too cumbersome to follow from my side. I'm doing the following thing now.

                  After downloading the dataset Data (1980-2008), I unzipped the whole thing manually using 7-Zip (Windows) or The Unarchiver/Archive Utility (Mac). Then I went to the subfolder Universe_80_08 and unzipped HPMS1996. Then I tried to import the HPMS96.TXT file (which is inside the unzipped HPMS1996 folder) using the following command:

                  Code:
                   import delimited using HPMS96.TXT 





                  • #10
                    Okay, so this'll take a while. While we wait, either way, I'm confused about the point of all this. You write
                    But the data is given in .txt format which I can't convert to .dta format
                    I follow that this big zip file has files you need. But why do you need to use shell to convert them to Stata data? Why not just import them as a normal Stata dataset and save it accordingly?



                    It doesn't make sense. For example, let's say this file couldn't just be imported, and that we had to get it into our directory for some reason.
                    Code:
                    copy "https://raw.githubusercontent.com/jehangiramjad/tslib/master/tests/testdata/basque.csv" "basque.csv", replace
                    If I wanted to convert this to a Stata dataset, I would just do
                    Code:
                    import delim basque.csv, clear
                    
                    save basquedata, replace
                    I wouldn't have any reason to use shell or anything more complicated than that. So I guess the real question is: aren't we making this a little harder than it should be? Isn't it feasible to just import the text files and save them as Stata data instead of fooling with shell?

                    • #11
                      Duplicate.

                      • #12
                        The issue is there is no csv file. The data for each year from 1980-2008 are stored in .txt format. I don't know how to convert .txt format to Stata format. There is no csv file inside the original zipped data. Otherwise I would have loaded the csv file into Stata and done what you have kindly suggested.

                        • #13
                          the issue is there is no csv file
                          So? I don't care (and neither does Stata, for that matter) whether it's a csv or a text file. Consider the following example I had yesterday:
                          Code:
                          copy "http://qed.econ.queensu.ca/jae/datasets/ke001/kh-data.zip" "kh-data.zip", replace
                          
                          qui unzipfile kh-data.zip, replace
                          
                          erase kh-data.zip
                          
                          cls
                          
                          import delimited "kh-data.txt", clear
                          
                          // We can now save it as anything we'd like
                          This one takes about 5 seconds, so play with it if you wish. The point I'm trying to illustrate here is that I don't know why you're trying to convert anything at all. CSVs, txt files, xlsx files: they can all easily be imported into Stata. If we have a csv file or a text file or whatever, there's no need for me to use shell or any similar command; I'll just import the file and move on with whatever I'm doing.

                          I guess that's my question: why are we discussing file conversion at all? I don't see why it's needed when Stata can handle both happily.
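Applied to the yearly files, the same import-then-save idea can be sketched as a loop. The HPMS`yy'.TXT naming follows the HPMS96.TXT file mentioned in #9; the file contents written here are invented stand-ins so the loop runs anywhere, and the output names hpms_univ_19`yy'.dta, the comma delimiter, and the missing header row are all assumptions for this illustration.

```stata
* Create two tiny stand-in files (invented contents) so the loop runs anywhere
forvalues yy = 96/97 {
    tempname fh
    file open `fh' using "HPMS`yy'.TXT", write text replace
    file write `fh' "19`yy',1,0" _n "19`yy',2,0" _n
    file close `fh'
}

* Import each text file and save it as a .dta file; no shell call is needed
forvalues yy = 96/97 {
    import delimited using "HPMS`yy'.TXT", delimiters(",") varnames(nonames) clear
    save "hpms_univ_19`yy'.dta", replace
}
```

For the real files, only the second loop is needed, with the year range and import options adjusted to the actual layout.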

                          • #14
                            When I used import delimited the name of the variable gets lost and the categorical variable shows up as red - which I'm not sure is right or wrong. I've attached the image of how my data looks when I'm using import delimited using abcd96.TXT

                            The variable name doesn't show up and categorical variables are taking weird forms - that's why I was confused about using import delimited.

                            • #15
                              Are you importing using the first row as variable names? It makes sense that the variable is red: it has letters in it, so Stata stores it as a string. Don't attach screenshots; always use dataex to show your Stata data, never screenshots or images.
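On the red color: the Data Editor shows string variables in red, plain numeric variables in black, and numeric variables with value labels in blue, so red only signals the storage type and is not an error. A small sketch, with variable names section_id, rural, and section_code made up for this illustration and two ID values taken from the listing in #3:

```stata
clear
input str12 section_id byte rural
"C04000026200" 1
"C05000003170" 1
end

* section_id is stored as str12, so the Data Editor displays it in red
describe section_id

* Optional: create a numeric version with value labels for use as a categorical variable
encode section_id, generate(section_code)
describe section_id section_code
```

Whether to keep an identifier as a string or encode it depends on how it will be used later; for merging across years, keeping the original string is usually the safer choice.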
