  • Saving and/or Deleting File Issues

    I have been running a loop to import csv files, keep some observations, and save the reduced subset in .dta format.
    The data are quarterly (1995_1 to 2016_1), with three types of files for each quarter (coupon, ticket and market) that I merge on unique identifiers.
    Since the extracted raw data is large (271 GB), my loop instructs Stata to keep only the merged files and erase the interim files.

    I get myriad errors while running the loop; the most common are:
    1. "The request could not be performed because of an I/O device error r(691);"
    My individual files for each quarter are roughly 800 MB, 300 MB and 500 MB. I am using an external hard disk - a Seagate (500 GB) portable drive. The properties tab shows I still have 80 GB of free space, and I have deleted files to create more space, but the error persists. I have checked the hard disk for problems and it seems unlikely to be faulty. Also, I am usually able to import files from the hard disk without much trouble, though I have occasionally come across "Data error (cyclic redundancy check)" as well.

    2. When I tried saving the files on a different hard disk, I got errors like "file coupon_1996_3.dta cannot be modified or erased; likely cause is read-only directory or file".


    Can anyone please tell me how to resolve these three issues?
    Starting the loop at different points does not help either; sooner or later I come across one of the above errors. Thank you.

    Code:
    cd "D:\nu\Data\"
    forvalues year=1996(1)2016 { //1995_2     
        forvalues quarter=1(1)4        {
            cd "Unzipped Data\"
        //coupon    
            import delimited using "Origin_and_Destination_Survey_DB1BCoupon_`year'_`quarter'.csv",  clear
            capture drop if seqnum==.
            capture drop if seqnum=="."
            capture duplicates drop itinid mktid seqnum,force
            isid itinid mktid seqnum
            list year quarter in 1/1
        *    tostring *, replace
        *    capture destring itinid mktid seqnum, replace
            cd ..
            saveold "Stata\coupon_`year'_`quarter'.dta", replace
        //market
            cd "Unzipped Data\"
            import delimited using "Origin_and_Destination_Survey_DB1BMarket_`year'_`quarter'.csv",  clear
            capture drop if mktid==. 
        capture drop if mktid=="."
            capture duplicates drop itinid mktid,force
            isid itinid mktid
            keep itinid mktid mktcoupons year quarter origin originstatename dest ///
    deststatename bulkfare passengers mktfare mktdistance mktmilesflown
            list year quarter in 1/1
        *    tostring *, replace
        *    capture destring mktid seqnum, replace
            cd ..
            saveold "Stata\market_`year'_`quarter'.dta", replace
            //ticket
            cd "Unzipped Data\"
            import delimited using "Origin_and_Destination_Survey_DB1BTicket_`year'_`quarter'.csv",  clear
        capture duplicates drop itinid, force
            isid itinid
            keep itinid coupons year quarter origin originstatename roundtrip ///
    dollarcred farepermile passengers itinfare bulkfare distance milesflown
            list year quarter in 1/1
        *    tostring *, replace
        *    capture destring itinid, replace
            cd ..
            saveold "Stata\ticket_`year'_`quarter'.dta", replace
    
    cd "Stata\"
    merge 1:m itinid using "market_`year'_`quarter'.dta",gen(_merge_ticket_market) force
    capture duplicates drop itinid mktid,force
    merge 1:m itinid mktid using "coupon_`year'_`quarter'.dta", gen(_merge_market_coupon) force  
    keep if (_merge_ticket_market==3 & _merge_market_coupon==3)
    
    keep itinid mktid seqnum coupons year quarter origin originstatename ///
    dest deststatename opcarrier passengers fareclass distance ///
    dollarcred bulkfare mktfare itinfare
    
        keep if (origin=="ATL" & (dest=="DCA" | dest=="IAD" | dest=="BWI")) ///
    |     ((origin=="DCA" | origin=="IAD" | origin=="BWI") & dest=="ATL") 
    
    keep if dollarcred==1 // Fare value is credible
    keep if bulkfare==0
    *capture keep if (mktfare>="1" & mktfare<="1000") 
    keep if (mktfare>=1 & mktfare<=1000)
    
    *destring itinfare mktfare passenger, replace
    collapse (mean) itinfare mktfare (sum) passengers [fw=passengers], by(year quarter opcarrier fareclass) 
    
    replace passengers=(passengers*10)
    encode opcarrier, gen(op_carrier)
    encode fareclass,gen(fare_class)
    reg passengers itinfare i.op_carrier i.fare_class
    
    saveold "merged_`year'_`quarter'.dta", replace
    
    erase "coupon_`year'_`quarter'.dta"
    erase "ticket_`year'_`quarter'.dta"
    erase "market_`year'_`quarter'.dta"
    
    cd ..
            }
    }

  • #2
    There have been various postings over the years about operating systems not quite keeping up with extended series of commands involving disk access. One simple solution to try, which sometimes works, is to insert delays before and/or after some of the disk-access commands in Stata, so as to give the machine time to catch up. The built-in -sleep- command will do this, e.g.:
    Code:
    save .....
    sleep 500 // wait for 500 milliseconds
    erase .....
    sleep 500
    I pick 500ms only for illustration; you might well have success with a shorter delay.
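    Applied to the save and erase commands in your loop, that could look something like this (just a sketch; the 500 ms is again only illustrative):
    Code:
    saveold "Stata\coupon_`year'_`quarter'.dta", replace
    sleep 500 // give the external drive time to finish writing
    * ... (same pattern for the market and ticket files) ...
    erase "coupon_`year'_`quarter'.dta"
    sleep 500 // and time to finish deleting before the next file operation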



    • #3
      Without seeing your actual data, it is quite hard to understand what you are trying to achieve; my best guess from the error messages you describe is that your USB drive's write cache does not cope well with Stata writing no fewer than 160 data files one immediately after another.

      I would try to avoid physically saving so many files to a USB device; instead, try replacing each -saveold- except the very last one in your loop with a save to a temporary file (see -help tempfile-).

      All you need to do is start your code with
      Code:
      tempfile coupon market ticket
      then do everything as you did before, but instead of writing
      Code:
      *...
      saveold "Stata\coupon_`year'_`quarter'.dta", replace
      *...
      saveold "Stata\market_`year'_`quarter'.dta", replace
      *...
      saveold "Stata\ticket_`year'_`quarter'.dta", replace
      *...
      merge 1:m itinid using "market_`year'_`quarter'.dta",gen(_merge_ticket_market) force
      *...
      merge 1:m itinid mktid using "coupon_`year'_`quarter'.dta", gen(_merge_market_coupon) force
      *...
      you code
      Code:
      *...
      saveold "`coupon'", replace
      *...
      saveold "`market'", replace
      *...
      saveold "`ticket'", replace
      *...
      merge 1:m itinid using "`market'",gen(_merge_ticket_market) force
      *...
      merge 1:m itinid mktid using "`coupon'", gen(_merge_market_coupon) force
      *...
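      Putting the pieces together, the overall structure might look roughly like this (only a sketch; it assumes the working directory stays at D:\nu\Data\ throughout and that all of your cleaning, keep and collapse steps remain exactly as in #1). Note that with tempfiles the erase commands are no longer needed, since Stata removes the temporary files automatically when the do-file ends.
      Code:
      tempfile coupon market ticket
      forvalues year=1996(1)2016 {
          forvalues quarter=1(1)4 {
              import delimited using "Unzipped Data\Origin_and_Destination_Survey_DB1BCoupon_`year'_`quarter'.csv", clear
              * ... cleaning as before ...
              saveold "`coupon'", replace
              import delimited using "Unzipped Data\Origin_and_Destination_Survey_DB1BMarket_`year'_`quarter'.csv", clear
              * ... cleaning as before ...
              saveold "`market'", replace
              import delimited using "Unzipped Data\Origin_and_Destination_Survey_DB1BTicket_`year'_`quarter'.csv", clear
              * ... cleaning as before ...
              saveold "`ticket'", replace
              merge 1:m itinid using "`market'", gen(_merge_ticket_market) force
              merge 1:m itinid mktid using "`coupon'", gen(_merge_market_coupon) force
              * ... keep, collapse, regression as before ...
              saveold "Stata\merged_`year'_`quarter'.dta", replace // only this file is written to disk
          }
      }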
      To avoid errors, I recommend avoiding the -capture- and -force- options wherever possible. You should carefully review your code and remove or rework those parts; one possible rework is sketched below.
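      For example, the paired -capture drop- lines for seqnum could be reworked by checking the variable's storage type explicitly instead of letting -capture- swallow any error (a sketch only; the same pattern would apply to mktid):
      Code:
      * seqnum may arrive as numeric or string depending on the csv file
      local vtype : type seqnum
      if strpos("`vtype'", "str") {
          drop if seqnum == "."
      }
      else {
          drop if seqnum == .
      }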

      Regards
      Bela



      • #4
        Hi Mike,
        Thanks for the suggestion. The sleep recommendation does not seem to be working, and I still get "The request could not be performed because of an I/O device error r(691);"
        Also, I compressed the files on my hard disk and have more than 200 GB of free space.
        Not sure why this is happening.



        • #5
          Thank you Bela.
          Using the -tempfile- command has certainly reduced the frequency of the three errors above.
          Nonetheless, I am still having trouble running the loop over a select few years.

          I am not sure why I still see "Data error (cyclic redundancy check) r(691);" or "The request could not be performed because of an I/O device error r(691);"
          Would redownloading the data files help?
