I have been running a loop to import .csv files, keep some observations, and save the reduced subset in .dta format.
The data is quarterly (1995_1 to 2016_1), with three types of files for each quarter (coupon, ticket, and market) that I merge based on unique identifiers.
Since the extracted raw data is large (271 GB), my loop instructs Stata to keep only the merged files (erase the interim files).
I get myriad errors while running the loop; the most common are:
1. "The request could not be performed because of an I/O device error r(691);"
My individual files for each quarter are roughly 800 MB, 300 MB, and 500 MB. I am using an external hard disk, a Seagate (500 GB) portable drive. The properties tab shows I still have 80 GB of free space, and I have deleted files to create more room, but the error persists. I have tried checking the hard disk for problems, and a faulty disk seems unlikely. I am also able to import files from the hard disk without much trouble at times, though I have occasionally come across "Data error (cyclic redundancy check)" as well.
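In case it matters, here is a sketch of the kind of disk check I mean. This is only illustrative: it assumes a Windows machine with the external drive mounted as D: (chkdsk is the standard Windows utility, run here from within Stata), and the filename is just an example from one quarter.

Code:
* run the standard Windows disk check on the external drive (admin rights needed for /f /r)
shell chkdsk D: /f /r

* after saving, confirm the .dta was actually written and can be read back before erasing anything
capture confirm file "Stata\coupon_1996_3.dta"
if _rc {
    display as error "coupon_1996_3.dta was not written"
}
else {
    describe using "Stata\coupon_1996_3.dta", short
}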
2. When I try saving the files on a different hard disk, I get errors like "file coupon_1996_3.dta cannot be modified or erased; likely cause is read-only directory or file".
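To see whether a read-only flag is really the cause, a minimal check along these lines can be run from Stata before the saveold/erase calls. This is a sketch, assuming Windows (attrib is the Windows command that shows and clears the read-only attribute), and the filename is illustrative.

Code:
* can Stata erase the existing target at all? (_rc is nonzero if not)
capture erase "coupon_1996_3.dta"
display "erase returned _rc = " _rc

* show and, if necessary, clear the Windows read-only attribute on the file
!attrib "coupon_1996_3.dta"
!attrib -r "coupon_1996_3.dta"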
Can anyone please tell me ways to resolve these issues?
Starting the loop at different points (see the sketch after the code below) does not help either; sooner or later I come across one of the above errors. Thank you.
Code:
cd "D:\nu\Data\" forvalues year=1996(1)2016 { //1995_2 forvalues quarter=1(1)4 { cd "Unzipped Data\" //coupon import delimited using "Origin_and_Destination_Survey_DB1BCoupon_`year'_`quarter'.csv", clear capture drop if seqnum==. capture drop if seqnum=="." capture duplicates drop itinid mktid seqnum,force isid itinid mktid seqnum list year quarter in 1/1 * tostring *, replace * capture destring itinid mktid seqnum, replace cd .. saveold "Stata\coupon_`year'_`quarter'.dta", replace //market cd "Unzipped Data\" import delimited using "Origin_and_Destination_Survey_DB1BMarket_`year'_`quarter'.csv", clear capture drop if mktid==. capture drop of mktid=="." capture duplicates drop itinid mktid,force isid itinid mktid keep itinid mktid mktcoupons year quarter origin originstatename dest /// deststatename bulkfare passengers mktfare mktdistance mktmilesflown list year quarter in 1/1 * tostring *, replace * capture destring mktid seqnum, replace cd .. saveold "Stata\market_`year'_`quarter'.dta", replace //ticket cd "Unzipped Data\" import delimited using "Origin_and_Destination_Survey_DB1BTicket_`year'_`quarter'.csv", clear capture duplicates drop itinid isid itinid keep itinid coupons year quarter origin originstatename roundtrip /// dollarcred farepermile passengers itinfare bulkfare distance milesflown list year quarter in 1/1 * tostring *, replace * capture destring itinid, replace cd .. saveold "Stata\ticket_`year'_`quarter'.dta", replace cd "Stata\" merge 1:m itinid using "market_`year'_`quarter'.dta",gen(_merge_ticket_market) force capture duplicates drop itinid mktid,force merge 1:m itinid mktid using "coupon_`year'_`quarter'.dta", gen(_merge_market_coupon) force keep if (_merge_ticket_market==3 & _merge_market_coupon==3) keep itinid mktid seqnum coupons year quarter origin originstatename /// dest deststatename opcarrier passengers fareclass distance /// dollarcred bulkfare mktfare itinfare keep if (origin=="ATL" & (dest=="DCA" | dest=="IAD" | dest=="BWI")) /// | ((origin=="DCA" | origin=="IAD" | origin=="BWI") & dest=="ATL") keep if dollarcred==1 // Fare value is credible keep if bulkfare==0 *capture keep if (mktfare>="1" & mktfare<="1000") keep if (mktfare>=1 & mktfare<=1000) *destring itinfare mktfare passenger, replace collapse (mean) itinfare mktfare (sum) passengers [fw=passengers], by(year quarter opcarrier fareclass) replace passengers=(passengers*10) encode opcarrier, gen(op_carrier) encode fareclass,gen(fare_class) reg passengers itinfare i.op_carrier i.fare_class saveold "merged_`year'_`quarter'.dta", replace erase "coupon_`year'_`quarter'.dta" erase "ticket_`year'_`quarter'.dta" erase "market_`year'_`quarter'.dta" cd .. } }