Hi,
I have attendance data for many schools, stored in separate Excel files. Each file contains multiple sheets corresponding to different months—for example, data for February 2022 is stored in a sheet named "02_2022". Since the number of days varies by month, the number of columns also differs. For instance, January sheets span from column A to column AP, while April sheets go from column A to column AO. I’d like to use the sheet names to identify and handle these differences in column count.
In addition, some schools don’t have attendance data and instead contain a placeholder or dummy sheet, often named simply "M". These sheets also go up to column AP. I want to include these dummy sheets in the final dataset as well, appending them alongside the rest.
Currently I am using the following code. It is working but taking a long time (3 hours) to process my entire list of excels, I am hoping the approach above is more efficient. Feel free to share your thoughts on this as well.
I have attendance data for many schools, stored in separate Excel files. Each file contains multiple sheets corresponding to different months—for example, data for February 2022 is stored in a sheet named "02_2022". Since the number of days varies by month, the number of columns also differs. For instance, January sheets span from column A to column AP, while April sheets go from column A to column AO. I’d like to use the sheet names to identify and handle these differences in column count.
In addition, some schools don’t have attendance data and instead contain a placeholder or dummy sheet, often named simply "M". These sheets also go up to column AP. I want to include these dummy sheets in the final dataset as well, appending them alongside the rest.
Currently I am using the following code. It is working but taking a long time (3 hours) to process my entire list of excels, I am hoping the approach above is more efficient. Feel free to share your thoughts on this as well.
Code:
forvalues s = 1/`nschool' { local eno = `eiin`s'' /** Get sheet info **/ import excel using "***********directory to excel files**************", describe local nsheet = r(N_worksheet) forvalues i = 1/`nsheet' { local sheetname`i' = r(worksheet_`i') } forvalues i = 1/`nsheet' { clear local imported = 0 // Try column ranges from AP (42) down to AM (39) foreach col in AP AO AN AM { capture { import excel using "**********directory to excel files**********", /// sheet("`sheetname`i''") cellrange(A1:`col'2) firstrow allstring } if _rc == 0 { local imported = 1 continue, break } } if `imported' == 0 { display as error "‼️ EIIN `eno', sheet `sheetname`i'' could not be read (39–42 cols failed)." continue } if (`i' > 1 | `s' > 1) { append using `tmp' } save `tmp', replace } }
Comment