I am trying to figure out how to loop through a set of folders/directories that each contains an Excel file with multiple sheets, import each sheet, and then save each sheet as a dta file in a separate set of directories. I have 9 directories total (I'm using just 3 in the code below), and each Excel sheet has 9 tabs. So, if there is one Excel file each in a set of folders called School 1, School 2, and School 3 within the parent folder .../Raw Data/Attendance, I need to save the dta files from each Excel file/sheet in corresponding folders, also named School 1, School 2, School 3, in .../Working Data/Attendance.
I have code that works to loop through Excel files in a single directory, import each sheet and manipulate the data how I want it, and then save each sheet as a dta file in a single directory. What I can't figure out is how to loop through multiple directories, and then save to a separate set of multiple directories. Below is the code I have now; so far I have only tried looping through multiple directories to import the Excel files, but not saving to multiple directories. This code gives me an "invalid syntax" error. In the three nested/inner loops, the only thing I have changed from my working code is that I replaced "`source_path'/`f'" with "./`f'", but both give the same error. I have truncated the paths in the code below but they are correct in my actual code. The directories have the same names in the source folder and in the destination folder so I think I only need one local macro for those folder names.
I'm not sure if this is on the right track - if it's just a small error in my syntax or if I have bigger issues with my code. I haven't been able to find good examples of saving to multiple directories within a loop, so I'm not sure where to start with that part.
After this step, I then need to merge the dta files in each directory (i.e. merge all the data from all sheets from one single Excel file) and save that merged data as a single dta file, and then loop through the dta file directories again to append all of the merged dta files. I am doing the merging and appending as separate steps because the raw data is quite messy and I will need to check for incompatibilities with -precombine- before combining anything.
I've seen -filelist- suggested as a way to simplify the task of looping through directories, but despite reading the help and some posts on it, I don't quite understand how it works. I also have some directories in the parent folder that I do *not* want to loop through, and I'm not sure if filelist lets you exclude directories.
I have code that works to loop through Excel files in a single directory, import each sheet and manipulate the data how I want it, and then save each sheet as a dta file in a single directory. What I can't figure out is how to loop through multiple directories, and then save to a separate set of multiple directories. Below is the code I have now; so far I have only tried looping through multiple directories to import the Excel files, but not saving to multiple directories. This code gives me an "invalid syntax" error. In the three nested/inner loops, the only thing I have changed from my working code is that I replaced "`source_path'/`f'" with "./`f'", but both give the same error. I have truncated the paths in the code below but they are correct in my actual code. The directories have the same names in the source folder and in the destination folder so I think I only need one local macro for those folder names.
Code:
* Set up source path for directories with Excel files with attendance data local source_path "C:/Users/.../Data/Raw Data/Attendance" * Destination path for saving dta files local dta_path "C:/Users/.../Data/Working Data/Attendance" cd `"`source_path'"' * Create macro with the list of the folders with raw data and with working data local folder_list `" "School 1" "School 2" "School3" "' foreach folder of local folder_list { cd `source_path'/`folder' local files: dir "." files "*.xlsx" foreach f of local files { // LOOP OVER EXCEL FILES & CAPTURE THE NAMES OF THE TABS IN THE FILE display "./`f'" import excel using "./`f'", describe local n_sheets `r(N_worksheet)' forvalues i = 1/`n_sheets' { local tab`i' `r(worksheet_`i')' } // NOW LOOP OVER THE TABS, IMPORTING THEM ONE AT A TIME forvalues i = 1/`n_sheets' { if strpos("`tab`i''", "October") == 0 { // EXCLUDE THE OCTOBER SHEET - no schools started before November this year display "./`f'" import excel using "./`f'", sheet("`tab`i''") cellrange(A2) allstring firstrow clear gen school = subinstr("`f'",".xlsx","",.) // IDENTIFY WHERE DATA CAME FROM gen month = "`tab`i''" rename Session*_??? Session* // HARMONIZE VARNAMES ACROSS TABS drop if StudentLastName == "0" | StudentLastName == "" // DROP EXTRANEOUS OBSERVATIONS //Save each sheet as a dta file local filename = subinstr("`f'",".xlsx","",.) save "`dta_path'/`filename'_`tab`i''.dta", replace } } } }
After this step, I then need to merge the dta files in each directory (i.e. merge all the data from all sheets from one single Excel file) and save that merged data as a single dta file, and then loop through the dta file directories again to append all of the merged dta files. I am doing the merging and appending as separate steps because the raw data is quite messy and I will need to check for incompatibilities with -precombine- before combining anything.
I've seen -filelist- suggested as a way to simplify the task of looping through directories, but despite reading the help and some posts on it, I don't quite understand how it works. I also have some directories in the parent folder that I do *not* want to loop through, and I'm not sure if filelist lets you exclude directories.
Comment