Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Loop through multiple directories and save to another set of multiple directories

    I am trying to figure out how to loop through a set of folders/directories that each contains an Excel file with multiple sheets, import each sheet, and then save each sheet as a dta file in a separate set of directories. I have 9 directories total (I'm using just 3 in the code below), and each Excel sheet has 9 tabs. So, if there is one Excel file each in a set of folders called School 1, School 2, and School 3 within the parent folder .../Raw Data/Attendance, I need to save the dta files from each Excel file/sheet in corresponding folders, also named School 1, School 2, School 3, in .../Working Data/Attendance.

    I have code that works to loop through Excel files in a single directory, import each sheet and manipulate the data how I want it, and then save each sheet as a dta file in a single directory. What I can't figure out is how to loop through multiple directories, and then save to a separate set of multiple directories. Below is the code I have now; so far I have only tried looping through multiple directories to import the Excel files, but not saving to multiple directories. This code gives me an "invalid syntax" error. In the three nested/inner loops, the only thing I have changed from my working code is that I replaced "`source_path'/`f'" with "./`f'", but both give the same error. I have truncated the paths in the code below but they are correct in my actual code. The directories have the same names in the source folder and in the destination folder so I think I only need one local macro for those folder names.
    Code:
    * Set up source path for directories with Excel files with attendance data
    local source_path "C:/Users/.../Data/Raw Data/Attendance"
    
    * Destination path for saving dta files
    local dta_path "C:/Users/.../Data/Working Data/Attendance"
    
    cd `"`source_path'"'
    
    * Create macro with the list of the folders with raw data and with working data 
    local folder_list `" "School 1" "School 2" "School3" "'
    
    foreach folder of local folder_list {
        cd `source_path'/`folder'
        local files: dir "." files "*.xlsx"
    
        foreach f of local files {  // LOOP OVER EXCEL FILES & CAPTURE THE NAMES OF THE TABS IN THE FILE
            display "./`f'"
            import excel using "./`f'", describe
            local n_sheets `r(N_worksheet)'
            forvalues i = 1/`n_sheets' {
                local tab`i' `r(worksheet_`i')'
            }
        
            //  NOW LOOP OVER THE TABS, IMPORTING THEM ONE AT A TIME
            forvalues i = 1/`n_sheets' {
                if strpos("`tab`i''", "October") == 0 { // EXCLUDE THE OCTOBER SHEET - no schools started before November this year
                    display "./`f'"
                    import excel using "./`f'", sheet("`tab`i''") cellrange(A2) allstring firstrow clear
                    gen school = subinstr("`f'",".xlsx","",.) // IDENTIFY WHERE DATA CAME FROM
                    gen month = "`tab`i''"
                    rename Session*_??? Session* // HARMONIZE VARNAMES ACROSS TABS
                    drop if StudentLastName == "0" | StudentLastName == "" // DROP EXTRANEOUS OBSERVATIONS
                    
                    //Save each sheet as a dta file
                    local filename = subinstr("`f'",".xlsx","",.)
                    save "`dta_path'/`filename'_`tab`i''.dta", replace
                }
            }
        }
    
    }
    I'm not sure if this is on the right track - if it's just a small error in my syntax or if I have bigger issues with my code. I haven't been able to find good examples of saving to multiple directories within a loop, so I'm not sure where to start with that part.

    After this step, I then need to merge the dta files in each directory (i.e. merge all the data from all sheets from one single Excel file) and save that merged data as a single dta file, and then loop through the dta file directories again to append all of the merged dta files. I am doing the merging and appending as separate steps because the raw data is quite messy and I will need to check for incompatibilities with -precombine- before combining anything.

    I've seen -filelist- suggested as a way to simplify the task of looping through directories, but despite reading the help and some posts on it, I don't quite understand how it works. I also have some directories in the parent folder that I do *not* want to loop through, and I'm not sure if filelist lets you exclude directories.

  • #2
    I do not see where the error is coming from. Perhaps somebody else can and will respond soon. But, if not, I suggest adding
    Code:
    set tracedepth 1
    set trace on
    just before whichever loop is the one that produces the error message. That way you will be able to identify which command is causing the problem, and then that command can be scrutinized and fixed.

    Comment


    • #3
      Clyde Schechter Thank you for the suggestion of using trace, I was able to figure out the issue. cd `source_path'/`folder' needed a pair of double quotes around it - changing to cd "`source_path'/`folder'" fixed it.

      So, I now have working code that loops through the source folders and saves to a single directory. How can I get this to save the dta files from each Excel file into separate directories?

      Comment


      • #4
        So, I now have working code that loops through the source folders and saves to a single directory. How can I get this to save the dta files from each Excel file into separate directories?
        Well, it depends on what the names of those directories are. If they have the same names as the directories from which the Excel files that sourced their data (but are located in `dta_path' instead of `source_path', then it's easy enough. Just change that -save- command to:
        Code:
        save "`dta_path'/`folder'/`filename'_`tab`i''.dta", replace
        Note: Make sure that those folders already exist in `dta_path' before you do this. Create them if they don't.

        If the folder names are different, then it's more complicated and requires either a rule that enables computation of the correct destination folder from the available information, or if it is very complicated and arbitrary, some kind of separate data set that crosswalks the filenames to the destination directory paths.

        Comment


        • #5
          In this case the destination directory names are the same as the source directory names, so your suggestion is exactly what I needed. Updated my code and it works as expected. Thank you!

          Comment

          Working...
          X