Appending excel files considering sheet names

Nomin Byambaa

Join Date: May 2018

Posts: 8
#1

Appending excel files considering sheet names

17 Jun 2025, 00:34

Hi,

I have attendance data for many schools, stored in separate Excel files. Each file contains multiple sheets corresponding to different months—for example, data for February 2022 is stored in a sheet named "02_2022". Since the number of days varies by month, the number of columns also differs. For instance, January sheets span from column A to column AP, while April sheets go from column A to column AO. I’d like to use the sheet names to identify and handle these differences in column count.

In addition, some schools don’t have attendance data and instead contain a placeholder or dummy sheet, often named simply "M". These sheets also go up to column AP. I want to include these dummy sheets in the final dataset as well, appending them alongside the rest.

Currently I am using the following code. It is working but taking a long time (3 hours) to process my entire list of excels, I am hoping the approach above is more efficient. Feel free to share your thoughts on this as well.

Code:

forvalues s = 1/`nschool' { local eno = `eiin`s'' /** Get sheet info **/ import excel using "***********directory to excel files**************", describe local nsheet = r(N_worksheet) forvalues i = 1/`nsheet' { local sheetname`i' = r(worksheet_`i') } forvalues i = 1/`nsheet' { clear local imported = 0 // Try column ranges from AP (42) down to AM (39) foreach col in AP AO AN AM { capture { import excel using "**********directory to excel files**********", /// sheet("`sheetname`i''") cellrange(A1:`col'2) firstrow allstring } if _rc == 0 { local imported = 1 continue, break } } if `imported' == 0 { display as error "‼️ EIIN `eno', sheet `sheetname`i'' could not be read (39–42 cols failed)." continue } if (`i' > 1 | `s' > 1) { append using `tmp' } save `tmp', replace } }
Tags: None
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1387
#2

17 Jun 2025, 06:09

It is easy to obtain the exact number of days (and thus the name of the last column) for a sheet with a name like "02_2022". Consider:

Code:

. dis daysinmonth(date("02_2022","MY")) 28 . dis "`:word `=daysinmonth(date("02_2022","MY"))-27' of AM AN AO AP'" AM

Or the more reader- and programmer-friendly version:

Code:

local num_days = daysinmonth(date("02_2022","MY")) local last_columns AM AN AO AP local last_col: word `=`num_days'-27' of `last_columns'

That said, I am not sure if all this work is even needed. Why do you necessarily need to specify the cellrange option to import excel at all? Let Stata import the sheet; you can then decide to

Code:

keep in 1

to just keep the second row (i.e. one observation after the variable names). The append command should allow you to append sheets with different numbers of columns without throwing errors, especially if all the variables are numeric (empty columns will by default be read as numeric, so if the same column in other sheets is actually a string, this may need a little bit of extra code) -- all the more so if you can start the process off with a sheet that has the maximal set of columns.

Last edited by Hemanshu Kumar; 17 Jun 2025, 06:15.
1 like
Comment

eric_a_booth

Join Date: Apr 2014
Posts: 292

17 Jun 2025, 07:21

Nomin Byambaa I don't think you need the cell range and if that's right you can simplify your loop to import and combine your data without respect to the varying # of days per month (or any errors in data input of the sheet name that would signal the # of days in a month).
The example loop below creates fake files and then loops over them and combines them. Notice that even though the ranges & number of cols varies and the names of the sheets all change, the loop can use the saved/background elements of -import excel, describe- to flexibly loop over the files with something like

Code:

        import excel using `"${myfolder}/`f'"', sheet(`"`r(worksheet_`n')'"') clear

in a full working example this might look like:

Code:

**create fake data**
global myfolder `"/folderpath/here//"'
cap mkdir `"${myfolder}"'


forval n = 1/50 { //create files with various names and sizes
clear
set obs 10000
    forval var = 1/150 {
    if `=rbinomial(1,`=runiform()')'==1  g v`var' = runiform()   //random addit. cols for example
} //end var loop
    export excel using `"${myfolder}/file`n'.xlsx"', sheet(02_`n') replace
    export excel using `"${myfolder}/file`n'.xlsx"', sheetreplace sheet(03_`n')  //multiple sheets per file
    
} //end file n loop

ls ${myfolder}




**read in all files and sheets in the folder, no range needed**
clear
sa `"${myfolder}/master.dta"', replace emptyok //if you want to append them during this process!
**
global files: dir `"${myfolder}"' file "*.xlsx", respectcase nofail
di `"${files}"'
**
foreach f of global files {
    di `"`f'"'
    import excel using  `"${myfolder}/`f'"', describe
    forval n = 1/`r(N_worksheet)' {
        di `"sheet: `n' / range: `r(range_`n')' "' //helps with diagnosing issues
       import excel using `"${myfolder}/`f'"', sheet(`"`r(worksheet_`n')'"') clear  
        g file = `"`f'"'
        g date = `"`n'"'
         **if yiou want to append them:
         append using `"${myfolder}/master.dta"'
         sa `"${myfolder}/master.dta"', replace
    } //end n loop
} //end f loop



u `"${myfolder}/master.dta"', clear
ta file

Last edited by eric_a_booth; 17 Jun 2025, 07:23.

Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX

Comment

Nomin Byambaa

Join Date: May 2018

Posts: 8
#4

17 Jun 2025, 10:41

eric_a_booth Hemanshu Kumar thanks a lot for replies. When I do not define the cell range, STATA captures bunch of empty columns as variables, which pushes me over the limit of number of variables.
Comment
eric_a_booth

Join Date: Apr 2014

Posts: 292
#5

17 Jun 2025, 10:52

Nomin Byambaa So I guess it's capturing surplus blank columns that might be an after effect of some kind of output from the system? I would probably figure out the maximum range for actual data and specify that range in the option (some maximum that is larger than you'd ever expect to receive (with padding) but smaller than your Stata version's limits) . Then add a command for -dropmiss- *(from SSC) to drop empty columns after importing.

Eric A. Booth | Senior Director of Research | Far Harbor | Austin TX
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#6

17 Jun 2025, 10:54

When I do not define the cell range, STATA captures bunch of empty columns as variables, which pushes me over the limit of number of variables.

Really? Even if you are using Stata BE, which is the smallest flavor of Stata, you can have up to 2,048 variables. With columns out as far as AP you are only at 42 variables. Even assuming you have additional variables for each school, it is hard for me to believe that you will get up anywhere close to 2,048 variables. So either I'm misunderstanding something, or something else is wrong that has not been discussed, and more information about this data is needed to solve the problem.
1 like
Comment

Announcement