Dear All,
I am trying to optimize performance of my code that generates the output.
The simplified example of what I am having is the following code:
The first (gray) part of the code is data generation, which substitutes for real report-producing code. I am not sure how the content here might affect performance, but I think for illustration purposes it's fine as is here.
After an Excel file is produced with the -putexcel- statements, I reformat it using a Mata procedure formatcols, since this is the only way I found to adjust the columns widths (or is there a better way?).
I am concerned about the performance from this point onward. In the example it takes some 11 seconds per sheet to format the 10 columns. In the real project it takes well above 2 minutes per sheet. I can't imagine what is going on. For comparison, it takes a fraction of a second to extract all sheets from the XLSX file (unpack).
The only thing I've noticed is that the Excel file in the real report contains massive (above 44MB) "styles.xml" file which is repeating the same style definition over and over again. It is easily compressed though, so doesn't bloat the *.xlsx file much.
Is there a faster way to set the column widths in an Excel file from Stata?
Thank you, Sergiy Radyakin
I am trying to optimize performance of my code that generates the output.
The simplified example of what I am having is the following code:
Code:
clear all
version 17.0
set rmsg on
mata
void setcols() {
class xl scalar b
b = xl()
b.load_book(st_local("file"))
colwidths = tokens(st_local("colwidths"))
sheets = b.get_sheets()
for(i=1;i<=rows(sheets);i++) {
b.set_sheet(sheets[i])
st_global("c(current_time)") + " " + sheets[i]
for(j=1; j<=cols(colwidths); j++) {
b.set_column_width(j,j,strtoreal(colwidths[j]))
}
}
b.close_book()
}
end
program define formatcols
version 17.0
syntax , file(string) colwidths(string)
mata setcols()
end
local f "C:/temp/benchmark.xlsx"
capture erase "`f'"
forval s=1/20 {
putexcel set "`f'", open modify sheet("s`s'")
local cols="ABCDEFGHIJKLMNOPQRSTUVWZXYZ"
forval i=1/200 {
forval j=1/`=strlen("`cols'")' {
putexcel `=substr(`"`cols'"',`j',1)'`i' = "Test value"
}
}
putexcel save
}
formatcols, file("`f'") colwidths("3.82 31.27 37.55 30.91 15.64 18.36 26.36 40.91 65.18 35.82")
// END OF FILE
The first (gray) part of the code is data generation, which substitutes for real report-producing code. I am not sure how the content here might affect performance, but I think for illustration purposes it's fine as is here.
After an Excel file is produced with the -putexcel- statements, I reformat it using a Mata procedure formatcols, since this is the only way I found to adjust the columns widths (or is there a better way?).
I am concerned about the performance from this point onward. In the example it takes some 11 seconds per sheet to format the 10 columns. In the real project it takes well above 2 minutes per sheet. I can't imagine what is going on. For comparison, it takes a fraction of a second to extract all sheets from the XLSX file (unpack).
Code:
. formatcols, file("`f'") colwidths("3.82 31.27 37.55 30.91 15.64 18.36 26.36 40.91 65.18 35.82")
12:32:52 s1
12:33:03 s2
12:33:14 s3
12:33:25 s4
12:33:35 s5
12:33:46 s6
12:33:57 s7
12:34:07 s8
12:34:18 s9
12:34:29 s10
12:34:40 s11
12:34:51 s12
12:35:01 s13
12:35:12 s14
12:35:23 s15
12:35:34 s16
12:35:44 s17
12:35:55 s18
12:36:06 s19
12:36:17 s20
r; t=216.41 12:36:27
Is there a faster way to set the column widths in an Excel file from Stata?
Thank you, Sergiy Radyakin

Comment