Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Slow exporting of excel sheets

    Dear Statalist,

    I am exporting a large number of excel sheets into a master excel document using the excel export function.I do something like this:

    forval i=1/20 {
    export excel varlist using masterdoc.xlsx, sheet(worksheet`i.xlsx, modify) first(var)
    }

    It works well and I like this option because I get to hide the worksheets and therefore I can keep them in the masterdoc (no need to bother about links and stuffs).

    Yet, my master doc is about 20 mo and I need to export about 20 excel sheets. The process takes about 45 minutes in total.

    I was hoping I could speed up the process using csv or binary excel file for my worksheets instead of xlsx but it looks like export excel does not accept these formats. Would you have a suggestion on how to speed up the process? I need to do the same procedure for 8 different masterdoc.xlsx which in total would take me a full 8 hours...

    Thanks

  • #2
    If you want to export to .csv format you should be using -export delimited-, not -export excel-. I'd be surprised if it's much faster, or even faster at all, but you can give it a try.

    Comment


    • #3
      The xlsx file format is not optimal for very large files: it's a zip file containing data as xml files, which can get huge, and take much time to process: xml is a kind of hugely verbose tag-based text file, useful for storing arbitrary data structures, but quite poor for rectangular datasets.
      When I deal with large files with Excel, I save in xlsb format, which is a binary format, and Excel is much faster and produces smaller files.

      However, it seems Stata can's export as xlsb. Actually, Stata uses Apache POI for Excel I/O, and apparently this library can't write xlsb.

      If it's feasible (mostly, if the number of rows and columns is not too large), you may export as xls, a binary format used by Excel prior to Excel 2007. And when you are done, do the conversion from Excel (to xlsb). A quick test with Stata seems to show that exporting to xls is much faster.
      Last edited by Jean-Claude Arbaut; 20 Feb 2020, 12:45.

      Comment


      • #4
        Thanks for your help. Clyde the export delimited command does not seem to have the "sheet" option I am looking for.

        Jean claude, xls slightly improve the speed. Thanks for the suggestion. But I can't transfer my worksheets to xlsb because I would like them to be exported as "sheet"to a xls document. It looks like, only export excel has the sheet option I am looking for... Would be very neat if Stata would update the export excel command with xlsb format

        Comment


        • #5
          Another issue I am running into using xls file. When I do the following:

          export excel varlist using "doc.xls", sheet(x, replace)

          I get the message:

          file doc.xls could not be loaded r(603).

          What is weird is that the excel document opens up normally, the path is correct and I can do the following:

          export excel varlist using "x.xls", sheet(x, replace)

          What seems to be the problem here?





          Comment


          • #6
            Another issue I am running into using xls file. When I do the following:

            export excel varlist using "doc.xls", sheet(x, replace)

            I get the message:

            file doc.xls could not be loaded r(603).

            What is weird is that the excel document opens up normally, the path is correct and I can do the following:

            export excel varlist using "x.xls", sheet(x, replace)

            What seems to be the problem here?





            Comment


            • #7
              Correction: Stata does not use Apache POI. I had it all mixed up: actually, one can see in the -import delimited- documentation that there is a reference to Java. And one can check by renaming the jar directory that without it, -import delimited- fails.

              However, Apache POI is not part of the jar files included with Stata, and by the same method one can see that Stata does not use Java at all to export as Excel (either xls or xlsx). I didn't check with Mata though.

              My apologies for this mistake.

              Note: the R package xlsx does use Apache POI, maybe it's the origin of my confusion.
              Last edited by Jean-Claude Arbaut; 22 Feb 2020, 03:26.

              Comment


              • #8
                A thousand thanks, @Jean-Claude, for the tip about switching from .xlsx to .xls. That dropped my runtime from 20 minutes down to 4 minutes on a Stata 17 program that uses collect export (i.e. putexcel) to write small tables to 43 worksheet tabs - or I might say from "competely unreasonable" to "fairly reaonable". You made my day.

                Comment


                • #9
                  Hi,

                  I would like to add that xls files are much more stable too. I have been trying to use export excel with the sheet option with xlsx document : it freezes stata and corrupt the files after a few iterations.With xls all works pretty well

                  Comment

                  Working...
                  X