Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • --collect export-- takes a really long time

    Hi,
    I like the idea of the new -collect- functions, but in my first application it take >5 minutes to export a single table. During this time the computer fan runs at a very high rate. In Task Manager, Stata is running >30% of CPU and 200MB of memory even though the table is ultimately something like 50 KB. (KB, not MB).
    In Windows Explorer, the new file keeps appearing to grow in size up to ~30 KB, then flip to 0 KB in size, then grow again, etc.
    Here is a toy example that replicates the behavior. Admittedly it is a moderately sized, sparse table. But this is the nature of the data. It strikes me as a little odd that the table itself is pretty quick but the -collect export- takes forever, and the behavior of the file size when you watch the directory also seems odd.

    Code:
    clear all
    set obs 2500
    
    
    gen Ab = runiformint(1,35)
    gen Resistant = runiformint(0,1)
    gen Sp = runiformint(1,20)
    
    table Sp Ab , statistic(count Resistant) statistic(mean Resistant) nototal replace
    collect export  "test.xlsx", replace

    Thanks,
    Scott

  • #2
    collect export is using putexcel to produce the xlsx file. From the above example, this export is trying to populate (35+1)*(2+1)*(20+1) = 2,265 cells. Right now, each call to putexcel is changing the file on the disk instead of working with the file in memory.

    We will be adding option open to collect export for xls and xlxs files, so that the behind-the-scenes calls to putexcell will work with the file in memory, and thus speed things up.

    This new option will be available in the next update to Stata 17.

    FYI a quick timing on a relatively recent computer puts the current runtime at 13 minutes, but with the open option it took 19 seconds.

    As a short-term work-around, use option dofile() to save the putexcel commands to disk, press the break key after collect export has been running for about 10 seconds, then edit the do-file by adding option open to the call to putexcel set. Make sure to remove the xlxs file (or add option replace) before running the do-file.

    We are sorry for the inconvenience (and the ugly work-around).

    Comment


    • #3
      Thanks Jeff!
      I look forward to the update, and to figuring out how to use the -collect- commands better, I think they will be a useful addition.
      Best,
      Scott

      Comment

      Working...
      X