I have only recently started using putexcel.
First, a rant. The examples in the Stata help don't exactly guide you towards using the -open- option which makes Stata work with the Excel file in memory, rather than on disk. And this makes a world of a difference. Until I discovered this option, I was ready to completely give up on putexcel for the amount of time it was taking -- I was trying to write a set of tables of almost entirely percentage numbers, which altogether fill 241 rows and 24 columns -- not a large amount of data at all. The final excel file I create is a mere 128KB on disk. And the Stata data file I start with is about 17MB.
Yet doing this without the -open- option was taking over 20 minutes. Using the -open- option, the timer shows exactly 5.77 seconds (it would be even faster if I were to open and close the Excel just once; however, I am having to do this about 30 times, because I need to use the -distinct- command within a loop and I am only able to make it work if the distinct command is issued while the Excel file is not open for I/O; see below). And this is when I am on a MacBook Pro with M2 silicon, with 32 GB of memory and 1.2TB of free space on my 2TB SSD drive, running Stata MP/12-core. I say this because it has been speculated in this thread that large Excel files, or spinning hard disks, or other latency issues might be to blame for reports of the extremely slow performance of putexcel. These certainly do not apply to my case.
Moving along, once I discovered the -open- option, I quickly ran into the "bug" that Sergiy pointed to in the same thread. Or rather, a variant of it.
Consider this code, which works just fine on my machine (I have tested this on Stata 16 and Stata 18)
and yet, when I introduce the -distinct- command (from SSC) , I get the error others also see:
-putexcel- then becomes unusable until Stata is restarted. Not shown here, but if I omit the -open- option in the code above, Stata produces no errors. Issuing the -distinct- command before the -putexcel, open- command again results in no errors.
Given that others have reported this problem without any mention of the -distinct- command, I suspect the culprit is in -putexcel, open-. But I thought I would post this thread in case it helps debug and troubleshoot what is going on.
First, a rant. The examples in the Stata help don't exactly guide you towards using the -open- option which makes Stata work with the Excel file in memory, rather than on disk. And this makes a world of a difference. Until I discovered this option, I was ready to completely give up on putexcel for the amount of time it was taking -- I was trying to write a set of tables of almost entirely percentage numbers, which altogether fill 241 rows and 24 columns -- not a large amount of data at all. The final excel file I create is a mere 128KB on disk. And the Stata data file I start with is about 17MB.
Yet doing this without the -open- option was taking over 20 minutes. Using the -open- option, the timer shows exactly 5.77 seconds (it would be even faster if I were to open and close the Excel just once; however, I am having to do this about 30 times, because I need to use the -distinct- command within a loop and I am only able to make it work if the distinct command is issued while the Excel file is not open for I/O; see below). And this is when I am on a MacBook Pro with M2 silicon, with 32 GB of memory and 1.2TB of free space on my 2TB SSD drive, running Stata MP/12-core. I say this because it has been speculated in this thread that large Excel files, or spinning hard disks, or other latency issues might be to blame for reports of the extremely slow performance of putexcel. These certainly do not apply to my case.
Moving along, once I discovered the -open- option, I quickly ran into the "bug" that Sergiy pointed to in the same thread. Or rather, a variant of it.
Consider this code, which works just fine on my machine (I have tested this on Stata 16 and Stata 18)
Code:
. clear . set obs 10 number of observations (_N) was 0, now 10 . gen byte x = _n . putexcel set myfile, open modify Note: file will be modified when putexcel save command is issued . putexcel A1 = x . putexcel save file myfile.xlsx saved
Code:
. clear
. set obs 10
number of observations (_N) was 0, now 10
. gen byte x = _n
. putexcel set myfile, open modify
Note: file will be modified when putexcel save command is issued
. distinct x
--------------------------
| total distinct
---+----------------------
x | 10 10
--------------------------
. putexcel A1 = x
file handle not found
r(604);
Given that others have reported this problem without any mention of the -distinct- command, I suspect the culprit is in -putexcel, open-. But I thought I would post this thread in case it helps debug and troubleshoot what is going on.
Comment