Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • -putexcel , open- conflict with -distinct-

    I have only recently started using putexcel.

    First, a rant. The examples in the Stata help don't exactly guide you towards using the -open- option which makes Stata work with the Excel file in memory, rather than on disk. And this makes a world of a difference. Until I discovered this option, I was ready to completely give up on putexcel for the amount of time it was taking -- I was trying to write a set of tables of almost entirely percentage numbers, which altogether fill 241 rows and 24 columns -- not a large amount of data at all. The final excel file I create is a mere 128KB on disk. And the Stata data file I start with is about 17MB.

    Yet doing this without the -
    open- option was taking over 20 minutes. Using the -open- option, the timer shows exactly 5.77 seconds (it would be even faster if I were to open and close the Excel just once; however, I am having to do this about 30 times, because I need to use the -distinct- command within a loop and I am only able to make it work if the distinct command is issued while the Excel file is not open for I/O; see below). And this is when I am on a MacBook Pro with M2 silicon, with 32 GB of memory and 1.2TB of free space on my 2TB SSD drive, running Stata MP/12-core. I say this because it has been speculated in this thread that large Excel files, or spinning hard disks, or other latency issues might be to blame for reports of the extremely slow performance of putexcel. These certainly do not apply to my case.

    Moving along, once I discovered the -open- option, I quickly ran into the "bug" that Sergiy pointed to in the same thread. Or rather, a variant of it.

    Consider this code, which works just fine on my machine (I have tested this on Stata 16 and Stata 18)


    Code:
    . clear
    
    . set obs 10
    number of observations (_N) was 0, now 10
    
    . gen byte x = _n
    
    . putexcel set myfile, open modify
    Note: file will be modified when putexcel save command is issued
    
    . putexcel A1 = x
    
    . putexcel save
    file myfile.xlsx saved
    and yet, when I introduce the -distinct- command (from SSC) , I get the error others also see:

    Code:
    . clear
    
    . set obs 10
    number of observations (_N) was 0, now 10
    
    . gen byte x = _n
    
    . putexcel set myfile, open modify
    Note: file will be modified when putexcel save command is issued
    
    . distinct x
    
    --------------------------
       |     total   distinct
    ---+----------------------
     x |        10         10
    --------------------------
    
    . putexcel A1 = x
    file handle not found
    r(604);
    -putexcel- then becomes unusable until Stata is restarted. Not shown here, but if I omit the -open- option in the code above, Stata produces no errors. Issuing the -distinct- command before the -putexcel, open- command again results in no errors.

    Given that others have reported this problem without any mention of the -distinct- command, I suspect the culprit is in -putexcel, open-. But I thought I would post this thread in case it helps debug and troubleshoot what is going on.
    Last edited by Hemanshu Kumar; 28 Aug 2023, 20:42.

  • #2
    I comment only on the use of distinct, which is to be cited as from the Stata Journal, please. It's recently been updated.

    Code:
    SJ-23-2 dm0042_4  . . . . . . . . . . . . . . . . Software update for distinct
            (help distinct, distinctgen if installed)  N. J. Cox and G. M. Longton
            Q2/23   SJ 23(2):595--596
            most important change is addition of distinctgen command
    
    SJ-20-4 dm0042_3  . . . . . . . . . . . . . . . . Software update for distinct
            (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
            Q4/20   SJ 20(4):1028--1030
            sort() option has been added
    
    SJ-15-3 dm0042_2  . . . . . . . . . . . . . . . . Software update for distinct
            (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
            Q3/15   SJ 15(3):899
            improved table format and display of large numbers of
            observations
    
    SJ-12-2 dm0042_1  . . . . . . . . . . . . . . . . Software update for distinct
            (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
            Q2/12   SJ 12(2):352
            options added to restrict output to variables with a minimum
            or maximum of distinct values
    
    SJ-8-4  dm0042  . . . . . . . . . . . .  Speaking Stata: Distinct observations
            (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
            Q4/08   SJ 8(4):557--568
            shows how to answer questions about distinct observations
            from first principles; provides a convenience command
    The main point about distinct is to display a table. It is a r-class command and leaves in its wake results for the last variable considered, which isn't limiting here as it's processing a single variable.

    However, I am puzzled at what you're trying to do as you're trying to post an entire variable x using putexcel. You could have tried that without calling distinct at all; you're certainly ignoring its results.

    Comment


    • #3
      Nick Cox sorry for the mis-cite.

      I agree about the trivial use of distinct in the minimal (non) working example here. My actual use of it makes far more sense (the number of distinct values returned by the command is one of the entries in my table). My point was simply to show that any call to -distinct- , once the Excel file is in memory via -putexcel, open-, breaks a subsequent call to putexcel.

      Comment


      • #4
        Thanks for that. I don't think distinct is doing anything extraordinary here, but if it is, we will modify it accordingly.

        Comment


        • #5
          Thanks for alerting me to the update, Nick. For what it's worth, I just installed the update and the error above still replicates. Probably not surprising, because I think -distinct- itself has not been changed in this update:

          Code:
          . which distinct
          /Users/hemanshu/Library/Application Support/Stata/ado/plus/d/distinct.ado
          *! 1.4.1 NJC 10 October 2020          
          
          . which distinctgen
          /Users/hemanshu/Library/Application Support/Stata/ado/plus/d/distinctgen.ado
          *! 1.0.0 NJC 15 August 2022

          Comment


          • #6
            Originally posted by Nick Cox View Post
            Thanks for that. I don't think distinct is doing anything extraordinary here, but if it is, we will modify it accordingly.
            There is a

            Code:
            mata mata clear
            in line 170.

            The Mata environment, which is also used by putexcel, is global in scope. It is probably not a good idea to simply wipe it inside distinct.

            Comment


            • #7
              Nick Cox I think I may have figured it out.

              The problem disappears if I comment out one line (#170) in distinct.ado :

              Code:
              mata mata clear
              I have zero experience of coding in mata, so you would be a better judge of whether clearing mata is overreach on the part of the distinct.ado file, or whether putexcel should be coded better so as not to be vulnerable to another program clearing mata.

              Edit: cross-posted with #6. Forever grateful to Daniel and this amazing community for engaging so deeply with other people's problems!
              Last edited by Hemanshu Kumar; 29 Aug 2023, 01:47.

              Comment


              • #8
                Thanks for the signals. We will attend to that in a future update.

                In fact, the ado file up at StataCorp is not quite the latest version, another glitch somewhere between me and StataCorp's server.

                Comment


                • #9
                  Files at the Stata Journal website are now up-to-date as compared with the Update in 23(2).

                  The fix to the problem raised in this thread will follow in a later Update.

                  Thanks again to those reporting and diagnosing these problems.

                  Comment

                  Working...
                  X