
  • Any tools for measuring Stata performance (e.g., CPU time)?

    I'd like to do some simple comparisons of Stata in different environments (Windows desktop, virtual Windows, unix server). Other than the timer command, I can't find any tools for measuring performance statistics.

    I'm looking for something similar to the SAS fullstimer option that reports among other things "real time", "system cpu time", and "user cpu time".

    I found a FAQ that answers my exact question (see link below) but my understanding is that the answer is not correct. Before going further, am I correct in understanding that the Stata -timer- command measures real time and not CPU time?

    Q: Is there a way to tell how much CPU time Stata takes to run a particular command or do-file?

    I've noticed that a Stata job (measured using -timer-) runs quicker on my cheap desktop than our expensive virtual Windows server and would like to get a better idea of where the difference is.

    My motivation is two-fold: first, deciding where to invest, and second, helping us optimise the VDI.

    Traditionally, most of our users run Stata on a standalone desktop and if more resources were required then a unix server was available. With improvements in desktop computing we are finding that even a relatively modest desktop suffices for most users (even those with datasets with millions of observations). We have also introduced virtual Windows (VMware Horizon View) giving users another alternative to running larger Stata jobs.

    We are currently planning the next upgrade to our unix servers and I have the role of representing the "Stata users". I am leaning towards suggesting that we push the Stata users towards Windows solutions (either the VDI or desktops) and optimising the unix servers for the people who really need it (e.g., those working with GWAS and sequencing data). We could, for example, invest in upgrading our VDI hardware or invest in more RAM for standalone desktops rather than upgrading the unix server currently optimised for Stata.

    I've read what I can find (e.g., see below) but I'd like to do some simple real-world tests of performance across the three main environments we have here.

    http://blog.stata.com/category/performance/

    http://www.statalist.org/forums/foru...vs-clock-speed

    http://www.statalist.org/forums/foru...s-or-cpu-speed
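
    For reference, the basic pattern I have been using with -timer- (which, as discussed below, appears to report elapsed wall-clock time in seconds, not CPU time) is something like the following; the regression is just a placeholder for whatever is being benchmarked:

    Code:
    timer clear
    timer on 1
    regress y x1 x2     // command(s) being benchmarked
    timer off 1
    timer list 1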

  • #2
    Paul,

    The Stata timer command does indeed, as far as I can tell, measure real time, not CPU time. That said, because Stata does almost everything using data in memory, these two are probably quite close to each other, except for any operations that involve reading/writing data to/from disk. To get at this difference you could benchmark I/O operations separately from actual data manipulation and analysis, using multiple timer commands scattered throughout a do-file.
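
    For example, a sketch with the disk I/O and the in-memory analysis bracketed by separate timers (the file name is just a placeholder):

    Code:
    timer clear
    timer on 1
    use bigfile.dta, clear      // disk read (I/O)
    timer off 1
    timer on 2
    summarize                   // in-memory work
    timer off 2
    timer list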

    One thing to be aware of when benchmarking read operations: Microsoft Windows keeps track of what files are read into what memory locations, so if you do two consecutive reads of the same disk file without removing the file from memory, the second read will be considerably shorter than the first because it will use the data already in memory.
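
    To see this caching effect, one can time the same read twice in a row; the second -use- will typically be much faster because the file is served from the OS file cache rather than from disk (file name again a placeholder):

    Code:
    timer clear
    timer on 1
    use bigfile.dta, clear      // cold read (from disk)
    timer off 1
    clear
    timer on 2
    use bigfile.dta, clear      // warm read (from OS cache)
    timer off 2
    timer list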

    Another thing to be careful of when benchmarking between different systems is that reading a data set into memory will be slower if there is less memory available. I have observed this anecdotally, but have not quantified it exactly, so I don't know whether the absolute amount of memory remaining or the relative amount is more important. If possible, the best way to compare two systems is to start each with as much free memory as possible (i.e., nothing running beyond the OS itself).

    Finally, another command that might be useful for isolating the source of any perceived bottlenecks is profiler, which gives you the time for each component of a command.
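
    A minimal profiler session looks like this (the do-file name is just a placeholder):

    Code:
    profiler clear
    profiler on
    do myanalysis.do
    profiler off
    profiler report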

    Regards,
    Joe



    • #3
      I would definitely recommend using the profiler command for this. Along the same lines as Joe, another thing that makes the comparison difficult is the fundamental differences between the Windoze and *nix-based kernels. I constantly have issues with Windoze trying to cache random junk in memory, and having no way to purge the RAM cache, which leads to the system swapping memory (e.g., storing data that should reside in RAM on the disk because of a lack of capacity). If you're working in a virtualized environment, one thing to consider (which would not be possible to measure from within Stata) is the overall load on the system. A sys admin where I used to work told me a few times about watching all of the VMs on a box vanish from that server and reappear on a different server when I would spin up more intensive processes (they had things set up that way to automate the load balancing).



      • #4
        Originally posted by wbuchanan View Post
        I constantly have issues with Windoze trying to cache random junk in memory, and having no way to purge the RAM cache, which leads to the system swapping memory (e.g., storing data that should reside in RAM on the disk because of a lack of capacity).
        Which version(s) of Windows are you referring to?



        • #5
          Friedrich Huebler XP and Windoze 7 are what I've had to work on previously. What I really want from Windoze is to be able to do something like:

          Code:
          sudo purge



          • #6
            Thanks very much for your responses. This process has been very informative for me. I compared the timing of two Stata jobs on various systems here and attach a summary of the results (stata_test.pdf). The first job used estimation commands and I expected, and observed, an improvement with Stata/MP. The second job did not use any estimation commands but I was interested to see that MP was 10% slower than IC (I expected them to be similar).

            I discovered what is actually well documented: "Assuming you have enough RAM, [and a 64-bit OS to utilise it] the next greatest effect on the performance of Stata is the processor. The faster the clock speed and the more cache a processor has, the faster Stata will run."

            http://www.stata.com/support/faqs/wi...-requirements/

            I found quite a few threads along the lines "Why does my old laptop run Stata faster than my expensive new server?" so clearly I was not the only person in the dark about this.



            • #7
              Perhaps you have encountered this thread, but in case you have not: A while back I posted here with a performance problem (IC vs. MP2) that turned out to come from my having a larger -set matsize- value on the machine running MP2. The lesson was to make sure that one's -set matsize- value was small unless there's a reason to do otherwise. I wonder if this might be an issue in your situation.
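
              To check this on each machine before benchmarking, something along these lines (c(matsize) returns the current setting):

              Code:
              display c(matsize)
              set matsize 400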

              See:
              http://www.statalist.org/forums/foru...mp2-vs-v-12-ic



              • #8
                Thanks Mike. Interesting! I was not aware of that.

                matsize was 400 on both IC and MP4 and the other memory settings were the same.

                 Are these settings reasonable? We have 264 GB RAM available.

                Code:
                // Stata/IC
                . query memory
                ------------------------------------------------------------------------------
                    Memory settings
                      set maxvar           2048       (not settable in this version of Stata)
                      set matsize          400        10-800; max. # vars in models
                      set niceness         5          0-10
                      set min_memory       0          0-30g
                      set max_memory       30g        32m-1600g or .
                      set segmentsize      32m        1m-32g
                
                // Stata/MP4
                . query memory
                --------------------------------------------------------------------------
                    Memory settings
                      set maxvar           5000       2048-32767; max. vars allowed
                      set matsize          400        10-11000; max. # vars in models
                      set niceness         5          0-10
                      set min_memory       0          0-30g
                      set max_memory       30g        32m-1600g or .
                      set segmentsize      32m        1m-32g
                It may have been obvious to anyone familiar with hardware, but I was looking into this because we are in the process of upgrading our server. Most of our CPU-intensive stuff (e.g., omics) is done in R and the majority of our Stata users are doing typical register-based epidemiology. We've taken the decision to provide Stata users with more RAM on their desktop PCs so that fewer of them have a need to use another system. This is also motivated by the fact that most of our Stata users are more comfortable in Windows than unix and eliminates the need to copy files to a different system. It's comforting for the Stata users to know that they are not necessarily missing out by not using the expensive server.



                • #9
                  This mystery of slower on the server than the desktop sounds like one for Tech Support to me. They were quite forthcoming with my IC vs. MP2 problem. - Mike



                  • #10
                    Originally posted by Mike Lacy View Post
                    This mystery of slower on the server than the desktop sounds like one for Tech Support to me. They were quite forthcoming with my IC vs. MP2 problem. - Mike
                    I am also a fan of StataCorp's tech support but I believe this is explainable. The job runs quicker on the desktop because I am not using the resources that make the server more expensive. Even with MP4, I was only using 4 of the available 40 cores and very little of the available RAM. My i5 was running at 3.3 GHz (3.7 GHz turbo) whereas the server CPU clock speed was 2.3 GHz. My understanding is that the difference in processor clock speed and cache explains most of the difference. This was an important message to our users: "you are not going to benefit significantly from the server, compared to your desktop, unless you have a job that benefits from parallel processing or needs more RAM than is available on your desktop".

                    For what it's worth, the server had only a very light load while I was running these tests so that is not the explanation.



                    • #11
                      Paul Dickman something else that Alan Riley (StataCorp) and Bill Gould (StataCorp) mentioned to me a couple of years ago has to do with Unix systems running on the SPARC architecture. I had a Solaris VM built out where I worked (which ran on SPARC processors) and noticed a significant drag on performance. I had equivalent amounts of memory and processors available (although my MacBook Pro may have had slightly faster CPUs), but there was a massive difference in the time it took to run jobs in that environment. After that, I got the sys admin to give me an Ubuntu VM running on different hardware and there was a noticeable difference in performance. I don't have access to that infrastructure any more, but that could also be something worth considering. When I've used Stata on different platforms, it seems, anecdotally of course, that the Windows environment tends to be much slower, which I think is due mostly to the amount of overhead that Windows consumes to operate in general.
