
  • profiling commands, esp. for memory

    Hi,
    My code runs slowly and freezes at unexpected points. I also see extraordinary RAM usage at times (though it never exhausts what the system has, so it may not be the cause of the freezes).

    Is there a reasonable way to profile Stata ado files? I have two problems with -profiler-:

    1. It only times the execution of commands and does nothing else. For example, you don't see memory use, except indirectly through its effect on runtime.

    2. I see no logic in the level at which the runtime is broken down. Some commands, like -twoway-, I see in fine detail, down to the multiple commands they call. On the other hand, many other commands my ado file calls, e.g. lines of -egen-, don't appear at all. Is it that you can't see inside loops because those are compiled? That is a bit confusing, as programs (subroutines) are also compiled, no?
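    One way to get timings at a finer grain than -profiler- reports is Stata's built-in -timer- command, which times whatever section of code you wrap in it, including individual loop iterations. A minimal sketch, with the auto dataset standing in for real data; the timer numbers and the commands being timed are arbitrary choices:

    ```stata
    // Reset all timers, then load a small example dataset
    timer clear
    sysuse auto, clear

    // Timer 1: a single egen call
    timer on 1
    egen mpg_mean = mean(mpg), by(foreign)
    timer off 1

    // Timer 2: accumulates time across all iterations of a loop
    forvalues i = 1/3 {
        timer on 2
        summarize price if rep78 == `i', meanonly
        timer off 2
    }

    // Show accumulated time for each timer
    timer list
    ```

    Like -profiler-, this measures time only, not memory.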

    This is becoming a critical issue, as I would like to hold most of my data in memory (Stata is not very efficient at merging in particular variables only when they are needed, not to mention preserve-restore cycles).

    As my data can double in memory (from 25 GB to 50 GB) even when I run simple operations on only a few variables, it would be great to see where that happens. Runtime also brings unpleasant surprises. For example, I just learnt that -twoway__scatteri_serset- can run for five minutes on my data, even though I would expect an immediate graphing command to bear no relation to how much data there is.

    Thanks,

    Laszlo


  • #2
    I have learnt a lot by watching external memory monitors (e.g., gnome-system-monitor on Linux): you can see the system start to use swap when about 90-95% of RAM is in use, at which point everything grinds almost to a halt. You can get reports on memory use between commands using the memory command, but that won't help with transient memory use within commands.
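    Checking memory use between commands can be sketched as follows. This assumes Stata 12 or later, where -memory- is available; the -expand- step is only an arbitrary way to make the dataset grow so that the two reports differ:

    ```stata
    // Load a small example dataset and report memory use
    sysuse auto, clear
    memory

    // Replace each observation with 100 copies (74 -> 7,400 obs)
    expand 100

    // Report again to see how the data's footprint changed
    memory
    ```

    Comparing the two reports shows the memory held before and after the step; anything allocated and released inside a single command will not show up.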



    • #3
      Thanks, Brendan. StataCorp tech support said something similar. The only problem is that when a single command is running, rather than a do-file, it is very hard to see what was happening when memory use spiked. That matters, not only because I might rewrite the ado code, but also to see what is "hopeless" and what can be sped up by using less data or similar. (For example, I wish I understood why the graphics engine has problems when the data is large: graphing has little to do with most of the variables, nor should it need to sort the data for most operations, although I have since learnt that -serset- usually does sort, for everything except immediate twoway commands.)



      • #4
        László,
        a while ago I made sysinfo available. With it you can programmatically access the memory allocation stats.
        http://radyakin.org/statalist/2013080201/sysinfo.htm
        Best, Sergiy



        • #5
          Thanks, Sergiy, this is very nice! That said, as my problem arises within a single run of a command (binscatter.ado, from SSC), I am not sure how much I can use this, but maybe I could temporarily scatter some -sysinfo- calls across the ado code. Thanks again!
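          That instrumentation idea can be sketched with Stata's built-in -memory- command in place of -sysinfo-. Note that -memcheck- below is a hypothetical helper, not an existing command: it prints a label and the current time, followed by a memory report, so calls to it can be placed between steps of an ado file (with the helper defined as a subprogram after the main program) and removed afterwards.

          ```stata
          // Hypothetical helper: label + timestamp + memory report
          capture program drop memcheck
          program define memcheck
              version 12
              args label
              display as text _n "memcheck: `label' (" c(current_time) ")"
              memory
          end

          // Example use between steps
          sysuse auto, clear
          memcheck "after sysuse"
          expand 10
          memcheck "after expand"
          ```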



          • #6
            I know this is old, but I stumbled upon it today, so I'm leaving a pointer to the -profiler- command. -profiler- does not measure memory, but it does report timings per command, which might also be useful.
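            A minimal sketch of a -profiler- session, assuming the auto dataset and a simple scatter plot as stand-ins for the real data and the binscatter call:

            ```stata
            // Load example data before profiling starts
            sysuse auto, clear

            // Clear any earlier results and start profiling
            profiler clear
            profiler on

            // The command whose runtime is of interest
            twoway scatter mpg weight

            // Stop profiling and display the results
            profiler off
            profiler report
            ```

            -profiler report- should then list the time spent in the programs that ran while profiling was on.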
