
  • understanding Stata bottlenecks

    Dear all,

    I use a large dataset (10 GB) and would like to understand better where the possible bottlenecks (in Stata and in the hardware) lie that can affect how fast Stata processes my data.

    Specifically, I have quite a powerful machine (i7 4770, 36 GB RAM DDR3) and Stata MP4. Still, a 10 GB dataset takes some time to load into RAM, and having an SSD does not seem to improve those loading times (around a minute or so, sometimes more).

    Could someone explain to me how I can decrease those loading times? Is this related to Stata, or to some bottleneck in the maximum disk-to-RAM transfer rate?

    Many thanks!

    EDIT: I have come across this post: http://www.statalist.org/forums/foru...a-and-hardware
    Still, I do not see how to improve those long, frustrating loading times. Is ECC RAM of any help?
    Last edited by Jean-Luc Morin-Chesnel; 24 Oct 2014, 08:04.

  • #2
    Your ECC RAM may actually be slowing things down a bit, since processing the error corrections adds overhead. However, from what I've read, that impact is tiny (maybe 10% at worst). Your machine seems about as fast as you can get; a server-grade processor might help a bit, but not by much and at a huge cost. One cheap trick you might try is to turn off your antivirus temporarily or, if you have the option, tell it not to scan .dta files. I have no idea how much or how little this might help, but it's about all I can think of.

    Oh, and as an extreme move: you don't mention what OS you use. I haven't heard of any major differences across OSs with Stata, though I have run across certain other programs, or procedures within those programs, that run much faster on Linux. If hardware really is the bottleneck, obviously it won't help, but if you're running Windows, you might try a "Live DVD" of Linux, that is, a bootable DVD that leaves the Windows installation intact. It doesn't cost anything to try. You would, of course, need to install and update Stata in the new environment, but as far as I know, most Live DVDs allow you to save the state of your Linux environment to the hard drive upon exit, so you wouldn't have to do it every time. Licensing should be fine.

    As a final thought, do you really need all 10 GB? If it's really wide data, ask yourself whether you really need all those variables. If it's really long, then maybe consider sampling to get it down to a manageable size. For some forms of data (geo-spatial, for example), sampling wouldn't be feasible, but for most datasets you might run everything with 1 GB worth of data, then use all 10 GB only for your final runs.

    Oh, and 36 GB sounds like a lot, but depending on the models, the matrices in the background could chew up the rest of your memory in a hurry.

    Sorry no silver bullet, but a few thoughts, one or more of which might help to some degree.



    • #3
      Oh, and as a random thought -- check what drivers and firmware you are using for the SSD, and the BIOS version. Back in the early days, I ran across cases where an SSD was actually slower than a regular hard drive. In general, updating the drivers and firmware made a dramatic improvement. In a couple of cases, updating them slowed things down. It hasn't seemed to be an issue in the past couple of years, but it could be worth checking.
      Last edited by ben earnhart; 24 Oct 2014, 09:35.



      • #4
        Thank you, Ben!!
        Yes, I do need all 10 GB, and I have a Windows machine.



        • #5
          Jean-Luc,

          What SSD do you have? I just loaded a 10 GB dataset (650 million obs, 8 vars) in 20 seconds using a Samsung 840 SSD, so I'm surprised by your 1-minute time.
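
          To put the two load times side by side, here is a rough back-of-the-envelope sketch in Python (not Stata). It assumes "10 GB" means 10 * 10**9 bytes and that the whole file is read in the stated time; the function name is only illustrative.

```python
# Assumption: "10 GB" is read as 10 * 10**9 bytes; the real file size may differ.
DATASET_BYTES = 10 * 10**9


def throughput_mb_per_s(n_bytes, seconds):
    """Average transfer rate in MB/s, with 1 MB taken as 10**6 bytes."""
    return n_bytes / seconds / 10**6


fast = throughput_mb_per_s(DATASET_BYTES, 20)  # the 20-second load reported here
slow = throughput_mb_per_s(DATASET_BYTES, 60)  # the roughly 1-minute load being asked about

print(f"20 s load: {fast:.0f} MB/s")
print(f"60 s load: {slow:.0f} MB/s")
print(f"ratio: {fast / slow:.1f}x")
```

          Under those assumptions the one-minute load averages about a third of the rate of the 20-second load, which suggests that something other than the size of the file (the drive itself, its drivers, or background scanning, for instance) may be the limiting factor.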
          Also, if you haven't done so, converting doubles to floats is sometimes useful; only in extreme cases does the change in precision make any difference.
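
          As a rough illustration of that trade-off, the Python sketch below (again not Stata code) uses the standard `struct` module to mimic 4-byte float versus 8-byte double storage. The helper names are made up for the example, and the per-column sizes assume one value per observation with no other overhead.

```python
import struct


def to_float32(x):
    """Round-trip a 64-bit Python float through 4-byte (single-precision) storage."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


FLOAT_BYTES = struct.calcsize("<f")   # bytes per single-precision value
DOUBLE_BYTES = struct.calcsize("<d")  # bytes per double-precision value

N_OBS = 650_000_000  # observation count quoted in this post


def column_gb(bytes_per_value, n_obs=N_OBS):
    """Size of one numeric column in GB (1 GB = 10**9 bytes), ignoring overhead."""
    return bytes_per_value * n_obs / 10**9


print("one column as double:", column_gb(DOUBLE_BYTES), "GB")
print("one column as float: ", column_gb(FLOAT_BYTES), "GB")

# Values that fit in single precision survive the conversion unchanged...
print(to_float32(0.5) == 0.5)
# ...but values needing more than about 7 significant digits (e.g. long IDs) do not.
print(to_float32(123456789.0) == 123456789.0)
```

          The storage per column halves, but a variable such as a nine-digit identifier would be altered by the conversion, so it is worth checking which variables really need double precision before converting.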
