  • Setting up a desktop environment to freely analyze 200GB data using Stata

    I have 200GB .dta files that I want to work with freely and quickly. My Stata is 4-core Stata/MP 15.1.

    My desktop has an Intel Core i9, an M.2 internal SSD, a Gigabyte X299 Gaming 7 motherboard, and 64GB of RAM. Analysis is still extremely slow.

    1. I want maximum speed, and I am willing to pay more. What else can I do?

    2. It also seems slower when my data are stored on an external drive. Is this generally the case?

    3. Will using an SSD instead of an HDD help?

    Thank you!
    Last edited by James Park; 02 Nov 2018, 21:19.

  • #2
    I'm not a hardware expert, but Stata likes to have the dataset fully in memory, so 64GB of RAM is not a good environment for working with 200GB data files. I would get a lot more RAM.



    • #3
      Stata loads the entire dataset into memory, and it is recommended that your computer have 50% more memory than the size of your largest dataset: https://www.stata.com/support/faqs/w...-requirements/

      Before buying a server with 400GB of RAM: your 200GB file might contain more variables than you need and/or store them inefficiently. Can you read parts of the file, recode/recast, aggregate, or sample, then combine the parts?

      Code:
      describe using bigfile   // inspect variables and storage types without loading the file
      * read only selected variables and observations
      use x1-x10 x100 in 1/10000 using bigfile
      compress                 // recast variables to their smallest sufficient storage type
      * ...
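
      A minimal sketch of the read-in-parts-then-combine idea, assuming hypothetical variable names (`id`, `x1-x10`), a hypothetical file `bigfile`, and an arbitrary chunk boundary:

      Code:
      * aggregate a large file in two chunks, then combine the partial results
      tempfile part1
      use id x1-x10 in 1/5000000 using bigfile, clear
      collapse (sum) x1-x10, by(id)    // aggregate the first chunk
      save `part1'
      use id x1-x10 in 5000001/l using bigfile, clear
      collapse (sum) x1-x10, by(id)    // aggregate the second chunk
      append using `part1'
      collapse (sum) x1-x10, by(id)    // merge partial sums across chunks
      compress

      Summing twice works because sums are additive across chunks; statistics like means would need a weighted recombination (e.g. carry sums and counts per chunk, then divide at the end).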
      Take a look at the following:

      https://gtools.readthedocs.io/en/latest/index.html
      https://github.com/sergiocorreia/ftools/#introduction
      https://www.stata.com/meeting/portug..._guimaraes.pdf

      https://www.stata.com/statamp/statamp.pdf
      Last edited by Bjarte Aagnes; 04 Nov 2018, 09:51.
