  • Efficiently reduce memory usage in STATA

    Hi all,


    I have a 22.56 GB .csv file that I have imported into Stata (after a long wait). I would now like to compress it to reduce its memory usage.
    So far, after the import, I have simply run the compress command, like this:

    Code:
    import delimited "/Users/federiconutarelli/Dropbox/PRIN Green Nutarelli/dataset/df_emakg.csv", clear
    *** SAVING MEMORY STEPS:
    compress // this is to reduce the size of the dataset (it takes forever).
    but it is taking forever (it's the 3rd day in a row that Stata has been compressing strings), with the output:

    Code:
    is strL now coalesced
    Now my question is whether there is a faster and more efficient way to compress the dataset (e.g. should I use recast first?). Please note that the .csv file comes from a pandas DataFrame that I managed to compress to 8 GB. Probably one of the formats got messed up when converting to .csv, which is why the .csv file ended up taking 22.56 GB.

    Thank you,

    Federico

  • #2
    A general approach is to import your dataset in pieces, either by importing a subset of variables over the full range of observations, or a subset of observations over the full range of variables. With import delimited, the latter is easier, which you can specify with the rowrange() option. Perform whatever compression steps you want, then save the chunk and repeat the process until you have read in all of your data. Then, use -append- to append all your data chunks together, reconstituting your full dataset.

    There is no guarantee that this will be any faster than what you have already, but it often is. The reason is that restricting your attention to one subset of data at a time means that Stata can perform all work in memory without having to cache to disk. The chunk size should be small enough that the resulting dataset fits comfortably into your available RAM.
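
    A minimal sketch of the chunked approach described above, not a tested recipe. The chunk size, the total row count, and the output file names are hypothetical placeholders, and the file is assumed to sit in the working directory with variable names in its first row. It is worth checking on a small range how rowrange() counts the header row before running the full loop, and whether every chunk imports each variable with the same type (string vs. numeric), since a mismatch would make -append- complain.

    Code:
    * Sketch only: chunk, total, and the chunk_*.dta names are assumptions.
    local chunk = 2000000      // rows per piece; choose so each piece fits in RAM
    local total = 60000000     // assumed total number of rows in the .csv
    local k = 0

    forvalues first = 1(`chunk')`total' {
        local last = min(`first' + `chunk' - 1, `total')
        local ++k
        * Read one block of rows, shrink storage types, and save the piece.
        import delimited "df_emakg.csv", varnames(1) rowrange(`first':`last') clear
        compress
        save "chunk_`k'.dta", replace
    }

    * Reassemble the compressed pieces into one dataset.
    use "chunk_1.dta", clear
    forvalues i = 2/`k' {
        append using "chunk_`i'.dta"
    }
    save "df_emakg_compressed.dta", replace

    Because every piece is saved to disk before the next one is read, an interrupted run loses at most the chunk in progress; the loop can be restarted from that chunk by changing the starting value of first.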



    • #3
      Leonardo Guizzetti I see. Thank you for the suggestion. May I also ask whether Stata will pause and then resume from where it left off if I close my PC? Thank you



      • #4
        You’re welcome. Stata will not resume what it was doing if you stop Stata or shut down your computer.



        • #5
          In line with Leonardo's advice on working with chunks of data, check "CHUNKY: Stata module to chunk a large text file into smaller parts": https://ideas.repec.org/c/boc/bocode/s456994.html



          • #6
            Thanks to all.
