Efficiently reduce memory usage in STATA

Federico Nutarelli

Join Date: Sep 2018

Posts: 430
#1

04 Jul 2022, 05:17

Hi all,

I have a 22.56 GB .csv file that I have imported into Stata (after a long wait). I would now like to compress it to reduce its memory usage.
So far, after the import, I have simply run the compress command, like this:

Code:

import delimited "/Users/federiconutarelli/Dropbox/PRIN Green Nutarelli/dataset/df_emakg.csv", clear
*** SAVING MEMORY STEPS:
compress // this is to reduce the size of the database (it takes forever).

but it is taking forever: this is the third day in a row that Stata has been compressing strings, with output like:

Code:

is strL now coalesced

My question is whether there is a faster and more efficient way to compress the dataset (e.g. should I use recast first?). Please note that the .csv file comes from a pandas DataFrame that I had managed to compress to 8 GB. Probably one of the column formats got messed up when converting to .csv, which is why the .csv file ended up at 22.56 GB.

Thank you,

Federico
Tags: data, memory, panel data, Suggestion
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#2

04 Jul 2022, 05:57

A general approach is to import your dataset in pieces, either by importing a subset of variables over the full range of observations, or a subset of observations over the full range of variables. With import delimited, the latter is easier, and you can specify it with the rowrange() option. Perform whatever compression steps you want, then save the chunk and repeat the process until you have read in all of your data. Then use -append- to stack all your data chunks together, reconstituting your full dataset.

There is no guarantee that this will be any faster than what you have already, but it often is. The reason is that restricting your attention to one subset of data at a time means that Stata can perform all work in memory without having to cache to disk. The chunk size should be small enough that the resulting dataset fits comfortably into your available RAM.
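A minimal sketch of that chunked workflow follows. The file name comes from the original post; the chunk size, the total row count, and the chunk_*.dta file names are placeholders chosen for illustration, not values from the thread. It also assumes that rowrange() counts the header line as row 1, which is worth checking on a small test file first.

```stata
* Sketch only: adjust the placeholders below to your own data.
local csv   "df_emakg.csv"   // assumed to sit in the working directory
local chunk 1000000          // rows per chunk (placeholder); pick a size that fits in RAM
local nrows 50000000         // total number of data rows (placeholder); replace with the real count

local first 2                // row 1 is assumed to hold the variable names
local i 0
while `first' <= `nrows' + 1 {
    local last = min(`first' + `chunk' - 1, `nrows' + 1)
    local ++i
    * Read one block of rows, shrink storage types, and save it
    import delimited using "`csv'", varnames(1) rowrange(`first':`last') clear
    compress
    save "chunk_`i'.dta", replace
    local first = `last' + 1
}

* Stack the saved chunks back into one dataset
use "chunk_1.dta", clear
forvalues j = 2/`i' {
    append using "chunk_`j'.dta"
}
save "df_emakg.dta", replace
```

Two caveats apply. Each import delimited call probably still has to read through the file to reach its starting row, so very small chunks may cost more than they save. Also, if a column is read as numeric in one chunk and as string in another, -append- will stop with a type mismatch; the stringcols() option of import delimited is one way to force such columns to string in every chunk.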
Federico Nutarelli

Join Date: Sep 2018

Posts: 430
#3

04 Jul 2022, 10:23

Leonardo Guizzetti I see, thank you for the suggestion. May I also ask whether Stata will pause and then resume from where it left off if I shut down my PC? Thank you.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#4

04 Jul 2022, 10:40

You’re welcome. Stata will not resume what it was doing if you stop Stata or shut down your computer.
Luis Pecht

Join Date: May 2017

Posts: 151
#5

04 Jul 2022, 16:07

In line with Leonardo's advice about working with chunks of data, have a look at "CHUNKY: Stata module to chunk a large text file into smaller parts": https://ideas.repec.org/c/boc/bocode/s456994.html
Federico Nutarelli

Join Date: Sep 2018

Posts: 430
#6

05 Jul 2022, 00:20

Thanks to all.