Modifying memory settings for large databases

Javier Amaya-Nieto

Join Date: Jun 2022

Posts: 6
#1

Modifying memory settings for large databases

26 Jul 2022, 21:55

Hello everyone,

I have a problem when using a database with 15 billion observations. I have Stata 17/MP and enough RAM to work with this database, but my computer has multiple disks, and the one that holds the operating system is only 256 GB (maybe too small for the amount of data). I would like Stata to use another disk in the PC that has 10 TB of capacity. After reading the memory documentation, I am still not clear on how to make this change.

Many thanks
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2457
#2

26 Jul 2022, 22:33

See this FAQ for how to move Stata's temporary directory.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

27 Jul 2022, 10:33

And because that FAQ doesn't directly give Mac-specific advice, I'll point to this earlier topic containing my advice for those running Stata for Mac.

https://www.statalist.org/forums/for...nment-variable
Javier Amaya-Nieto

Join Date: Jun 2022

Posts: 6
#4

27 Jul 2022, 14:38

Thank you very much, Leonardo. I did what the FAQ recommended and am now running the code that creates the database. It has been running for 5 hours, so I am not yet sure whether it worked, but the change to the temporary storage path was successful.
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2457
#5

27 Jul 2022, 15:25

Note that the Stata temporary directory is only the place where Stata writes temporary datasets to disk, and is quite separate from the available RAM. The OS manages all memory requests, and for several versions now Stata has not needed explicit memory management. However, as your working dataset is huge, I can imagine that it would also create temporary datasets that may exhaust the available space on your boot drive (with 256 GB). There is no guarantee or reason to expect that this change will speed anything up, only that it should allow Stata to use as much of the free space on the 10 TB disk as it needs for temporary datasets, and prevent Stata from throwing an error should it run out of disk space.
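
To confirm which directory Stata is actually using after such a change, something like the following should work. This is only a minimal sketch and not part of the FAQ; the macro name probe is an arbitrary choice for illustration.

```stata
* Show the directory Stata is currently using for temporary files
display c(tmpdir)

* Ask Stata for a temporary file name and display its full path;
* the path should sit inside the directory shown above
tempfile probe
display "`probe'"
```

If the path shown is still on the boot drive, the new setting has not taken effect in the current Stata session.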
Javier Amaya-Nieto

Join Date: Jun 2022

Posts: 6
#6

29 Jul 2022, 09:25

Thank you, Leonardo, for your explanation. As you mentioned, the data are large: I have 215 datasets with 72 million observations each, and I am trying to append all 215 datasets using the following code:

display "//-----------------Time: $S_TIME ---------//"
use "D:\Javier\Cost_sharing\BASES\granregresion-AMB-0.dta", clear

foreach lag of numlist 1(1)107 {
append using "D:\Javier\Cost_sharing\BASES\granregresion-AMB-L`lag'.dta", generate(L`lag'_dummy) // the new variable marks with 1 the observations from the dataset being appended.
label drop _append
}

display "//-----------------Time: $S_TIME ---------//"

foreach lag of numlist 1(1)107 {
append using "D:\Javier\Cost_sharing\BASES\granregresion-PI-L`lag'.dta", generate(PI`lag'_dummy) // the new variable marks with 1 the observations from the dataset being appended.
label drop _append
}
compress

display "//-----------------Time: $S_TIME ---------//"

Unfortunately, I haven't been able to finish appending all the datasets, let alone run the regression I need. In my tests, appending up to 70 datasets works. But when I append more than that, Stata stays busy for 6 or 7 hours, then the computer crashes and I have to reboot it.

I hope this information allows you to suggest some additional advice.

William Lisowski

Join Date: Dec 2014
Posts: 10150

29 Jul 2022, 09:45

You are adding 214 new dummy variables to your dataset, which is expanding it greatly, I expect. Perhaps the following approach would be successful.

Code:

display "//-----------------Time: $S_TIME ---------//"
use "D:\Javier\Cost_sharing\BASES\granregresion-AMB-0.dta", clear 

generate int L_dummy = 0
foreach lag of numlist 1(1)107 { 
append using "D:\Javier\Cost_sharing\BASES\granregresion-AMB-L`lag'.dta", generate(temp)
replace L_dummy = `lag' if temp==1
drop temp
label drop _append 
}

display "//-----------------Time: $S_TIME ---------//"

generate int PI_dummy = 0
foreach lag of numlist 1(1)107 { 
append using "D:\Javier\Cost_sharing\BASES\granregresion-PI-L`lag'.dta", generate(temp)
replace PI_dummy = `lag' if temp==1
drop temp
label drop _append
} 
compress

display "//-----------------Time: $S_TIME ---------//"
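
A rough back-of-the-envelope check of why this may matter, using the figures given in this thread (215 datasets of 72 million observations each). It assumes that each variable created by append's generate() option is stored as a byte (1 byte per observation) and it uses decimal units, so treat the results as order-of-magnitude estimates only.

```stata
* Total observations: 215 datasets x 72 million each (figures from this thread)
local nobs = 215 * 72e6

* Assumption: 214 indicator variables at 1 byte per observation each
display %6.2f `nobs' * 214 / 1e12 " TB for 214 byte indicators"

* Two int variables (L_dummy and PI_dummy) at 2 bytes per observation each
display %6.2f `nobs' * 2 * 2 / 1e9 " GB for two int variables"
```

Under those assumptions the 214 indicators alone come to roughly 3.3 TB, against about 62 GB for the two int variables. The variables are int rather than byte because a Stata byte only holds values up to 100 and the lags run to 107.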
