  • Modifying memory settings for large databases

    Hello everyone,

    I have a problem when using a database with 15 billion observations. I have Stata 17/MP and enough RAM to work with this database, but my computer has multiple disks, and the one that holds the operating system is only 256 GB (maybe too small for this amount of data). I would like Stata to use another disk in the PC that has 10 TB capacity. After reading the memory documentation, I am still not clear on how to make this change.

    Many thanks

  • #2
    See this FAQ for how to move Stata's temporary directory.
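A quick way to confirm that the change has taken effect is to ask Stata where its temporary files are going. This is a minimal sketch, assuming the environment-variable route described in the FAQ (STATATMP on Windows); `c(tmpdir)` and the `: environment` macro function are part of Stata, while the local names `statatmp` and `probe` are just illustrative. Stata typically has to be restarted before it sees a changed environment variable.

```stata
* Directory Stata is currently using for temporary files
display "c(tmpdir): " c(tmpdir)

* Value of the STATATMP environment variable (blank if it is not set)
local statatmp : environment STATATMP
display "STATATMP:  `statatmp'"

* Full path of a new temporary file name; after a successful change
* this should point to the larger drive
tempfile probe
display "probe:     `probe'"
```

If the reported paths still point to the boot drive after a restart, the environment variable was probably not picked up.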



    • #3
      And because that FAQ doesn't directly give Mac-specific advice, I'll point to this earlier topic containing my advice for those running Stata for Mac.

      https://www.statalist.org/forums/for...nment-variable



      • #4
        Thank you very much, Leonardo. I did what the FAQ recommended and am now running the code that creates the database. It has been running for 5 hours, so I am not sure yet whether it worked, but the change to the temporary storage path was successful.



        • #5
          Note that the Stata temporary directory is only the place where Stata writes temporary datasets to disk, and it is quite separate from the available RAM. The OS manages all memory requests, and for several versions now Stata has not needed explicit memory management. However, as your working dataset is huge, I can imagine that it would also create temporary datasets that may exhaust the available space on your 256 GB boot drive. There is no guarantee or reason to expect that this change will speed anything up; it should only allow Stata to use as much of the 10 TB of free disk space as it needs for temporary datasets, and prevent Stata from throwing an error should it run out of disk space.
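Since Stata no longer needs explicit memory settings, the more useful check on data this large is how much space the dataset itself occupies. The sketch below uses a small made-up dataset (the variables `x` and `flag` are hypothetical): `c(N)` and `c(width)` give the number of observations and the bytes per observation, and `memory` and `query memory` show Stata's own usage report and its settings such as `max_memory`. The product `c(N) * c(width)` is only a rough lower bound, because Stata's memory report also includes overhead beyond the raw data.

```stata
* Small made-up dataset for illustration only
clear
set obs 1000000
generate double x = runiform()
generate byte flag = 0

* Observations, bytes per observation, and a rough size in MB
display "observations:          " c(N)
display "bytes per observation: " c(width)
display "approx. size (MB):     " c(N) * c(width) / 1024^2

* Stata's own memory report and current memory settings
memory
query memory
```

Running the same `display` lines after loading one of the real files gives a first idea of how far the full appended dataset would be from the available RAM.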



          • #6
            Thank you, Leonardo, for your explanation. As you mentioned, my data are huge: I have 215 datasets with 72 million observations each, and I am trying to append all 215 of them using the following code:


            display "//-----------------Time: $S_TIME ---------//"
            use "D:\Javier\Cost_sharing\BASES\granregresion-AMB-0.dta", clear

            foreach lag of numlist 1(1)107 {
                // the new variable flags with 1 the observations that come from the appended dataset
                append using "D:\Javier\Cost_sharing\BASES\granregresion-AMB-L`lag'.dta", generate(L`lag'_dummy)
                label drop _append
            }

            display "//-----------------Time: $S_TIME ---------//"

            foreach lag of numlist 1(1)107 {
                // the new variable flags with 1 the observations that come from the appended dataset
                append using "D:\Javier\Cost_sharing\BASES\granregresion-PI-L`lag'.dta", generate(PI`lag'_dummy)
                label drop _append
            }
            compress

            display "//-----------------Time: $S_TIME ---------//"



            Unfortunately, I haven't been able to finish appending all the datasets, let alone run the regression I need. I have run tests: appending up to 70 datasets works, but when I append more than that, Stata keeps running for 6 or 7 hours and then the computer crashes and I have to reboot it.

            I hope this information allows you to suggest some additional advice.
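A back-of-the-envelope calculation with the figures in this post (215 files of 72 million observations each) suggests how much the indicator variables alone add. This is a rough sketch, assuming each `append, generate()` indicator is stored as a byte (1 byte per observation) and ignoring the variables already in the files; the scalar names are illustrative, and the result is an estimate, not a diagnosis of the crash.

```stata
* Figures from this thread: 215 files of 72 million observations each
scalar nfiles    = 215
scalar obs_each  = 72e6
scalar total_obs = scalar(nfiles) * scalar(obs_each)

* Original loops: one byte indicator per appended file (214 in total),
* and every observation ends up carrying all of them
scalar bytes_dummies = scalar(total_obs) * (scalar(nfiles) - 1)

* For comparison: a single int identifier (2 bytes per observation)
scalar bytes_one_int = scalar(total_obs) * 2

display "total observations:       " %20.0fc scalar(total_obs)
display "bytes, 214 byte dummies:  " %20.0fc scalar(bytes_dummies)
display "bytes, one int variable:  " %20.0fc scalar(bytes_one_int)
```

With these numbers the indicators come to 3,312,720,000,000 bytes (about 3.3 TB) across 15,480,000,000 observations, against 30,960,000,000 bytes (about 31 GB) for one int identifier. A full estimate would also need the width of the variables already in each file.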



            • #7
              You are adding 214 new dummy variables to your dataset, which I expect is expanding it greatly. Perhaps the following approach would be successful.
              Code:
              display "//-----------------Time: $S_TIME ---------//"
              use "D:\Javier\Cost_sharing\BASES\granregresion-AMB-0.dta", clear

              * One int variable records which AMB file each observation came from (0 = base file)
              generate int L_dummy = 0
              foreach lag of numlist 1(1)107 {
                  append using "D:\Javier\Cost_sharing\BASES\granregresion-AMB-L`lag'.dta", generate(temp)
                  replace L_dummy = `lag' if temp==1
                  drop temp
                  label drop _append
              }

              display "//-----------------Time: $S_TIME ---------//"

              * Same idea for the PI files (0 = observation from an AMB file);
              * L_dummy is left missing for the PI observations appended here
              generate int PI_dummy = 0
              foreach lag of numlist 1(1)107 {
                  append using "D:\Javier\Cost_sharing\BASES\granregresion-PI-L`lag'.dta", generate(temp)
                  replace PI_dummy = `lag' if temp==1
                  drop temp
                  label drop _append
              }
              compress

              display "//-----------------Time: $S_TIME ---------//"
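Another option that may be worth testing: `append` accepts several files in one call, and its `generate()` option then codes the source as 0 for the master data, 1 for the first using file, 2 for the second, and so on. That yields one identifier variable and avoids the repeated `replace ... if temp==1` passes, each of which evaluates the condition over every observation in memory. The demo below uses three small hypothetical tempfiles (and a hypothetical variable `y`) so that it runs on its own; it has not been tried on files of this size.

```stata
* Demo with three small hypothetical files standing in for the lag files
local files
forvalues i = 1/3 {
    clear
    set obs 5
    generate double y = `i'
    tempfile f`i'
    quietly save "`f`i''"
    * Build a list of quoted file names using compound quotes
    local files `"`files' "`f`i''""'
}

* Master data, standing in for granregresion-AMB-0.dta
clear
set obs 5
generate double y = 0

* One append for all files: source is 0 for the master observations,
* 1 for the first using file, 2 for the second, and so on
append using `files', generate(source)
tabulate source
```

In the real case the loop would collect the 107 granregresion-AMB-L paths instead of tempfiles, and the same step could be repeated for the PI files with a different variable name in `generate()`, dropping the `_append` value label in between as the loops above do.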

