  • Using all RAM/Swap on a merge - Stata MP on Linux

    Hi all. Apologies if I haven't done enough homework before asking here; I'm also looking into the Stata support options available through my university, but I'm hoping something obvious jumps out.

    We have a high-profile user on a Redhat 7.4 64-bit VM, running Stata MP 15.1, with 8 CPU cores, 96GB physical RAM, and 16GB swap. He's been using this machine for similar work for almost a year without issue, but when he runs the code below on an 8.4GB dataset, the job visibly uses all physical RAM, then all available swap, then kills itself before completion.

    I'm not a Stata user, but I am his support on this. Is there anything that's obviously wrong or any troubleshooting/resolution you might recommend?

    His code:

    Code:
    use /srv/scratch/salience/tmp, clear
    gen date=datadate
    *********************************
    sort permco permno date
    merge 1:1 permco permno date using /srv/scratch/salience/crsp_daily_adjust, keepusing(permco permno date shrout cfacpr cfacshr shrcd exchc) sorted
    
    save /srv/scratch/salience/tmp2, replace

  • #2
    Well, if the master file /srv/scratch/salience/tmp contains variables that will not be needed in the analyses subsequent to the merge, those could be -drop-ped. But since this user was careful to specify the -keepusing()- option in the -merge-, he or she seems to be aware of these issues, and I suspect that there will be little or nothing saved by pruning that master file.

    Another possibility is to have Stata -compress- both files. Even careful users who remember the days when memory was scarce nowadays tend to create dichotomous variables, which could be stored in a single byte, as 4-byte floats. Multiply that by thousands of variables and millions of observations and you end up wasting a lot of memory. So this might help.
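
    As a rough sketch of the -compress- suggestion: the output filenames ending in _c are hypothetical, chosen here only to avoid overwriting the originals, and this assumes each file fits in memory on its own.

    Code:
    * Inspect the using file's size and variable count without loading it
    describe using /srv/scratch/salience/crsp_daily_adjust, short

    * Compress the master file and save a copy (hypothetical filename)
    use /srv/scratch/salience/tmp, clear
    compress
    save /srv/scratch/salience/tmp_c, replace

    * Same for the using file, loading only the variables the merge needs
    use permco permno date shrout cfacpr cfacshr shrcd exchc using /srv/scratch/salience/crsp_daily_adjust, clear
    compress
    save /srv/scratch/salience/crsp_daily_adjust_c, replace

    -compress- reports which variables it demoted to a smaller storage type, so its output also shows whether there was anything worth saving.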

    I can't think of anything else offhand.


    • #3
      I suggest

      Code:
      set max_memory 16g, permanently
      See "Serious bug in Linux OS" at https://www.stata.com/manuals/dmemory.pdf
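
      A minimal sketch of how the setting might be checked before rerunning the job. The idea, as I understand it, is that with a cap in place Stata should stop with an out-of-memory error rather than run the machine out of RAM and swap; whether 16g is the right value for this job is an assumption to test.

      Code:
      * Show the current memory settings, including max_memory
      query memory
      * Report how much memory Stata is using right now
      memory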


      • #4
        It might also help a bit to omit the sort, just to see whether it is adding to the resource consumption. I’d also suggest simply renaming the date variable instead of creating a copy of it, since the copy could be stored as an 8-byte double instead of a 4-byte float, which would also dramatically increase the memory overhead.
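
        A sketch of what that variant of the original code might look like, assuming nothing later in the workflow needs the variable under its old name datadate:

        Code:
        use /srv/scratch/salience/tmp, clear
        * Check how datadate is stored before deciding anything
        describe datadate
        * Rename rather than copy, so no extra variable is created
        rename datadate date
        * No explicit sort and no sorted option: -merge- sorts the data itself if needed
        merge 1:1 permco permno date using /srv/scratch/salience/crsp_daily_adjust, keepusing(permco permno date shrout cfacpr cfacshr shrcd exchc)
        save /srv/scratch/salience/tmp2, replace

        If this version still exhausts memory, the sort and the duplicated date variable can be ruled out as the main cause.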
