  • memory usage

    I am running Stata/MP v. 14.2 on a Windows 10 PC with an I7-600 quad-core CPU @ 3.41 GHz, 32 GB of installed memory, and a 64-bit operating system. The file I am trying to process ("Z:\30firmsample.dta") is 52 MB, but I am computing a lot of covariances within a loop using this panel (state, year, firm, month) data set. My program has been running for about 2 days now and it has completed 6 of the 25 loop iterations. I have been monitoring the memory usage, and what is interesting/disturbing is that it is rising over time: Stata's usage began around 5 GB, and now it is over 12 GB. At this rate the program will probably not finish, since it will run out of memory - despite a "clear" command each time the loop begins. I'm thinking there is very likely a much more efficient method to accomplish my goal (sequentially updating the variable "sum_r"). Thanks for any suggestions. My code follows:

    forvalues i = 1988/2013 {
        clear
        use "Z:\30firmsample.dta", clear
        keep if year==`i'
        egen s1 = max(state)
        local n = s1
        forvalues s = 1/`n' {
            use "Z:\data\30firmsample.dta", clear
            keep if year==`i'
            keep if state==`s'
            keep mreturn month year state firm_order
            reshape wide mreturn, i(month) j(firm_order)
            corrci mreturn*, saving(corr`i'`s'.dta, replace)
            use corr`i'`s', clear
            egen sum_r = sum(r)
            replace sum_r = 2*sum_r
            keep sum_r
            duplicates drop sum_r, force
            gen state = `s'
            gen year = `i'
            use "Z:\30firmsample.dta", clear

            qui save "Z:\corr_within`i'`s'.dta", replace
            append using "Z:\corr_within.dta"
            save "Z:\corr_within.dta", replace
        }
    }

  • #2
    Without looking inside your code to see what is really going on, here's one quick thought of something easy to try: -reshape- can be quite slow and resource-intensive, and it appears within your innermost loop. (As -reshape- has to accommodate a diverse range of data conditions, it's not likely to be optimal for any given case.) There are some user-written replacements for -reshape- that claim to be faster and less demanding on the machine. -findit sreshape- shows one that looks promising. I haven't tried that program, but my experience from fashioning my own reshape substitute for a particular situation is that this can be very effective.
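    If -sreshape- really is a drop-in replacement (its help file claims -reshape--compatible syntax), the only change in the loop would be the reshape line itself. A hedged sketch, assuming the command is installed from SSC:

    ```stata
    * install once: ssc install sreshape
    * then, in place of the -reshape- line in the inner loop:
    sreshape wide mreturn, i(month) j(firm_order)
    ```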
    Last edited by Mike Lacy; 17 Nov 2016, 08:03.

    Comment


    • #3
      Another quick idea: could you do a single reshape of the full dataset and then calculate the correlations you need using conditionals? My experience is similar to Mike's, that reshape is probably what's killing you here. If you can reduce it to a single instance, even though that one instance will still take some time, then you'll likely save over the long run of repeated reshapes.
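      As an untested sketch of that idea (and assuming -corrci- accepts an -if- qualifier like most Stata commands), the reshape could be hoisted out of both loops:

      ```stata
      * sketch: reshape the full panel once, then subset inside the loops
      use "Z:\30firmsample.dta", clear
      keep mreturn month year state firm_order
      reshape wide mreturn, i(year state month) j(firm_order)
      save "Z:\30firmsample_wide.dta", replace

      forvalues i = 1988/2013 {
          forvalues s = 1/`n' {
              use "Z:\30firmsample_wide.dta", clear
              keep if year==`i' & state==`s'
              corrci mreturn*, saving(corr`i'`s'.dta, replace)
          }
      }
      ```

      The local `n` would still need to be computed per year from the data, as in the original code.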

      Comment


      • #4
        I'm not sure if it'll do what you require, but you might also be interested in the -pwcorrf- command on SSC. It has a "reshape" option that will calculate correlations within panel units. It is several orders of magnitude faster than reshaping and then calculating the correlation.

        It's fairly untested though, so let me know if any bugs arise (I wrote the command).

        Comment


        • #5
          Thanks a lot. Interesting. I thought it was the corrci command that was slowing things down. It still doesn't explain why the memory usage continues to creep up: Stata is currently at 13.25 GB and it still hasn't finished iteration #6. I wish I could get a plot of RAM usage over the duration of the do-file.

          Comment


          • #6
            It may be that corrci is taking a long time (I've not used it). At this point, I would probably kill the program and try to diagnose where the issue is. You could try a single reshape on a single state and year (what your current code seems to be doing) and see how long that takes. When it's done, you could try the corrci command you want to run and see how long that takes. In other words, just go through a single run of what the loop goes through and see how long each part takes. I'd put money on the reshape, though.
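            A sketch of timing one pass with Stata's built-in -timer- commands:

            ```stata
            * time the two suspect steps for a single state-year cell
            timer clear
            use "Z:\30firmsample.dta", clear
            keep if year==1988 & state==1
            keep mreturn month year state firm_order
            timer on 1
            reshape wide mreturn, i(month) j(firm_order)
            timer off 1
            timer on 2
            corrci mreturn*, saving(corrtest.dta, replace)
            timer off 2
            timer list    // compare elapsed time for timer 1 (reshape) vs timer 2 (corrci)
            ```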

            Comment


            • #7
              You can also use -profiler on- / -profiler off-; it'll tell you how long each component took.
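              For example (the do-file name here is just a placeholder):

              ```stata
              profiler clear
              profiler on
              do "Z:\myanalysis.do"    // hypothetical name for the do-file containing the loop
              profiler off
              profiler report          // time spent in each program, including ado-commands like corrci
              ```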

              Comment


              • #8
                You can probably determine what is using up memory by adding a -memory- command after clear. Something isn't clearing. Also, I suggest you cut down the amount of data until the program runs in a few seconds while you track this down; you can run the full data once the problem is fixed. I agree that reshape is probably slowing you down.
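                A minimal sketch of that diagnostic, placed at the top of the outer loop:

                ```stata
                forvalues i = 1988/2013 {
                    clear
                    memory    // log this each iteration; if the totals keep growing, something isn't being released
                    * ... rest of loop body ...
                }
                ```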

                Comment
