  • Optimizing speed of Stata's collapse command

    Dear Statalists,

    the following is a question I have come across again and again when using the collapse command: Besides the fast option, is there any way to further speed up the command?

    For example, I am currently trying to collapse a data set with 50 million observations, taking simple sums of 25 indicator variables. Would (count) be faster than (sum)? Would it take less time if I were to partition the data into sets of observations (that I later append) or sets of variables (that I later merge)? If so, what would be the ideal number of observations/variables/bytes per data set?

    Thank you very much for your input.

    Best wishes,
    Milan
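
On the (count) versus (sum) part of the question: whatever their relative speed, the two statistics are not interchangeable for 0/1 indicators. (sum) adds up the values, while (count) counts nonmissing observations, zeros included. A toy illustration on made-up data (the variable names d, s, and n are hypothetical):

```stata
* Five made-up observations of a 0/1 indicator, one of them missing
clear
input byte d
1
0
1
0
.
end

* (sum) adds the values; (count) counts nonmissing observations
collapse (sum) s = d (count) n = d
list
```

Here s is 2 (the number of ones) and n is 4 (the number of nonmissing values), so (count) would only reproduce the sums if the zeros were recoded to missing first.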

  • #2
    This is often asked here. The bottom lines include:

    1. collapse is often slower than people want.

    2. You need to be a good Stata programmer to do better.

    3. For a good example of #2 see e.g. http://www.statalist.org/forums/foru...large-datasets


    • #3
      Collapse seems to outdo egen total, for what it's worth. Patience might be your best strategy, along with making sure that the collapse step is something that is done only once.

      Code:
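      * Simulate 5 million observations on 25 variables
      clear
      set obs 5000000
      forvalues v = 1/25 {
          gen var`v' = runiform()
      }

      timer clear

      * Timer 1: collapse to column sums, then restore the full data
      preserve
      timer on 1
      collapse (sum) var1-var25
      timer off 1
      restore

      * Timer 2: egen total() for each variable, then keep a single row
      timer on 2
      foreach var of varlist var1-var25 {
          egen `var'sum = total(`var')
          drop `var'
      }
      keep in 1
      timer off 2

      timer list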

      Code:
      . timer list
         1:     19.94 /        1 =      19.9430
         2:     67.58 /        1 =      67.5850


      • #4
        Thank you for your tips. I have now tried both, and fcollapse (together with partitioning the data into 10 sets) was much faster than collapse. Thank you again; the hint helped a lot.
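
The partitioning approach mentioned here can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the by-variable group, the chunk variable, the data size, and the choice of 10 chunks are all assumptions, and built-in collapse is used so that the sketch runs without any user-written packages. It relies on the fact that sums of partial sums equal the overall sums, so the chunk results can be appended and collapsed once more.

```stata
* Illustrative data: sizes and variable names are assumptions
clear
set seed 12345
set obs 100000
gen long group = mod(_n, 50) + 1            // hypothetical by-variable
forvalues v = 1/25 {
    gen byte var`v' = runiform() < 0.5      // 0/1 indicators
}

* Reference result from a single collapse, kept for comparison
preserve
collapse (sum) var1-var25, by(group) fast
tempfile full
save `full'
restore

* Split the observations into contiguous chunks
local nchunks 10
gen byte chunk = ceil(_n * `nchunks' / _N)
tempfile master
save `master'

* Collapse each chunk separately
* (fcollapse could be swapped in here if it is installed)
forvalues c = 1/`nchunks' {
    use if chunk == `c' using `master', clear
    collapse (sum) var1-var25, by(group) fast
    tempfile part`c'
    save `part`c''
}

* Append the partial results and collapse once more
use `part1', clear
forvalues c = 2/`nchunks' {
    append using `part`c''
}
collapse (sum) var1-var25, by(group) fast

* Check against the single-collapse result (cf errors out on any difference)
cf _all using `full'
```

Whether partitioning actually saves time, and what chunk count works best, depends on the data and the machine, so it is worth wrapping both variants in timer on/off calls before committing to one.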
