Optimizing speed of Stata's collapse command

Milan Quentel

Join Date: Nov 2016

Posts: 52
#1

Optimizing speed of Stata's collapse command

25 Apr 2017, 08:09

Dear Statalists,

The following is a question I keep running into when using the collapse command: besides the fast option, is there any way to speed up the command further?

For example, I am currently trying to collapse a data set with 50 million observations, taking simple sums of 25 indicator variables. Would (count) be faster than (sum)? Would it take less time if I were to partition the data into sets of observations (that I later append) or sets of variables (that I later merge)? If so, what would be the ideal number of observations/variables/bytes per data set?

Thank you very much for your input.

Best wishes,
Milan
Tags: collapse, speed
Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

25 Apr 2017, 08:14

This is often asked here. The bottom lines include:

1. collapse is often slower than people want.

2. You need to be a good Stata programmer to do better.

3. For a good example of #2 see e.g. http://www.statalist.org/forums/foru...large-datasets

Jorrit Gosens

Join Date: Jan 2015
Posts: 1019

25 Apr 2017, 08:44

collapse seems to outdo egen's total() function, for what it's worth. Patience might be your best strategy, along with making sure that the collapse step is something you only have to run once.

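Code:

* Simulate 5 million observations of 25 variables
clear
set obs 5000000
forvalues v = 1/25 {
    gen var`v' = runiform()
}

* Approach 1: collapse (timer 1)
preserve
timer clear
timer on 1
collapse (sum) var1-var25
timer off 1
restore

* Approach 2: egen total() per variable, then keep one row (timer 2)
timer on 2
foreach var of varlist var1-var25 {
    egen `var'sum = total(`var')
    drop `var'
}
keep in 1
timer off 2

* Report elapsed seconds for both timers
timer list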

Code:

. timer list
   1:     19.94 /        1 =      19.9430
   2:     67.58 /        1 =      67.5850
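
On the (count) versus (sum) part of the question: whatever the speed difference, the two statistics are not interchangeable for 0/1 indicators, because (count) returns the number of nonmissing observations rather than the number of ones. A minimal sketch, using a made-up five-row indicator x purely for illustration:

```stata
* Hypothetical toy data: two ones, two zeros, one missing
clear
input byte x
1
0
1
0
.
end

preserve
collapse (sum) x
display x[1]        // sum of the indicator: 2
restore

collapse (count) x
display x[1]        // number of nonmissing observations: 4
```

So (count) would only match (sum) if every nonmissing value of the indicator were 1.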


Milan Quentel

Join Date: Nov 2016

Posts: 52
#4

26 Apr 2017, 00:41

Thank you for your tips. I have now tried both, and fcollapse (together with partitioning the data into 10 sets) was much faster than collapse. Thank you again, the hint helped a lot.
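
For readers who want to try something similar, here is a rough sketch of the partition-then-collapse idea. It is not the exact code used above, and everything in it is an assumption made for illustration: the simulated data, the variable names group, part and ind1-ind25, and the local collapse_cmd. The sketch runs with the built-in collapse; if the user-written ftools package is installed (ssc install ftools), setting collapse_cmd to fcollapse should work the same way, since fcollapse follows collapse's syntax. It relies on the fact that sums of partial sums equal the overall sums.

```stata
* Hypothetical example data: 100,000 observations, 25 indicators, 100 groups
clear
set seed 12345
set obs 100000
gen long group = ceil(runiform() * 100)
forvalues v = 1/25 {
    gen byte ind`v' = runiform() < 0.5
}
gen byte part = mod(_n, 10)          // assign each row to one of 10 partitions

* Keep one overall total for a sanity check at the end
quietly summarize ind1, meanonly
scalar total_before = r(sum)

* Swap in fcollapse here if ftools is installed (assumption: same syntax)
local collapse_cmd collapse

tempfile full combined
save `full'

* Collapse each partition separately and stack the partial sums
forvalues p = 0/9 {
    use if part == `p' using `full', clear
    `collapse_cmd' (sum) ind1-ind25, by(group) fast
    if `p' > 0 append using `combined'
    save `combined', replace
}

* Collapse the stacked partial sums into the final per-group sums
`collapse_cmd' (sum) ind1-ind25, by(group) fast

* Sanity check: the grand total of ind1 is unchanged
quietly summarize ind1, meanonly
assert r(sum) == scalar(total_before)
```

Whether partitioning helps, and how many partitions to use, will depend on memory and the number of groups, so it is worth timing a few settings with timer on and timer off before committing to one.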