I am using Stata 14.0 on a dataset with 25 million observations. I'd like to collapse 300 "x" variables (mostly sums, but some means) by about 12 different combinations of "v" variables, but a single collapse command takes several hours. I realize I could instead run the following loop for each combination of "v" variables, but the bysorts take a long time as well. I'd appreciate any suggestions.
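* Loop over the 300 variables ("of varlist" expands the range x1-x300)
foreach x of varlist x1-x300 {
    * Running sum of `x' within each v1 v2 group
    bysort v1 v2: gen temp = sum(`x')
    * Largest running sum in the group (equals the group total when `x' is non-negative)
    bysort v1 v2: egen `x'_sum = max(temp)
    drop temp
}
* Keep one observation per v1 v2 group
egen tag_1_sum = tag(v1 v2)
keep if tag_1_sum == 1
save "data1.dta", replace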
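For comparison, a single collapse call for the same v1 v2 combination would look roughly like the sketch below. The split between summed and averaged variables (x1-x250 versus x251-x300) and the output file name are assumed purely for illustration. Note that collapse keeps the original variable names rather than adding a _sum suffix.

```stata
* Sketch only: which variables are sums and which are means is an assumed split
collapse (sum) x1-x250 (mean) x251-x300, by(v1 v2)
* Hypothetical file name, chosen so it does not overwrite data1.dta
save "data1_collapse.dta", replace
```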
Thanks,
Brent Fulton
UC Berkeley