Hello everyone,
I am working on a big dataset. I need to perform a task many times, and the script I have written for it is pretty slow. I was wondering whether you have suggestions on how to make it faster. My real task has many more observations and variables than the toy example below. I thought that perhaps using bysort: would be faster, but I cannot see how to do it. The toy example of my task is as follows:
Code:
clear
set obs 50
generate time = _n
expand 2, generate(kind)
expand 100, generate(copied)
generate v1 = runiform(0,1)
generate v2 = runiform(0,1)
generate v3 = runiform(0,1)

// Read time range
summarize time
local tmin = r(min)
local tmax = r(max)

// Iterate over all variables except the identifiers
ds time kind copied, not
foreach v of varlist `r(varlist)' {
    // Create empty variables to store the estimates
    generate pe_`v' = .
    generate lb_`v' = .
    generate ub_`v' = .
    // Iterate over kind (0/1) and time
    forvalues k = 0/1 {
        forvalues t = `tmin'/`tmax' {
            // Point estimate: mean over the original observations (copied == 0)
            // Skip the group if all values are missing
            count if !missing(`v') & copied == 0 & time == `t' & kind == `k'
            if r(N) > 0 {
                summarize `v' if copied == 0 & time == `t' & kind == `k'
                replace pe_`v' = r(mean) if time == `t' & kind == `k'
            }
            // Confidence interval: 2.5th and 97.5th percentiles over the copies
            // Skip the group if all values are missing
            count if !missing(`v') & copied > 0 & time == `t' & kind == `k'
            if r(N) > 0 {
                _pctile `v' if copied > 0 & time == `t' & kind == `k', p(2.5 97.5)
                replace lb_`v' = r(r1) if time == `t' & kind == `k'
                replace ub_`v' = r(r2) if time == `t' & kind == `k'
            }
        }
    }
}

// Keep relevant variables & one observation per time-kind group
keep time kind pe_* lb_* ub_*
bysort time kind: keep if _n == 1
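Regarding the bysort: idea, a minimal sketch is to let egen compute each statistic within every time-kind group in a single pass, instead of looping over groups with if qualifiers (each if is evaluated over the whole dataset on every iteration). The sketch assumes the toy data above, uses cond() to set to missing the observations that should not enter each statistic (egen ignores missing values), and assumes egen's pctile() accepts non-integer p() values such as 2.5. It has not been run on the real data, and egen pctile() is expected, not guaranteed, to use the same default percentile definition as _pctile, so it is worth checking that both versions agree on a subset first.

```stata
clear
set obs 50
generate time = _n
expand 2, generate(kind)
expand 100, generate(copied)
generate v1 = runiform(0,1)
generate v2 = runiform(0,1)
generate v3 = runiform(0,1)

// One pass per statistic: egen works within every time-kind group at once
ds time kind copied, not
foreach v of varlist `r(varlist)' {
    // Mean over the original observations (copied == 0) of each group;
    // cond() turns the copies into missing values, which egen ignores
    bysort time kind: egen pe_`v' = mean(cond(copied == 0, `v', .))
    // 2.5th and 97.5th percentiles over the copies (copied > 0)
    bysort time kind: egen lb_`v' = pctile(cond(copied > 0, `v', .)), p(2.5)
    bysort time kind: egen ub_`v' = pctile(cond(copied > 0, `v', .)), p(97.5)
}

// Keep relevant variables & one observation per time-kind group
keep time kind pe_* lb_* ub_*
bysort time kind: keep if _n == 1
```

With 3 variables this is 9 egen calls in total, compared with 300 loop iterations (3 variables × 2 kinds × 50 times) in the loop version, each running several commands over all 10,000 observations. As in the loop version, a group whose values are all missing simply gets a missing result.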
All suggestions are welcome.
Thanks!
