Hello everyone,
I am working with a large dataset and need to run the same task many times, but the script I have written for it is quite slow. Do you have any suggestions for making it faster? A toy example of the task is given in the code below.
My real task has many more observations and variables. I thought it might be faster to use bysort: somehow, but I cannot see how to apply it here.
Code:
clear
set obs 50
generate time = _n
expand 2, generate(kind)
expand 100, generate(copied)
generate v1 = runiform(0,1)
generate v2 = runiform(0,1)
generate v3 = runiform(0,1)

// Read time range
summarize time
local tmin = `r(min)'
local tmax = `r(max)'

// Iterate over all variables
ds time kind copied, not
foreach v of varlist `r(varlist)' {

    // Create empties to store estimates
    generate pe_`v' = .
    generate lb_`v' = .
    generate ub_`v' = .

    // Iterate over time and spouse/versions of EmSt
    forvalues k = 0/1 {
        forvalues t = `tmin'/`tmax' {

            // Point estimate
            * Check is non-missing
            count if `v' != . & copied == 0 & time == `t' & kind == `k'
            if (`r(N)' > 0) { // Not all are missing
                summarize `v' if copied == 0 & time == `t' & kind == `k'
                replace pe_`v' = `r(mean)' if time == `t' & kind == `k'
            }

            // Confidence intervals
            * Check is non-missing
            count if `v' != . & copied > 0 & time == `t' & kind == `k'
            if (`r(N)' > 0) { // Not all are missing
                _pctile `v' if copied > 0 & time == `t' & kind == `k', p(2.5 97.5)
                replace lb_`v' = `r(r1)' if time == `t' & kind == `k'
                replace ub_`v' = `r(r2)' if time == `t' & kind == `k'
            }
        }
    }
}

// Keep relevant variables & observations
keep time kind pe_* lb_* ub_*
bysort time kind: keep if _n == 1
All suggestions are welcome.
Thanks!
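One direction that might be worth trying is to let egen do the grouping instead of looping over every time/kind cell. What follows is only a sketch on the toy data, not a benchmarked solution. It assumes that egen's mean() and pctile() functions accept a cond() expression and work under the by: prefix, and that pctile() accepts the non-integer percentiles 2.5 and 97.5. The set seed line is an addition that only makes the toy data reproducible. Cells with no usable observations should simply stay missing, so the explicit count checks may not be needed. It would be worth checking that the percentiles agree with those from _pctile before relying on it.

```stata
clear
set seed 12345                      // assumption: added only for reproducibility
set obs 50
generate time = _n
expand 2, generate(kind)
expand 100, generate(copied)
generate v1 = runiform(0,1)
generate v2 = runiform(0,1)
generate v3 = runiform(0,1)

// The variable list is fixed when the loop starts, so the new
// pe_*, lb_* and ub_* variables are not looped over themselves
ds time kind copied, not
foreach v of varlist `r(varlist)' {

    // Point estimate: mean over the original observations (copied == 0)
    bysort time kind: egen pe_`v' = mean(cond(copied == 0, `v', .))

    // Interval bounds: percentiles over the copies (copied > 0)
    bysort time kind: egen lb_`v' = pctile(cond(copied > 0, `v', .)), p(2.5)
    bysort time kind: egen ub_`v' = pctile(cond(copied > 0, `v', .)), p(97.5)
}

// Keep one row per time/kind cell, as in the original script
keep time kind pe_* lb_* ub_*
bysort time kind: keep if _n == 1
```

This replaces the two forvalues loops with three by-group passes per variable, so the data are no longer scanned once per cell with an if condition. Whether it is fast enough on the real data would need to be timed.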