I am trying to optimize the speed of the user-written -synth_runner- command using 8-core StataMP 15.1 on a Mac with 8 cores and 16 GB of physical memory.
I ran a simulation where I varied the number of clusters from 1 to 8 and also performed a non-parallelized version of the analysis (all code at bottom).
Here are the results, where the timer # corresponds to the number of clusters. Timer 10 is the non-clustered version.
I am struggling to understand why the run time barely improves beyond 3 clusters. I get similar results when I use the nested optimization option (though obviously all the times are longer).
Code:
. timer list
   1:     43.59 /        1 =      43.5920
   2:     25.23 /        1 =      25.2340
   3:     20.99 /        1 =      20.9890
   4:     20.11 /        1 =      20.1050
   5:     19.37 /        1 =      19.3670
   6:     19.36 /        1 =      19.3550
   7:     20.06 /        1 =      20.0600
   8:     19.27 /        1 =      19.2720
  10:     77.37 /        1 =      77.3670
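To make the plateau easier to see, the timings can be converted into speedups relative to the 1-cluster run. This is only a minimal sketch: the numbers are copied from the timer list above, and the variable names (clusters, seconds, speedup, efficiency) are my own illustrative choices, not anything produced by -synth_runner- or -parallel-.

```stata
* Speedup and per-cluster efficiency relative to the 1-cluster run.
* Timings are typed in by hand from the timer list above.
clear
input byte clusters double seconds
1 43.592
2 25.234
3 20.989
4 20.105
5 19.367
6 19.355
7 20.060
8 19.272
end

* speedup = (time with 1 cluster) / (time with p clusters)
generate double speedup = seconds[1] / seconds

* efficiency = speedup per cluster; 1 would be perfect scaling
generate double efficiency = speedup / clusters

format speedup efficiency %5.2f
list, noobs clean
```

An efficiency that falls quickly as clusters are added would be consistent with a fixed cost that does not shrink with the number of clusters, though this table alone cannot say what that cost is.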
Here's the code:
Code:
set more off
capture trace off
clear all
cls

cap drop pre_rmspe post_rmspe lead effect cigsale_synth
cap drop cigsale_scaled effect_scaled cigsale_scaled_synth D
cap program drop my_pred my_drop_units my_xperiod my_mspeperiod

program my_pred, rclass
    args tyear
    return local predictors "beer(`=`tyear'-4'(1)`=`tyear'-1') lnincome(`=`tyear'-4'(1)`=`tyear'-1')"
end

program my_drop_units
    args tunit
    if `tunit'==39 qui drop if inlist(state,21,38)
    if `tunit'==3 qui drop if state==21
end

program my_xperiod, rclass
    args tyear
    return local xperiod "`=`tyear'-12'(1)`=`tyear'-1'"
end

program my_mspeperiod, rclass
    args tyear
    return local mspeperiod "`=`tyear'-12'(1)`=`tyear'-1'"
end

timer clear

timer on 10
use smoking, clear
tsset state year
gen byte D = (state==3 & year>=1989) | (state==7 & year>=1988)
synth_runner cigsale retprice age15to24, d(D) pred_prog(my_pred) trends training_propr(`=13/18') ///
    drop_units_prog(my_drop_units) xperiod_prog(my_xperiod) mspeperiod_prog(my_mspeperiod) deterministicoutput ///nested effect_graphs pval_graphs

timer off 10

forvalues p = 1(1)8 {
    timer on `p'
    parallel clean, all
    parallel setclusters `p'
    use smoking, clear
    tsset state year
    gen byte D = (state==3 & year>=1989) | (state==7 & year>=1988)
    synth_runner cigsale retprice age15to24, d(D) pred_prog(my_pred) trends training_propr(`=13/18') ///
        drop_units_prog(my_drop_units) xperiod_prog(my_xperiod) mspeperiod_prog(my_mspeperiod) parallel deterministicoutput ///nested effect_graphs pval_graphs

    timer off `p'
}

timer list