Understanding the speedup from using parallel with synthetic cohort methods.

Dimitriy V. Masterov


18 Jan 2018, 13:05

I am trying to optimize the speed of the user-written -synth_runner- command using 8-core StataMP 15.1 on a Mac with 8 cores and 16 GB of physical memory.

I ran a simulation where I varied the number of clusters from 1 to 8 and also performed a non-parallelized version of the analysis (all code at bottom).

Here are the results (elapsed seconds), where the timer # corresponds to the number of clusters. Timer 10 is the non-parallel version.

Code:

. timer list
   1:     43.59 /        1 =      43.5920
   2:     25.23 /        1 =      25.2340
   3:     20.99 /        1 =      20.9890
   4:     20.11 /        1 =      20.1050
   5:     19.37 /        1 =      19.3670
   6:     19.36 /        1 =      19.3550
   7:     20.06 /        1 =      20.0600
   8:     19.27 /        1 =      19.2720
  10:     77.37 /        1 =      77.3670
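
To make the pattern easier to read, the timings can be converted into speedup relative to the non-parallel run (timer 10) and into efficiency per cluster. This is only a minimal sketch: the numbers are typed in by hand from the list above, and the variable names (clusters, seconds, speedup, efficiency) and the scalar name (baseline) are made up for illustration.

```stata
* Speedup and per-cluster efficiency implied by the posted timings.
* Values are copied by hand from the timer list; names are illustrative only.
clear
input byte clusters double seconds
1 43.59
2 25.23
3 20.99
4 20.11
5 19.37
6 19.36
7 20.06
8 19.27
end

* Timer 10: the non-parallel run, in seconds
scalar baseline = 77.37

* Speedup = baseline time / parallel time; efficiency = speedup per cluster
gen double speedup    = scalar(baseline) / seconds
gen double efficiency = speedup / clusters
format speedup efficiency %6.2f

list, noobs clean
```

Perfect scaling would show as an efficiency that stays flat as clusters are added.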

I am struggling to understand why the run time does not decrease very much beyond 3 clusters. I also get similar results when I use the nested optimization option (though obviously all the times are longer).

Here's the code:

Code:

set more off
capture set trace off
clear all
cls

cap drop pre_rmspe post_rmspe lead effect cigsale_synth
cap drop cigsale_scaled effect_scaled cigsale_scaled_synth D
cap program drop my_pred my_drop_units my_xperiod my_mspeperiod

program my_pred, rclass
    args tyear
    return local predictors "beer(`=`tyear'-4'(1)`=`tyear'-1') lnincome(`=`tyear'-4'(1)`=`tyear'-1')"
end

program my_drop_units
    args tunit
    if `tunit'==39 qui drop if inlist(state,21,38)
    if `tunit'==3 qui drop if state==21
end

program my_xperiod, rclass
    args tyear
    return local xperiod "`=`tyear'-12'(1)`=`tyear'-1'"
end

program my_mspeperiod, rclass
    args tyear
    return local mspeperiod "`=`tyear'-12'(1)`=`tyear'-1'"
end


timer clear
timer on 10

use smoking, clear
tsset state year

gen byte D = (state==3 & year>=1989) | (state==7 & year>=1988)

synth_runner cigsale retprice age15to24, d(D) pred_prog(my_pred) trends training_propr(`=13/18') ///
drop_units_prog(my_drop_units) xperiod_prog(my_xperiod) mspeperiod_prog(my_mspeperiod) deterministicoutput // add nested here for nested optimization

effect_graphs
pval_graphs

timer off 10

forvalues p = 1(1)8 {

    timer on `p'

    parallel clean, all
    parallel setclusters `p'

    use smoking, clear
    tsset state year

    gen byte D = (state==3 & year>=1989) | (state==7 & year>=1988)

    synth_runner cigsale retprice age15to24, d(D) pred_prog(my_pred) trends training_propr(`=13/18') ///
    drop_units_prog(my_drop_units) xperiod_prog(my_xperiod) mspeperiod_prog(my_mspeperiod) parallel deterministicoutput // add nested here for nested optimization

    effect_graphs
    pval_graphs

    timer off `p'
}


timer list

Tags: parallel, synth, synth_runner

Dimitriy V. Masterov

18 Jan 2018, 19:06

Nested optimization timers here:

Code:

. timer list
   1:   2140.79 /        1 =    2140.7890
   2:   1354.59 /        1 =    1354.5870
   3:   1128.21 /        1 =    1128.2060
   4:    925.34 /        1 =     925.3360
   5:    716.35 /        1 =     716.3450
   6:    746.78 /        1 =     746.7810
   7:    694.51 /        1 =     694.5140
   8:    714.49 /        1 =     714.4910
  10:   2190.32 /        1 =    2190.3220

As above, timer # corresponds to the number of clusters. Timer 10 is the non-parallel version.
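
If it helps anyone reproduce the comparison, the speedups can also be computed directly from the timers instead of by hand, since -timer list- leaves the elapsed times behind in r(t#). The sketch below assumes it is run right after the benchmark loop, with timers 1 to 8 and timer 10 populated as in my code; the local names (base, t1 to t8) are made up for illustration.

```stata
* Read elapsed times back from -timer list- and report speedup vs. timer 10.
* Assumes timers 1-8 (parallel runs) and 10 (non-parallel run) were just used.
quietly timer list

* Copy r() results into locals first, so later commands cannot clear them
local base = r(t10)
forvalues p = 1/8 {
    local t`p' = r(t`p')
}

* One line per cluster count: elapsed seconds and speedup over the baseline
forvalues p = 1/8 {
    display "clusters = `p'" _col(16) "seconds = " %9.2f `t`p'' ///
        _col(40) "speedup = " %5.2f `base'/`t`p''
}
```

If a timer was never switched on, its r(t#) is missing and the corresponding line simply displays a missing value.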
