
  • Understanding the speedup from using parallel with synthetic control methods.

    I am trying to optimize the speed of the user-written -synth_runner- command using 8-core StataMP 15.1 on a Mac with 8 cores and 16 GB of physical memory.

    I ran a simulation where I varied the number of clusters from 1 to 8 and also performed a non-parallelized version of the analysis (all code at bottom).

    Here are the results, where the timer number corresponds to the number of clusters. Timer 10 is the non-parallel version.
    Code:
    . timer list
       1:     43.59 /        1 =      43.5920
       2:     25.23 /        1 =      25.2340
       3:     20.99 /        1 =      20.9890
       4:     20.11 /        1 =      20.1050
       5:     19.37 /        1 =      19.3670
       6:     19.36 /        1 =      19.3550
       7:     20.06 /        1 =      20.0600
       8:     19.27 /        1 =      19.2720
      10:     77.37 /        1 =      77.3670
    I am struggling to understand why the run time barely improves beyond 3 clusters. I also get similar results when I use the nested optimization option (though obviously all the times are longer).
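
    To make the plateau easier to see, the timings can be turned into speedup and per-cluster efficiency figures. This is only a minimal sketch: the variable names (clusters, seconds, speedup, efficiency) are made up for illustration, the numbers are copied by hand from the timer list above, and the baseline is the 1-cluster parallel run rather than timer 10.

```stata
* Timings copied from the -timer list- output above (timers 1 to 8).
clear
input byte clusters double seconds
1 43.592
2 25.234
3 20.989
4 20.105
5 19.367
6 19.355
7 20.060
8 19.272
end

* Speedup relative to the 1-cluster parallel run (assumed baseline),
* and efficiency = speedup per cluster.
generate double speedup    = seconds[1] / seconds
generate double efficiency = speedup / clusters

format speedup efficiency %6.3f
list clusters seconds speedup efficiency, noobs clean
```

    An efficiency near 1 would mean each extra cluster is fully used; values falling well below 1 indicate that added clusters contribute little.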

    Here's the code:

    Code:
    set more off
    capture set trace off
    clear all
    cls
    
    cap drop pre_rmspe post_rmspe lead effect cigsale_synth
    cap drop cigsale_scaled effect_scaled cigsale_scaled_synth D
    cap program drop my_pred my_drop_units my_xperiod my_mspeperiod
    
    program my_pred, rclass
        args tyear
        return local predictors "beer(`=`tyear'-4'(1)`=`tyear'-1') lnincome(`=`tyear'-4'(1)`=`tyear'-1')"
    end
    
    program my_drop_units
        args tunit
        if `tunit'==39 qui drop if inlist(state,21,38)
        if `tunit'==3 qui drop if state==21
    end
    
    program my_xperiod, rclass
        args tyear
        return local xperiod "`=`tyear'-12'(1)`=`tyear'-1'"
    end
    
    program my_mspeperiod, rclass
        args tyear
        return local mspeperiod "`=`tyear'-12'(1)`=`tyear'-1'"
    end
    
    
    timer clear
    timer on 10
    
    use smoking, clear
    tsset state year
    
    gen byte D = (state==3 & year>=1989) | (state==7 & year>=1988)
    
    synth_runner cigsale retprice age15to24, d(D) pred_prog(my_pred) trends training_propr(`=13/18') ///
    drop_units_prog(my_drop_units) xperiod_prog(my_xperiod) mspeperiod_prog(my_mspeperiod) deterministicoutput ///nested
    
    effect_graphs
    pval_graphs
    
    timer off 10
    
    forvalues p = 1(1)8 {
    
        timer on `p'
    
        parallel clean, all
        parallel setclusters `p'
    
        use smoking, clear
        tsset state year
    
        gen byte D = (state==3 & year>=1989) | (state==7 & year>=1988)
    
        synth_runner cigsale retprice age15to24, d(D) pred_prog(my_pred) trends training_propr(`=13/18') ///
        drop_units_prog(my_drop_units) xperiod_prog(my_xperiod) mspeperiod_prog(my_mspeperiod) parallel deterministicoutput ///nested
    
        effect_graphs
        pval_graphs
    
        timer off `p'
    }
    
    
    timer list

  • #2
    Nested optimization timers here:

    Code:
    . timer list
       1:   2140.79 /        1 =    2140.7890
       2:   1354.59 /        1 =    1354.5870
       3:   1128.21 /        1 =    1128.2060
       4:    925.34 /        1 =     925.3360
       5:    716.35 /        1 =     716.3450
       6:    746.78 /        1 =     746.7810
       7:    694.51 /        1 =     694.5140
       8:    714.49 /        1 =     714.4910
      10:   2190.32 /        1 =    2190.3220
    As above, timer # corresponds to the number of clusters. Timer 10 is the non-parallel version.
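
    One rough way to read these numbers is to back out the parallel fraction they would imply under Amdahl's law, speedup = 1 / ((1 - p) + p/n). That the law applies here is purely an assumption for illustration, as are the variable names (clusters, seconds, speedup, par_frac); the timings are copied by hand from the list above.

```stata
* Nested-optimization timings copied from the -timer list- output above.
clear
input byte clusters double seconds
1 2140.789
2 1354.587
3 1128.206
4  925.336
5  716.345
6  746.781
7  694.514
8  714.491
end

* Speedup relative to the 1-cluster parallel run (assumed baseline).
generate double speedup = seconds[1] / seconds

* Solve Amdahl's law for p: p = (1 - 1/speedup) / (1 - 1/n).
* Undefined for n = 1, so it is left missing there.
generate double par_frac = (1 - 1/speedup) / (1 - 1/clusters) if clusters > 1

format speedup par_frac %6.3f
list clusters seconds speedup par_frac, noobs clean
```

    If the implied fraction is roughly stable across cluster counts, a fixed serial portion would be a plausible reading; if it drifts, something else (overhead per cluster, uneven work split) is more likely at play.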
