Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Nested Loops Longer Process Time Each Iteration

    I am trying to make my code more efficient. I have many nested loops in my program, included below and attached. The processing time for each iteration takes longer with each subsequent run. I am wondering what I can do to speed up my program and get consistent times for each iteration of all loops. Any ideas? Is there something parts of my code that can be removed or replaced for better results? I researched a bunch online but could not find anything that fits this description. I do not see anything accumulating as the loop progresses other than the putexcel command. Do you think as the code progresses and the excel file gets larger that could be slowing down the code? Any help is appreciated. Thank you!


    *code



    ***
    *Set up Data
    ***
    log using "main4.log", replace
    timer on 1
    do "DataSetUp_V1.do"
    timer off 1
    di "time for setup file"
    timer list 1

    ***************
    ***************
    ***Set topline variable list
    local varlist Q3 Q4 Q8 Q9 Q9A Q10 Q11 Q12 Q13 Q14 Q15 Q15C ///
    Q15B Q16 Q17 Q18 Q19 Q19A Q20 Q20A Q22 Q23dem Q23rep Q24dem Q24rep Q25dem Q25rep Q26dem Q26rep ///
    Q27 Q28 Q29 Q30 Q31b Q32b Q34 VH1 VH2 Q35 Q36 Q37 Q38 Q39 Q40 Q41 Q42 ///
    Q45 Q46 Q47 Q48 Q49 Q50 Q69 Q70 Q71 Q72 Q73 Q74 Q75 ///
    RID1 RID2 Q76 Q77 Q78 WD1 WD2 WD3 WD4 KS1 KS2 KS3 KS4 DW1 DW2 DW3 DW4 ///
    Q79 Q80 Q81 Q82recode Q83 Q84 Q85 Q86 Q87 RID1A Q90 Q91recode Q92 Q93


    ************************************************** **************************************
    **NOT CURRENTLY INCLUDED, BECAUSE STATE-LEVEL ANALYSIS PRODUCES NO OBSERVATIONS
    **FOR STATES WITHOUT THAT INITIATIVE.
    *CA ballot initiative
    *Q21A
    *FL ballot initiative
    *Q21I
    ************************************************** **************************************


    ***Set crosstab variable list
    local crosstabvarlist Gender Race HispLat Education Income Age3 pid






    ***set list of weight and sample variables
    local weights joint_wgt_national joint_wgt_state

    ***********************************
    ***Add additional note that is common to every question
    timer on 2
    foreach var of varlist `varlist' `ssrs_questions' `gradient_questions' {
    notes `var': 2022 Collaborative Midterm Survey (Enns, Barry, & Schuldt).
    }
    timer off 2
    di "time loop line 49"
    timer list 2

    ************************************************** ***************
    ************************************************** ***************
    *Retain only the variables used in the topline/cross tab analysis
    ************************************************** ***************
    ************************************************** ***************

    keep `varlist' `crosstabvarlist' `weights' CA_FL_WI
    ***************************************





    *generate today's date as local variable
    local date : di %tdNDCY daily("$S_DATE", "DMY")

    *generate excel file
    *quietly putexcel set "CalculationsForWebsite\CMS_proportions_finald ata_ `date'", sheet(proportion) replace
    quietly putexcel set "CMS_proportions_MAIN_finaldata_V`vnum'", sheet(proportion) replace
    *Set row 1 variable names
    quietly putexcel A1="variable" B1="question" C1="tabulation_type" D1="demographic_type" E1="demographic_value" F1="response" G1="proportion" ///
    H1="demographic_name" I1 = "topic" J1 = "figure_note" K1 = "figure_note2" L1 = "figure_note3" M1 = "group_n"

    *set local values used below
    local d=2
    local m=2
    local num=2

    timer on 3
    *loop through main variables and through cross tab variables within main variables
    *topline varlist set above
    foreach var of varlist `varlist' {
    *Save question wording (variable label)
    timer on 4
    local lab: variable label `var'

    *************************************
    *MAIN TOPLINES
    *************************************
    ********************
    *set weight variable
    quietly svyset [pweight=joint_wgt_national]
    ********************
    svy: tab `var' if `var'<=998, perc stubw(150) format(%2.0f) nomarginal
    quietly matrix topline = 100*e(Prop)
    *Unweighted N
    quietly tab `var' if `var'<=998
    local N = `r(N)'
    *Save local N with comma for thousands place
    *local nc : display %5.0fc `N'
    quietly putexcel G`m'=matrix(topline), nformat(number)
    local m = `m'+`e(r)'
    local range = `d'+`e(r)'-1
    timer off 4
    di "time btw loop 89-109. `var'"
    timer list 4
    timer on 5
    forvalues r = `d'/`range' {
    quietly putexcel A`r'="`var'" B`r'="``var'[note1]'" C`r'="topline" I`r'="``var'[note2]'" J`r'="N=`:display%5.0fc `N''." K`r'="``var'[note3]'" L`r'="``var'[note4]'" M`r'="NA"
    local d = `range'+1
    local row= `range'+1
    }
    timer off 5
    di "time loop 113-17. `var'"
    timer list 5
    *Extract and export main variable response categories
    timer on 6
    forvalues response = 1/`e(r)' {
    quietly levelsof `var' if `var'<=998, local(lev)
    local valuelbl: value label `var'
    timer on 7
    foreach c of local lev {

    local v: label `valuelbl' `c'
    quietly putexcel F`num++'="`v'"
    }
    timer off 7
    di "time loop 126-32. `var'"
    timer list 7
    local num= `range'+1
    }
    timer off 6
    di "time loop 123-36. `var'"
    timer list 6
    ***********************************************
    *MAIN CROSSTABS: gender, race/ethnicity, education, income, age, partisanship
    ***********************************************
    timer on 8
    foreach cross of varlist `crosstabvarlist' {
    *calculate weighted cross tabs, omit total percentage
    quietly svy: tab `cross' `var' if `var'<998 & `cross'<98, row perc stubw(150) format(%2.0f) nomarginal
    *Save proportions as a matrix
    quietly matrix prop = e(N_pop)*e(Prop)
    quietly mata: st_matrix("prop", (st_matrix("prop") :/ rowsum(st_matrix("prop"))))
    *Calculate Group N's (unweighted)
    quietly tab `cross' `var' if `var'<998 & `cross'<98, matcell(freq)
    quietly mata: st_matrix("groupn", rowsum(st_matrix("freq")))
    *
    *Add variable name, question wording, demographic type, demographic name (variable name to display)
    local range = `d'+`e(r)'*`e(c)'-1
    timer on 9
    forvalues r = `d'/`range' {
    quietly putexcel A`r'="`var'" B`r'="``var'[note1]'" C`r'="crosstab" D`r'="`e(rowvar)'" H`r'="`e(rowvlab)'" K`r'="``var'[note3]'" L`r'="``var'[note4]'" M`r'="NA"
    local d = `range'+1
    }
    timer off 9
    di "time loop 156-60. `var'`cross'"
    timer list 9
    *Extract and export percentages
    local one=1
    timer on 10
    forvalues cat = 1/`e(r)' {
    quietly mat c`cat' = 100*prop[`one++', 1 .. `e(c)']'
    quietly mat l c`cat'
    quietly putexcel G`m'=matrix(c`cat'), nformat(number)
    quietly putexcel M`m'=matrix(groupn[`cat',1]), nformat(number_sep)
    local m = `m'+`e(c)'
    }
    timer off 10
    di "time loop 167-173. `var'`cross'"
    timer list 10
    *Extract and export demographic categories (only include categories in above crosstab)
    quietly levelsof `cross' if e(sample), local(clev)
    local valuelbl: value label `cross'
    timer on 11
    foreach cc of local clev {
    local cv: label `valuelbl' `cc'
    timer on 12
    forvalues cat = 1/`e(c)' {
    quietly putexcel E`row++'="`cv'"
    }
    timer off 12
    di "time loop 183-86. `var'`cross'"
    timer list 12
    }
    timer off 11
    di "time loop 180-190. `var'`cross'"
    timer list 11
    *Extract and export main variable response categories
    *Added & `cross'<98 because values of var with no observations were messing up spreadsheet
    timer on 13
    forvalues response = 1/`e(r)' {
    quietly levelsof `var' if `var'<998 & `cross'<98, local(lev)
    local valuelbl: value label `var'
    timer on 14
    foreach c of local lev {
    local v: label `valuelbl' `c'
    quietly putexcel F`num++'="`v'"
    }
    timer off 14
    di "time loop 200-04. `var'`cross'"
    timer list 14
    }
    timer off 13
    di "time loop 196-209. `var'`cross'"
    timer list 13
    }
    timer off 8
    di "time loop 143-215. `var'"
    timer list 8
    ************************************************
    **MAIN STATE ANAYSIS: CA, FL, WI: NEED TO BE WEIGHTED BY STATE WEIGHTS
    ************************************************
    *not showing: J`r'="``var'[note3]'" K`r'="``var'[note4]'"
    *set weight variable
    *SINCE NO WEIGHT VARIABLE YET, GENERATE ONE
    timer on 15
    quietly svyset [pweight=joint_wgt_state]
    ********************
    quietly svy: tab `var' if `var'<998 & CA_FL_WI==5, perc stubw(150) format(%2.0f) nomarginal
    quietly matrix topline = 100*e(Prop)
    *Calculate Group N's (unweighted)
    quietly tab `var' if `var'<998 & CA_FL_WI==5, matcell(freq)
    *Sum column to get sample size for state
    quietly mata: st_matrix("groupn", colsum(st_matrix("freq")))
    quietly putexcel G`m'=matrix(topline), nformat(number)
    local m = `m'+`e(r)'
    local range = `d'+`e(r)'-1
    quietly forvalues r = `d'/`range' {
    quietly putexcel A`r'="`var'" B`r'="``var'[note1]'" C`r'="crosstab" D`r'="CA_FL_WI" E`r'="California" I`r'="``var'[note2]'" H`r'="State Oversample: CA, FL, & WI" K`r'="``var'[note3]'" L`r'="``var'[note4]'" M`r'=matrix(groupn), nformat(number_sep)
    local d = `range'+1
    local row= `range'+1
    }
    timer off 15
    di "time loop 234-38. `var'"
    timer list 15
    *Extract and export main variable response categories
    timer on 16
    forvalues response = 1/`e(r)' {
    quietly levelsof `var' if `var'<998, local(lev)
    local valuelbl: value label `var'
    timer on 17
    foreach c of local lev {
    local v: label `valuelbl' `c'
    quietly putexcel F`num++'="`v'"
    }
    timer off 17
    di "time loop 248-251. `var'"
    timer list 17
    local num= `range'+1
    }
    timer off 16
    di "time loop 244-256. `var'"
    timer list 16
    ********************
    ********************
    *set weight variable
    timer on 18
    quietly svyset [pweight=joint_wgt_state]
    ********************
    quietly svy: tab `var' if `var'<998 & CA_FL_WI==10, perc stubw(150) format(%2.0f) nomarginal
    quietly matrix topline = 100*e(Prop)
    *Calculate Group N's (unweighted)
    quietly tab `var' if `var'<998 & CA_FL_WI==10, matcell(freq)
    *Sum column to get sample size for state
    quietly mata: st_matrix("groupn", colsum(st_matrix("freq")))
    quietly putexcel G`m'=matrix(topline), nformat(number)
    local m = `m'+`e(r)'
    local range = `d'+`e(r)'-1
    forvalues r = `d'/`range' {
    quietly putexcel A`r'="`var'" B`r'="``var'[note1]'" C`r'="crosstab" D`r'="CA_FL_WI" E`r'="Florida" I`r'="``var'[note2]'" H`r'="State Oversample: CA, FL, & WI" K`r'="``var'[note3]'" L`r'="``var'[note4]'" M`r'=matrix(groupn), nformat(number_sep)
    local d = `range'+1
    local row= `range'+1
    }
    timer off 18
    di "time loop 263-279. `var'"
    timer list 18
    *Extract and export main variable response categories
    timer on 19
    forvalues response = 1/`e(r)' {
    quietly levelsof `var' if `var'<998, local(lev)
    local valuelbl: value label `var'
    timer on 20
    foreach c of local lev {
    local v: label `valuelbl' `c'
    quietly putexcel F`num++'="`v'"
    }
    timer off 20
    di "time loop 289-292. `var'"
    timer list 20
    local num= `range'+1
    }
    timer off 19
    di "time loop 285-300. `var'"
    timer list 19
    ********************
    timer on 21
    *set weight variable
    quietly svyset [pweight=joint_wgt_state]
    ********************
    quietly svy: tab `var' if `var'<998 & CA_FL_WI==50, perc stubw(150) format(%2.0f) nomarginal
    quietly matrix topline = 100*e(Prop)
    *Calculate Group N's (unweighted)
    quietly tab `var' if `var'<998 & CA_FL_WI==50, matcell(freq)
    *Sum column to get sample size for state
    quietly mata: st_matrix("groupn", colsum(st_matrix("freq")))
    quietly putexcel G`m'=matrix(topline), nformat(number)
    local m = `m'+`e(r)'
    local range = `d'+`e(r)'-1
    forvalues r = `d'/`range' {
    quietly putexcel A`r'="`var'" B`r'="``var'[note1]'" C`r'="crosstab" D`r'="CA_FL_WI" E`r'="Wisconsin" I`r'="``var'[note2]'" H`r'="State Oversample: CA, FL, & WI" K`r'="``var'[note3]'" L`r'="``var'[note4]'" M`r'=matrix(groupn), nformat(number_sep)
    local d = `range'+1
    local row= `range'+1
    }
    timer off 21
    di "time loop 315-320. `var'"
    timer list 21
    *Extract and export main variable response categories
    timer on 22
    forvalues response = 1/`e(r)' {
    quietly levelsof `var' if `var'<998, local(lev)
    local valuelbl: value label `var'
    timer on 23
    foreach c of local lev {
    local v: label `valuelbl' `c'
    quietly putexcel F`num++'="`v'"
    }
    timer off 23
    di "time loop 329-332. `var'"
    timer list 23
    local num= `range'+1
    }
    timer off 22
    di "time loop 325-340. `var'"
    timer list 22
    ********************
    }

    timer off 3
    timer list 3
    di "time big loop 123-36"
    timer list 3
    log close












    ************************************************** *********************
    ************************************************** ************************
    ************************************************** ************************
    ************************************************** ************************
    Attached Files
    Last edited by Jacob Grippin; 16 Feb 2023, 12:39.

  • #2
    1) I see something over 300 ll. of code above. People here are very generous with their time, but I don't think many people will dig that deep. You'd have a much better chance of getting help if you can point people into the part of your code that is a problem. A better question to ask, at this point, would be "How do I find out where the slow parts are in my code?"

    2) You're using -timer- a lot, presumably to the end of 1), but I think you have more timers here than are helpful. I might start out with 2 or 3 timers, and then with that information, narrow down to some parts of the code that don't matter for time. Then, I'd put my 2 or 3 timers into the part of the code that *does* matter, run the code again, and narrow down further. For example, before timing within loops, time whole loops and find out which ones matter. Then, you could time chunks of code within loops. The 80/20 (or even 95/5) rule applies here, namely that a small part of code is virtually certain to account for almost all the run time, and getting a sense where that is would help you and help others help you.

    3) I don't understand this: "The processing time for each iteration takes longer with each subsequent run."
    What is your referent for "iteration" here? You have many things that might be called "iteration," or at least "loops."
    What do you mean by "subsequent" run? By "subsequent" maybe you mean later iterations within some for-loop? Or, maybe you mean that you run the code more than once within a Stata session, it gets slower?

    3) A completely wild stab: I once found, in an earlier Stata version, that having -set matsize- (maximum matrix size) set to a large value could slow things down. You might try setting this to something small. I normally keep it set at 400 unless I have have to make it bigger. I tried to look at your code with that in mind, but there was too much to digest.

    Comment


    • #3
      Hello Mike,
      Thank you for responding. Providing my answers to your questions below.
      1) I have updated my query as suggested. The nested loops below is the section I would like to speed up.
      2) I used timer a lot to determine which loops took the longest. The one included below took the longest
      3) What I mean by this is the first time the loop below runs it takes about 10 seconds to complete. The loop runs several times over all variables inside 'crosstabvarlist' with each variable getting longer and longer. The final time this loop runs for the final variable it takes several minutes to complete rather than 10 seconds from the start. I saw online that something could be accumulating with each additional variable, but I could not find anything by looking at the code.

      foreach cross of varlist `crosstabvarlist' {
      *calculate weighted cross tabs, omit total percentage
      quietly svy: tab `cross' `var' if `var'<998 & `cross'<98, row perc stubw(150) format(%2.0f) nomarginal
      *Save proportions as a matrix
      quietly matrix prop = e(N_pop)*e(Prop)
      quietly mata: st_matrix("prop", (st_matrix("prop") :/ rowsum(st_matrix("prop"))))
      *Calculate Group N's (unweighted)
      quietly tab `cross' `var' if `var'<998 & `cross'<98, matcell(freq)
      quietly mata: st_matrix("groupn", rowsum(st_matrix("freq")))
      *
      *Add variable name, question wording, demographic type, demographic name (variable name to display)
      local range = `d'+`e(r)'*`e(c)'-1
      forvalues r = `d'/`range' {
      quietly putexcel A`r'="`var'" B`r'="``var'[note1]'" C`r'="crosstab" D`r'="`e(rowvar)'" H`r'="`e(rowvlab)'" K`r'="``var'[note3]'" L`r'="``var'[note4]'" M`r'="NA"
      local d = `range'+1
      }

      *Extract and export percentages
      local one=1
      forvalues cat = 1/`e(r)' {
      quietly mat c`cat' = 100*prop[`one++', 1 .. `e(c)']'
      quietly mat l c`cat'
      quietly putexcel G`m'=matrix(c`cat'), nformat(number)
      quietly putexcel M`m'=matrix(groupn[`cat',1]), nformat(number_sep)
      local m = `m'+`e(c)'
      }
      *Extract and export demographic categories (only include categories in above crosstab)
      quietly levelsof `cross' if e(sample), local(clev)
      local valuelbl: value label `cross'
      foreach cc of local clev {
      local cv: label `valuelbl' `cc'
      forvalues cat = 1/`e(c)' {
      quietly putexcel E`row++'="`cv'"
      }
      }
      *Extract and export main variable response categories
      *Added & `cross'<98 because values of var with no observations were messing up spreadsheet
      forvalues response = 1/`e(r)' {
      quietly levelsof `var' if `var'<998 & `cross'<98, local(lev)
      local valuelbl: value label `var'
      foreach c of local lev {
      local v: label `valuelbl' `c'
      quietly putexcel F`num++'="`v'"
      }
      }
      }


      Thank you,
      Jacob

      Comment


      • #4
        I do see one think in your code that looks inefficient to me. -if- clauses are always expensive because every observation in the data set must be inspected to determine if it satisfies the condition. Sometimes there is no way to avoid this. But I notice that your -if- clause is often -if `var'<998 & `cross'<98-. As this is applying to large numbers of variables, and seeing the numbers 998 and 98 seem to apply to all of them, I have a hunch that 998 and 98 are "magic numbers" that your data set uses as a signal for missing values. If I have that right, before you even enter these loops, get rid of those. You can use -mvdecode- (or just a suitable -replace- command) to change these magic numbers to actual Stata missing values. Stata has a hard coded way to screen out Stata missing values from nearly all commands--if you use Stata's missing values, not magic numbers. So you can remove those -if- clauses and things will speed up.

        That said, I cannot think of any reason why this problem should grow more burdensome as you progress through the loop. So there may be other things. But this is an easy one to fix. I don't know how much of a speedup you will get from just this, but I'd give it a try. -if- clauses inside of loops can cause surprising bottlenecks.

        Comment


        • #5
          Hello Clyde. I appreciate the help that you have provided. And all the posts you have responded to that I have seen over the years. Unfortunately replacing the 'if' options with 'replace' commands did not result in the program running faster. I also tried running in parallel with the 'parallel' package, but that gave errors. The excel spreadsheet would be open and locked on one of the cores. then another core would try to open and write but an error would print out as cannot write to the excel file which is already open and locked.

          If anybody has any other suggestions I would be happy to try them out. Otherwise thank you for thosewho viewed and responded.
          Jacob

          Comment

          Working...
          X