
  • Parallel foreach?

    To improve speed I would like to parallelize a loop of this kind:

    Code:
    global myvars v1 v2 v3
    foreach var of varlist $myvars {
        gen `var'_2 = `var'^2  // placeholder: the real calculation is much more computationally expensive
    }
    My actual varlist is long (about 30 variables), and the computation inside the loop is very expensive. It would be great if I could compute several variables at once (each iteration is, of course, independent of the others).


  • #2
    You could try reading the variables into Mata, working on them together, and reading them back. My wild guess is that you won't go faster that way. Separate variables are separate variables. One detail among several is that so long as all your variables are numeric, the loop should work regardless of the storage type of the individual variables.

    I've not hitherto regarded squaring as a big deal numerically, even for 30 variables. Naturally you've not told us how many observations you have.
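A minimal sketch of the Mata route, for anyone who wants to try it. Assumptions: the three toy variables, the seed, and the squaring step are placeholders for the real data and calculation, and every variable in $myvars is numeric. All the variables are copied into one matrix, transformed in a single step, and written back as new double variables. Nothing here guarantees it will beat the plain foreach loop.

```stata
* Toy data (hypothetical): 3 numeric variables, 1,000 observations
clear
set obs 1000
set seed 12345
generate v1 = rnormal()
generate v2 = rnormal()
generate v3 = rnormal()
global myvars v1 v2 v3

mata:
    // Names of the source variables, taken from the global macro
    vnames = tokens(st_global("myvars"))
    // Copy all observations of those variables into one matrix, one column per variable
    X = st_data(., vnames)
    // Placeholder for the real element-wise calculation
    Y = X:^2
    // Create new double variables v1_2, v2_2, v3_2 and write the results into them
    idx = st_addvar("double", vnames :+ "_2")
    st_store(., idx, Y)
end
```

The design choice is simply to replace 30 separate generate statements with one matrix operation; whether that pays off depends on the real calculation.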



    • #3
      Thanks, Nick Cox. The variables won't actually be squared, as I tried to indicate with the comment next to the squaring command. The actual computation is more complex (and slow, since it requires if statements). The code snippet is just meant to illustrate the case. I was basically hoping for a parforeach command. In R or Python I could have used parallel libraries, bummer that this is not (yet) available in Stata.



      • #4
        Speed is an issue for many users -- and StataCorp too, I guess. Having started computing in 1973, when output 2 hours later was good and 2 days later was common, I still have the mindset that you can read a book or go for coffee if the machine is slow.

        But seriously, like anybody else, I would be very happy to see some big speed-ups in future releases.



        • #5
          This might just do the trick in case anyone else runs into a similar problem, but for me it is easier to migrate this to R.



          • #6
            In R or Python I could have used parallel libraries, bummer that this is not (yet) available in Stata.
            You haven't told us which version of Stata you are using. Perhaps what you're looking for is Stata/MP. StataCorp claims "Adding new variables is nearly 100 percent parallelized ..." More details are in https://www.stata.com/statamp/statamp-20171003.pdf.

            Perhaps the user-written parallel package available from SSC will allow you to parallelize your code effectively. More details are in the output of ssc describe parallel.

            Finally, you haven't shown us your actual code. Perhaps you have overlooked some possibilities for improving the performance of your code.

            Or perhaps you just prefer to use R. Can't argue with that.



            • #7
              Another way to attack it is to run multiple instances of Stata at the same time, each doing part of the loop, and then merge the resulting files.
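The split-and-merge bookkeeping can be sketched as follows, under loudly stated assumptions. The data have (or can be given) a unique identifier, here a hypothetical id. Each chunk of the varlist saves only id plus its new variables. For a self-contained demo, the two chunks run one after the other and use tempfiles. In real use, each chunk would be its own do-file running in a separate Stata session and saving to an ordinary file on disk. How you launch those batch sessions depends on your operating system and Stata edition, so that step is left out.

```stata
* Toy data (hypothetical): an id variable plus 4 numeric variables
clear
set obs 1000
set seed 2024
generate long id = _n
forvalues j = 1/4 {
    generate v`j' = rnormal()
}
tempfile master part1 part2
save `master'

* Chunk 1 (in practice: its own Stata session): first half of the varlist
use `master', clear
foreach var of varlist v1 v2 {
    generate double `var'_2 = `var'^2    // placeholder calculation
}
keep id *_2
save `part1'

* Chunk 2 (in practice: a second Stata session): second half of the varlist
use `master', clear
foreach var of varlist v3 v4 {
    generate double `var'_2 = `var'^2    // placeholder calculation
}
keep id *_2
save `part2'

* Combine: start from the original data and merge each chunk's results on id
use `master', clear
merge 1:1 id using `part1', nogenerate assert(match)
merge 1:1 id using `part2', nogenerate assert(match)
```

Keeping only id and the new variables in each part keeps the files small and makes the final merge unambiguous.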

              As William noted, if you show us your code and what you want to calculate, there is a high probability that Nick, William, or someone else can suggest a much more efficient way to do the calculations in Stata. For example, folks frequently think they need loops when they can do away with loops altogether.
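As an illustration of doing away with loops, here is a minimal sketch. It assumes the slowness comes from an observation-by-observation loop with if statements. The branching rule (square positive values, negate the rest) is hypothetical, since the actual calculation was never posted. cond() evaluates the condition for every observation inside a single generate, which is typically much faster than replace ... in `i' inside forvalues.

```stata
* Toy data (hypothetical)
clear
set obs 1000
set seed 99
generate v1 = rnormal()
generate v2 = rnormal()
global myvars v1 v2

foreach var of varlist $myvars {
    * Slow: observation-by-observation loop with an if statement
    generate double `var'_slow = .
    forvalues i = 1/`=_N' {
        if `var'[`i'] > 0 {
            quietly replace `var'_slow = `var'[`i']^2 in `i'
        }
        else {
            quietly replace `var'_slow = -`var'[`i'] in `i'
        }
    }

    * Fast: one vectorized statement; cond() picks the branch for every observation
    generate double `var'_2 = cond(`var' > 0, `var'^2, -`var')
}
```

Both versions give identical results here; on real data with many observations, the vectorized line usually removes most of the need to parallelize at all.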

