Thanks to Kit Baum, my new command, dobatch, is now available on SSC. dobatch runs a do-file as a background batch process, allowing multiple do-files to execute in parallel. It requires Stata MP and either Linux or macOS (Terminal).
The command can be installed from SSC (ssc install dobatch, replace) or from GitHub (net install dobatch, from("https://raw.githubusercontent.com/reifjulian/dobatch/master") replace). Additional information is available in the Stata help file and on GitHub.
dobatch checks system resources to ensure that enough CPUs are available and to limit the number of active Stata processes. There are two related use cases.
1. Running a large number of scripts in parallel without overloading your server
Suppose you are running a large number of Stata scripts that are independent of each other:
Code:
do script1.do
do script2.do
do script3.do
…
On a Linux server, one could run these in parallel by launching each one as a separate background job from the terminal:
Code:
nohup stata-mp -b do script1.do &
nohup stata-mp -b do script2.do &
nohup stata-mp -b do script3.do &
…
This approach speeds up execution by using multiple processors. However, you must be careful not to overload the server, because each background process consumes CPU and memory. dobatch manages this safely and efficiently: it launches only a limited number of jobs at once and automatically starts new ones as earlier ones finish. All you need to do is replace do with dobatch:
Code:
dobatch script1.do
dobatch script2.do
dobatch script3.do
…
By default, on a server with 64 processors running Stata MP 8 (an 8-core license), dobatch will wait until at least 7 CPUs are free and fewer than 8 Stata MP processes are running. If no other processes are running on the server, this allows up to 8 do-files to run in parallel in the background.
2. Parallelizing a for loop
Suppose you have the following script:
Code:
* mydofile.do
forval x = 1/100 {
    [...]
}
If each iteration of this loop runs independently, meaning it doesn't rely on previous iterations, the loop can be parallelized. To do this, first modify the beginning of the script so that the loop bounds are taken from the do-file's arguments:
Code:
* mydofile.do
local lower `1'
local upper `2'
forval x = `lower'/`upper' {
    [...]
}
Then create a master script that uses dobatch to run the modified do-file multiple times, distributing the workload across parallel jobs. The example below splits the loop into four Stata jobs, each handling one-quarter of the iterations:
Code:
* master.do
dobatch mydofile.do 1 25
dobatch mydofile.do 26 50
dobatch mydofile.do 51 75
dobatch mydofile.do 76 100
In this example, dobatch mydofile.do 1 25 passes the values 1 and 25 as arguments to mydofile.do, which stores them in the local macros lower and upper, respectively. To log the output of each job, include a log command in the do-file:
Code:
* mydofile.do
log query
if mi("`r(name)'") log using "mydofile_`1'_`2'.log", text replace
local lower `1'
local upper `2'
forval x = `lower'/`upper' {
    [...]
}
Each job then writes to its own log file, named after its arguments (for example, mydofile_1_25.log). The log query check opens a new log only if none is already open.
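The four dobatch calls in master.do hard-code the loop bounds. As an alternative, master.do could compute the bounds itself. The following is a minimal sketch that assumes 100 iterations split across 4 jobs. The local macros N, jobs, and size are hypothetical illustration names, not dobatch options.

```stata
* master.do: hypothetical generalization of the four-job example.
* N, jobs, and size are illustrative local macros, not dobatch options.
local N    = 100                    // total number of loop iterations
local jobs = 4                      // number of parallel jobs to launch
local size = ceil(`N'/`jobs')       // iterations per job, rounded up
forval j = 1/`jobs' {
    local lower = (`j' - 1)*`size' + 1
    local upper = min(`j'*`size', `N')
    * skip empty chunks, which can occur when jobs does not divide N evenly
    if `lower' <= `N' dobatch mydofile.do `lower' `upper'
}
```

With N = 100 and jobs = 4, this issues the same four calls as the hand-written master.do (1 25, 26 50, 51 75, 76 100). Setting jobs to 8 would instead produce eight chunks of 13 iterations, with a shorter last chunk (92 to 100). dobatch still decides when each job actually starts, based on the CPU and process limits described above.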