Thanks to Kit Baum, I am happy to announce that multishell is now available on SSC. As usual, it can be installed by typing ssc install multishell in the command bar.
What is the purpose of the program? multishell allows the efficient processing of loops and multiple do files on a single computer or across multiple computers. It dissects forvalues and foreach loops and creates a separate do file and batch file for each variation of the loop (a task). Stata's built-in winexec command is used to start a new instance of Stata using the .bat file. The instance is closed as soon as the task has completed (or failed, in which case the failure is reported), and a new instance processing the next task is started. One instance is reserved to organise the tasks and start the other instances. Multiple instances can be run in parallel on the same computer or across computers, mimicking a cluster. The computer that acts as the server distributes the tasks to the different machines, respecting the maximum number of instances allowed on each machine.
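To illustrate the mechanism only: a running Stata session can launch a second, independent instance in batch mode with winexec, which returns immediately instead of waiting for the launched program to finish. The lines below are a sketch and not multishell's internal code. The task file name task_1.do is hypothetical, and /e is Stata's Windows switch for running a do file in batch mode and exiting when it is done.

Code:
* Sketch only: start a second Stata instance that runs one (hypothetical) task file
* multishell generates and launches its own do and .bat files
winexec "C:\Program Files (x86)\Stata14\StataSE-64.exe" /e do "C:\documents\multishell\test\output\task_1.do"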
How to run it? For example, it is common to use Monte Carlo simulations to assess the bias of an estimator. One way to do this is to vary the number of observations, say from n = 10 to n = 130 in steps of 10. Assume the DGP and the regression are part of the program MonteCarloSim. The number of observations is the only argument of the program, and the estimated coefficient of variable x is returned as r(x). The program is saved in a do file called example_MC.do, together with a forvalues loop over the values of n (see the forvalues code listing later in this post).
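For concreteness, MonteCarloSim could look like the following minimal sketch. The DGP (y = 1 + 0.5x + e, with standard normal x and e) is an assumption made purely for illustration and is not taken from the example files that ship with multishell.

Code:
* Sketch of MonteCarloSim; the DGP is assumed for illustration
capture program drop MonteCarloSim
program define MonteCarloSim, rclass
    args n                              // number of observations
    drop _all
    set obs `n'
    generate x = rnormal()
    generate y = 1 + 0.5*x + rnormal()  // assumed DGP, true slope 0.5
    regress y x
    return scalar x = _b[x]             // picked up by simulate as r(x)
end

After the program has been defined, typing MonteCarloSim 50 followed by return list should display r(x) for a single draw with 50 observations.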
multishell creates a do file and a .bat file for each of the 13 variations of n (n=10, n=20, ..., n=130). The files are then queued and processed consecutively by multiple instances of Stata on a single computer or on multiple computers.
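Conceptually, each task is the loop body with the loop macro replaced by one value. The line below shows what the task for n = 10 amounts to. It is an illustration of the idea, not a copy of a generated file: the exact content and names of the files multishell writes may differ, and the task also needs the definition of MonteCarloSim from example_MC.do.

Code:
* Illustration: the task for n = 10 is equivalent to running
simulate bx = r(x), reps(1000) : MonteCarloSim 10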
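To start multishell, a second do file is required. This do file contains the commands for the multishell environment: it sets a temporary path, points to the Stata executable and to an additional ado path, adds do files to the queue and runs multishell. In the code listing for this do file, the queue is filled with multishell add and started with multishell run, threads(6) sleep(2000).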
An output window will appear and show the name of the do file, the number of tasks and a breakdown of all variations. At most 6 instances of Stata will be started in parallel (set by the option threads).
It is possible to include another computer that has access to the folder set by multishell path and to run multishell on both machines. On the second computer there is no need to add the do file(s) again, as they are created and managed by the first computer, the server. The client uses the same setup commands as the server but omits multishell add and ends with multishell run client.
With threads(6) on each machine, up to 12 instances of Stata run in parallel on the two machines, which speeds up the processing of loops.
More examples are available in the help file, and example do files are provided as well.
At the moment only Microsoft Windows is supported.
Code:
ssc install multishell
Code:
* example_MC.do: one task per value of n
forvalues n = 10 (10) 130 {
    simulate bx = r(x), reps(1000) : MonteCarloSim `n'
}
Code:
* Server: set up the multishell environment, queue the do file and run
clear
adopath ++ "C:\documents\multishell\ado\"
multishell path "C:\documents\multishell\test\output\", clear
multishell exepath "C:\Program Files (x86)\Stata14\StataSE-64.exe"
multishell adopath "C:\documents\multishell\ado\"
multishell add "C:\documents\multishell\simulation\example_MC.do"
multishell run, threads(6) sleep(2000)
Code:
* Client: same environment, no multishell add; tasks are managed by the server
clear
adopath ++ "C:\documents\multishell\ado\"
multishell path "C:\documents\multishell\test\output\", clear
multishell exepath "C:\Program Files (x86)\Stata14\StataSE-64.exe"
multishell adopath "C:\documents\multishell\ado\"
multishell run client, threads(6) sleep(2000)