Thanks to Kit Baum, a new command called runby (with Clyde Schechter) is now available on SSC. To install it, type ssc install runby in Stata's Command window.
runby loops over by-groups in the data. A by-group is a subset of the initial data in memory: all observations that share the same values of the variables specified in the by(varlist) option.
You can run as many Stata commands as you want on each by-group. All you need to do is to wrap these commands in a generic Stata program.
With each loop iteration, runby replaces the data in memory with the by-group's data and runs your program. Whatever is left in memory when your program terminates is treated as that by-group's results and is stored. When runby finishes, the data in memory contains the combined results from all by-groups. runby does not care what is left in memory: it grabs it all and saves it all.
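To make the "whatever is left in memory" point concrete, here is a minimal sketch in which the program keeps every observation and only adds a variable. The program name add_group_mean and the variable mean_price are made up for illustration.

```stata
capture program drop add_group_mean
program add_group_mean
    * runs on one by-group at a time, so no by: prefix or if qualifier is needed
    egen mean_price = mean(price)
end

sysuse auto, clear
runby add_group_mean, by(foreign)

* every observation comes back, each carrying the mean of its own group
tabulate foreign, summarize(mean_price)
```

This should produce the same variable as bysort foreign: egen mean_price = mean(price). The point is only that nothing forces the program to reduce each by-group to a single observation.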
runby is a more efficient alternative to commands like statsby and to loop-based solutions (via levelsof and foreach ...). Because your commands run on data subsets, there is no need to use if or in qualifiers to target by-group observations.
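For comparison, a loop-based solution of the kind mentioned above might look roughly like this. It is only a sketch of the usual levelsof and foreach pattern, using the auto dataset and a tempfile named combined chosen for illustration. Run it as a whole from a do-file so the local macros stay defined.

```stata
sysuse auto, clear
tempfile combined
local first = 1

levelsof foreign, local(groups)
foreach g of local groups {
    preserve
    * isolate the by-group by hand
    keep if foreign == `g'
    summarize rep78, meanonly
    replace rep78 = r(mean)
    gen mrep78_N = r(N)
    keep foreign rep78 mrep78_N
    keep in 1
    * accumulate the one-observation results across iterations
    if !`first' {
        append using `combined'
    }
    save `combined', replace
    local first = 0
    restore
}

use `combined', clear
list
```

All of the preserve, keep if, append, and save bookkeeping is what runby takes over, so the program you hand to runby only needs the lines that do the actual work.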
runby will be useful if you need to run estimations by groups (see the panel-specific regressions example in the help file). It will also be useful with some matching problems when the number of possible pairwise combinations is too large to handle in one pass. There's a great example of case-control pairing in the help file.
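As a rough sketch of estimation by groups (this is not the panel example from the help file), the following runs a separate regression for each level of foreign in the auto dataset and keeps one observation of results per by-group. The program name reg_group and the result variable names are assumptions made for illustration.

```stata
capture program drop reg_group
program reg_group
    regress price mpg weight
    * store the estimates as variables so they survive as results
    gen b_mpg = _b[mpg]
    gen b_weight = _b[weight]
    gen n_obs = e(N)
    keep foreign b_mpg b_weight n_obs
    keep in 1
end

sysuse auto, clear
runby reg_group, by(foreign)
list
```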
For large problems, there's a status option that prints progress reports in the Results window. These show the elapsed time, how many by-groups have been processed so far (and how many of those ended with program errors or no data), how many results observations have been saved, and an estimated time to completion. Reports are printed once per second initially, and the frequency gradually slows to one every 5 minutes after 1 hour of running time.
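Requesting the progress reports is just a matter of adding the option to the call. A minimal sketch, with a made-up program name count_obs:

```stata
capture program drop count_obs
program count_obs
    * record the size of the current by-group
    gen n_in_group = _N
end

sysuse auto, clear
* one by-group per car model
runby count_obs, by(make) status
```

A dataset this small finishes almost instantly, so there may be little or nothing to report. The option pays off on jobs that run for minutes or hours.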
For those who like to think outside the box, runby can be useful for some data management tasks. You can easily partition a large dataset into separate datasets, one for each by-group. You can even use runby to automate the import of a bunch of files into Stata. You use runby to loop over a list of files and let your program handle all the steps needed to import each file. There are examples for each of these uses in the help file. Here's an example from today that shows how to import problematic Excel files using runby.
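The two data management ideas above can be sketched as follows. These are not the help file examples. The program names save_group and load_file, the variable fname, and the file names are all assumptions made for illustration, and the files are written to the current working directory.

```stata
* Part 1: partition a dataset into one file per by-group
capture program drop save_group
program save_group
    * foreign is constant within the by-group, so read it from the first observation
    local g = foreign[1]
    save "auto_foreign_`g'.dta", replace
end

sysuse auto, clear
runby save_group, by(foreign)

* Part 2: loop over a list of files, one file name per observation
capture program drop load_file
program load_file
    local f = fname[1]
    * replace the one-observation by-group with the file's contents
    use "`f'", clear
    gen source_file = "`f'"
end

clear
input str30 fname
"auto_foreign_0.dta"
"auto_foreign_1.dta"
end
runby load_file, by(fname)
```

In the second part the data in memory is just a list of file names, and each by-group is a single file. For other formats, the use line would be swapped for whatever import steps the file needs.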
To install runby from SSC:
Code:
ssc install runby
Here's a quick example that shows the basic functionality:
Code:
clear all

* the program is run once per by-group
program try_this
    summarize rep78, meanonly
    replace rep78 = r(mean)
    gen mrep78_N = r(N)
    keep foreign rep78 mrep78_N
    * reduce the by-group to a single observation of results
    keep in 1
end

sysuse auto
runby try_this, by(foreign)
list
and the results:
Code:
. list

     +-------------------------------+
     |   rep78    foreign   mrep78_N |
     |-------------------------------|
  1. | 3.02083   Domestic         48 |
  2. | 4.28571    Foreign         21 |
     +-------------------------------+
By default, runby uses Mata to do its thing because it is very fast at moving data around. The downside is that it requires extra memory to store a copy of the initial data and to store results. There's an option to use Stata-only commands (use, save, and append) if you are tight on memory, with a definite impact on execution times.