Thanks to Kit Baum, an update to the package gtools is now available for download from SSC. From Stata 13.1 or later, run
Code:
ssc install gtools, replace
See the original announcement here. In short, gtools implements faster versions of several Stata commands, including collapse, reshape, xtile, tabstat, isid, egen, pctile, winsor, contract, levelsof, duplicates, and unique/distinct. For details on the package, see the official documentation. For details on the update, see the release notes. Some highlights:
New commands:
- greshape long/wide, 4-20x faster than reshape long/wide (additionally accepts any number of i or j variables).
- greshape gather/spread, similar to long/wide but made to mimic the gather and spread commands in R's tidyr package.
- gstats tab, 5-40x faster than tabstat (additionally accepts any number of grouping variables).
- gstats sum, 5-10x faster than summarize, detail (plain summarize is already fast; it is the detail option that is slow, because it computes all the percentiles).
- gstats winsor, 10-20x faster than winsor2.
- gcollapse, gegen, and gstats tab now allow the following statistics:
  - select# and select-#, the #th smallest and the #th largest value, respectively
  - rawselect# and rawselect-#, the same but ignoring weights
  - cv, the coefficient of variation
  - variance
  - range
- gtop and glevelsof can save their results in a mata object via mata(name).
- gtop (gtoplevelsof) can list all the levels via ntop(.), similar to tablist; ntop(-.) lists them from least to most common, and the option -alpha- sorts the listed levels by their values instead of by frequency.
- greshape allows varlist syntax for long-to-wide reshapes (though this cannot be combined with @ in the same stub). Wide-to-long reshapes do not allow varlist syntax, but complex matches can be achieved via the option match(regex), which treats the stubs as regular expressions (details here).
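A minimal sketch of a greshape round trip with two i() variables, on a small hypothetical dataset (the names firm, plant, r1 and r2 are made up for illustration). The calls use the same long/wide syntax as the benchmark further down, with a second i() variable as the list above says is allowed.

```stata
* Hypothetical toy data: firm and plant jointly identify each row
clear
input firm plant r1 r2
1 1 0.5 1.5
1 2 2.0 3.0
2 1 4.0 5.0
end

* Wide to long: stub r, two i() variables, suffix stored in new variable j
greshape long r, i(firm plant) j(j)
list, sepby(firm plant) noobs

* Long back to wide: recreates r1 and r2, one row per (firm, plant)
greshape wide r, i(firm plant) j(j)
sort firm plant
list, noobs
```

The same two lines with reshape in place of greshape should give identical data; that is an easy way to check the output on your own files.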
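A sketch of the new gcollapse statistics on hypothetical toy data with two groups, chosen so the results are easy to verify by hand. The statistic names are the ones listed above; the variable names are illustrative only.

```stata
* Hypothetical toy data: two groups with easy-to-check values
clear
input g x
1 1
1 4
1 2
2 10
2 30
2 20
end

* select2: 2nd smallest value; select-1: largest value
* cv: coefficient of variation; variance and range as named
gcollapse (select2) x2nd = x (select-1) xmax = x (cv) xcv = x ///
    (variance) xvar = x (range) xrange = x, by(g)
list, noobs
```

To check by hand: in group 1 the sorted values are 1, 2, 4, so select2 is 2, select-1 is 4 and the range is 3. In group 2 the variance is 100 and, assuming cv follows the usual definition (sample standard deviation divided by the mean), cv is 10/20 = 0.5.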
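A sketch of the three new gstats subcommands on simulated data (the seed, sample size and variable names are arbitrary choices for illustration). Note that tabstat only accepts a single by() variable, whereas gstats tab takes two here; the winsor call reuses the suffix option from the benchmark below with its default cutoffs.

```stata
* Hypothetical simulated data
clear
set seed 1729
set obs 100000
gen g1 = mod(_n, 3)
gen g2 = mod(_n, 5)
gen x  = rnormal()

* Grouped summary statistics over two grouping variables
gstats tab x, by(g1 g2) s(n mean min max)

* Detailed summary, compare with summarize, detail
gstats sum x
summarize x, detail

* Winsorize x into a new variable x_w using the default cutoffs
gstats winsor x, s(_w)
summarize x x_w
```

Winsorizing only pulls extreme values toward the cutoffs, so every x_w should either equal x or be closer to zero than x for standard normal draws.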
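A sketch of the new gtop and glevelsof options on a hypothetical string variable. The options ntop(.), ntop(-.), alpha and mata(name) are those described in the list above; the Mata object name lv is arbitrary, and since I am not assuming anything about the object's internal layout, the example only lists it with mata describe.

```stata
* Hypothetical toy data: "b" is most common, "c" least common
clear
input str1 letter
"b"
"a"
"b"
"c"
"b"
"a"
end

* List every level, most common first
gtop letter, ntop(.)

* List every level, least common first
gtop letter, ntop(-.)

* List every level sorted by value instead of frequency
gtop letter, ntop(.) alpha

* Save the levels in a Mata object named lv, then list Mata objects
glevelsof letter, mata(lv)
mata: mata describe
```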
Results
Code:
clear all
ssc install winsor2

* bench #: command  runs command under timer # and returns the
* elapsed time (in seconds) to the caller in local r#
program bench
    gettoken timer call: 0, p(:)
    gettoken colon call: call, p(:)
    cap timer clear `timer'
    timer on `timer'
    `call'
    timer off `timer'
    qui timer list
    c_local r`timer' `=r(t`timer')'
end

* 10 million observations
set obs 10000000
gen groups = int(runiform() * 1000)
gen smallg = mod(groups, 10)
gen rsort  = rnormal()
gen rvar   = rnormal()
gen ix     = _n
sort rsort

* Reshape benchmarks: timers 11 and 16 are gtools, 10 and 15 are native
preserve
    rename (rsort rvar) (r1 r2)
    bench 11: greshape long r, i(ix) j(j)
restore, preserve
    rename (rsort rvar) (r1 r2)
    greshape long r, i(ix) j(j) nochecks
    bench 16: greshape wide r, i(ix) j(j)
restore, preserve
    rename (rsort rvar) (r1 r2)
    bench 10: reshape long r, i(ix) j(j)
restore, preserve
    rename (rsort rvar) (r1 r2)
    greshape long r, i(ix) j(j) nochecks
    bench 15: reshape wide r, i(ix) j(j)
restore

* Other benchmarks: timer i+1 is gtools, timer i is the native or SSC command
bench 21: qui gstats winsor rvar, s(_wg)
bench 20: qui winsor2 groups
bench 26: qui gstats sum rvar
bench 25: qui sum rvar, detail
bench 31: qui gstats tab rvar, by(smallg) s(n mean min max)
bench 30: qui tabstat rvar, by(smallg) s(n mean min max)

* Build and print the results table
local commands ///
    reshape_long ///
    reshape_wide ///
    winsor ///
    sum_detail ///
    tabstat

local bench_table `"       Versus | Native | gtools | % faster "'
local bench_table `"`bench_table'"' _n(1) `" ------------ | ------ | ------ | -------- "'
forvalues i = 10(5)30 {
    gettoken cmd commands: commands
    local pct "`:disp %7.2f 100 * (`r`i'' - `r`=`i'+1'') / `r`i'''"
    local dnative "`:disp %6.2f `r`i'''"
    local dgtools "`:disp %6.2f `r`=`i'+1'''"
    local cmd `"`:disp %12s "`cmd'"'"'
    local bench_table `"`bench_table'"' _n(1) `" `cmd' | `dnative' | `dgtools' | `pct'% "'
}
disp _n(1) `"`bench_table'"'
Timings are in seconds (as reported by timer) on 10 million observations:
Code:
       Versus | Native | gtools | % faster
 ------------ | ------ | ------ | --------
 reshape_long | 111.63 |   8.21 |   92.65%
 reshape_wide | 127.61 |  16.52 |   87.05%
       winsor |  28.87 |   1.17 |   95.96%
   sum_detail |  30.50 |   1.63 |   94.65%
      tabstat |  32.63 |   1.03 |   96.83%
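The table reports "% faster" (the share of native run time saved), while the list of new commands quotes speed-ups as "Nx faster". A small sketch relating the two, using the rounded timings from the table as input data:

```stata
* Timings in seconds, copied from the results table
clear
input str12 versus native gtools
"reshape_long" 111.63  8.21
"reshape_wide" 127.61 16.52
"winsor"        28.87  1.17
"sum_detail"    30.50  1.63
"tabstat"       32.63  1.03
end

* % faster, as in the benchmark code: 100 * (native - gtools) / native
gen pct_faster = 100 * (native - gtools) / native

* Equivalent speed-up factor: native time / gtools time
gen speedup = native / gtools

format pct_faster speedup %6.2f
list, noobs
```

For example, reshape long drops from 111.63 to 8.21 seconds, a 92.65% cut in run time, which is the same as being about 13.6 times faster. Recomputed percentages can differ from the table in the last digit because the benchmark code used the unrounded timer values.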