xi vs tab, gen

Ariel Karlinsky

Join Date: Jun 2015

Posts: 491
#1


06 Jan 2016, 13:00

I know that xi is an outdated command, but since some important user-written commands don't allow factor variables, creating dummies remains important. I've noticed that using xi, whether as a command or as a prefix for estimation, is very slow, and that using "qui: tab var, gen(new_)" seems to be dramatically faster.

I'm using NBER patent data as an example:
http://www.nber.org/~jbessen/pat76_06_assg.dta.zip

The data has a time variable called appyear. Generating dummies of appyear with xi and with tab, gen shows the remarkable difference (Stata 12 IC):

Code:

. summ appyear

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     appyear |   3279509     1992.13    8.474844       1901       2006
r; t=0.24 9:35:04

. xi i.appyear
i.appyear         _Iappyear_1901-2006 (naturally coded; _Iappyear_1901 omitted)
r; t=104.10 9:36:48

. qui tab appyear, gen(new_year)
r; t=6.49 9:36:55

xi takes more than 104 seconds, while tab, gen takes less than 7! Why is xi so inefficient?
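For anyone who wants to reproduce the comparison, here is a minimal sketch using Stata's timer commands instead of the r; t= stamps in the log. It assumes the zip file above has been extracted and that the resulting file is named pat76_06_assg.dta in the working directory; that file name is a guess from the URL. The timings will of course vary by machine and Stata flavor.

```stata
* Assumption: the extracted dataset is pat76_06_assg.dta in the working directory
use pat76_06_assg, clear

timer clear

* Timer 1: indicators via xi (creates _Iappyear_* variables)
timer on 1
xi i.appyear
timer off 1

* Timer 2: indicators via tabulate, generate() (creates new_year1, new_year2, ...)
timer on 2
quietly tabulate appyear, generate(new_year)
timer off 2

* Report elapsed seconds for both timers
timer list
```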
Nick Cox

Join Date: Mar 2014

Posts: 35782
#2

06 Jan 2016, 13:10

There are several differences, probably several more than I can invoke:

1. xi is ado code, whereas tabulate is, or is based on, compiled C code. (Possibly Mata code too now, although I'd guess not.)

2. xi is clearly more than a machine to generate indicators. It includes all sorts of checks and has the main role of running commands under its aegis. Even if you don't invoke any of that explicitly, xi probably does much more than is needed for the purpose you mention. It's true that tabulate does other things too, but evidently that is not crucial.
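To illustrate how little a bare indicator generator needs to do, here is a rough sketch that loops over the distinct values of appyear. The d_ prefix is a hypothetical name chosen for illustration. This is not how xi or tabulate work internally, and no claim is made about its speed relative to either.

```stata
* Collect the distinct values of appyear in the local macro years
quietly levelsof appyear, local(years)

* One byte indicator per distinct year; missing stays missing
* (d_ is a hypothetical prefix, not something xi or tabulate would create)
foreach y of local years {
    generate byte d_`y' = (appyear == `y') if !missing(appyear)
}
```

Unlike tab, gen, which numbers the new variables 1, 2, 3, ..., this names each indicator after the year it flags (d_1901, d_1902, and so on).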