  • xi vs tab, gen

    I know that xi is an outdated command, but since some important user-written commands don't allow factor variables, creating dummies remains important. I've noticed that xi, whether used as a command or as a prefix for estimation, is very slow, and that "qui: tab var, gen(new_)" is far faster.

    I'm using the NBER patent data as an example:
    http://www.nber.org/~jbessen/pat76_06_assg.dta.zip

    The data has a time variable called appyear. Generating dummies of appyear with xi and with tab, gen shows a remarkable difference (Stata 12 IC):
    Code:
    . summ appyear
    
        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
         appyear |   3279509     1992.13    8.474844       1901       2006
    r; t=0.24 9:35:04
    
    . xi i.appyear
    i.appyear         _Iappyear_1901-2006 (naturally coded; _Iappyear_1901 omitted)
    r; t=104.10 9:36:48
    
    . qui tab appyear, gen(new_year)
    r; t=6.49 9:36:55

    xi takes more than 104 seconds, while tab, gen takes less than 7! Why is xi so inefficient?

  • #2
    There are several differences, probably more than I can name:

    1. xi is ado code, whereas tabulate is, or is based on, compiled C code. (Possibly Mata code too now, although I'd guess not.)

    2. xi is clearly more than a machine to generate indicators. It includes all sorts of checks and has the main role of running commands under its aegis. Even if you don't invoke any of that explicitly, xi probably does much more than is needed for the purpose you mention. It's true that tabulate does other things too, but evidently that is not crucial.
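
    For anyone who wants to repeat the comparison, here is a minimal sketch using Stata's timer command, assuming the dataset from #1 is already in memory. The timer numbers (1 and 2) are arbitrary choices. The final drop is only an assumption about what you might want: xi omits the first category (_Iappyear_1901), whereas tab, gen creates an indicator for every category, so dropping new_year1 would mimic xi's omission.
    Code:
    * reset all timers before starting
    timer clear

    * timer 1: indicators via xi (creates _Iappyear_* variables)
    timer on 1
    xi i.appyear
    timer off 1

    * timer 2: indicators via tabulate, generate() (creates new_year1, new_year2, ...)
    timer on 2
    quietly tabulate appyear, generate(new_year)
    timer off 2

    * show elapsed seconds for each timer
    timer list

    * optional: drop the indicator for the lowest year to match xi's omitted category
    drop new_year1
    Timings will vary with machine and Stata flavor, so treat the numbers as indicative only.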
