
  • Potential bug in table command

    I'm creating summary statistics using the table command, and I think there is a potential bug when using the * character to create a varlist of all variables.

    The problem can be seen with this example:
    Code:
    clear all
    sysuse auto, clear
    table (var) (result) [aw=mpg], stat(count *) stat(mean *) stat(sd *) stat(min *) stat(max *) // Includes 3 tempvars
    sum * [aw=mpg] // Does not include tempvars
    The table command adds 3 tempvars to the output even though no tempvars exist in memory. The first tempvar is equal to _n, the second tempvar is a column of 1s, and the third tempvar is equal to the weight.

    A look at table.ado points to collect table, and collect.ado shows that the variable equal to the weight comes from a tempvar `wvar'. I'm not sure where the tempvar equal to _n or the tempvar equal to a column of 1s comes from (perhaps the Mata function st_collect_table()).

    My best guess is that the table command parses the varlist after the creation of the tempvars, which causes the bug. Any suggestions?

    c(stata_version): 19
    c(version): 19
    table: *! version 6.0.0 01apr2021
    collect: *! version 1.2.2 09jan2026

  • #2
    Your use of * works for summarize because summarize parses and builds the varlist before creating any temporary variables.

    table parses its syntax in stages. In the first stage it calls syntax to parse if, in, and weights, and it generates temporary variables from these elements: an indicator for the full estimation sample and a variable holding the weights, as you already point out. The variable containing the observation index is created by the sortpreserve property of the program that actually does the work for table.
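
    The ordering effect can be reproduced outside of table. The lines below are only a minimal sketch: unab stands in for table's internal parsing, and the tempvar `w' is a hypothetical stand-in for the weight variable, so this is not the code that table itself runs. Run it as a do-file so the temporary variable stays in memory until the end.

    ```stata
    clear all
    sysuse auto, clear

    * Expand * while no temporary variables exist in memory
    unab before : *

    * Create a temporary variable, as a command might do for its weights
    * (`w' is a hypothetical name used only for this illustration)
    tempvar w
    generate double `w' = mpg

    * Expand * again: the temporary variable is now part of the dataset
    unab after : *

    * Show the names that appear only in the second expansion
    local extra : list after - before
    display "added by the second expansion: `extra'"
    ```

    If the second expansion picks up the temporary variable, `extra' holds its internal name. That is the same mechanism that puts the extra rows in the table output when * is expanded after the temporary variables have been created.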

    While we will definitely look into improving the parsing code for option statistic() (so that it does not include the program's own temporary variables), a simple workaround is to define a macro that contains your variables of interest. Something like:
    Code:
    clear all
    sysuse auto
    unab vars : *
    table (var) (result) [aw=mpg], ///
        stat(count `vars') ///
        stat(mean `vars') ///
        stat(sd `vars') ///
        stat(min `vars') ///
        stat(max `vars')



    • #3
      Thanks Jeff! I appreciate the explanation and for looking into the parsing code. That workaround will work great in the meantime.
