Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Tabulate or count factor variable values

    I have a large number of binary interaction variables in regressions that I create with i., #, and ## operators. I need to produce the cell counts for these variables. If I generated the interactions manually, this would be easy, as I'd just run a foreach loop of the interactions and tabulate them or count if `var'==0 and count if `var'==1. But since I've used the factor variable notation, the interactions aren't saved as variables, and as Nick Cox wrote to answer a different question in 2023, "tabulate knows nothing about interactions."
    Is there a workaround than creating all the dummies and the interactions manually?

    Code:
    sysuse auto, clear
    set seed 1492
    gen cat=floor(runiform(1,5))
    
    reg price weight i.cat##i.foreign
    summ i.cat##i.foreign //this runs, but isn't quite what I want
    
    tab1 i.cat##i.foreign //this is what I'd like
    count if i.cat##i.foreign ==0 //something like this would also be an option

  • #2
    Yes, it's unfortunate that the -tab- command does not support factor variable notation in this way. You might find that the output of -summ i.cat#i.foreign- is satisfactory for your purposes. It won't give you the count in each combination, but it will give you the proportion of all observations that are, and you can do it as a one-liner.

    If that isn't satisfactory, you can always write a loop:
    Code:
    levelsof cat, local(cats)
    levelsof foreign, local(foreigns)
    foreach c of local cats {
        foreach f of local foreigns {
            count if `c'.cat#`f'.foreign
        }
    }
    Another simpler approach, but one that, again, doesn't produce the results in exactly the form you were seeking, would be just -tab cat foreign-.

    Comment


    • #3
      Another approach might be to use the command xi to generate the dummies and interactions for you:

      Code:
      . xi i.cat*i.foreign, noomit
      i.cat*i.foreign   _IcatXfor_#_#       (coded as above)
      
      . tab1 _I*
      
      -> tabulation of _Icat_1  
      
           cat==1 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         53       71.62       71.62
                1 |         21       28.38      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      
      -> tabulation of _Icat_2  
      
           cat==2 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         59       79.73       79.73
                1 |         15       20.27      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      
      -> tabulation of _Icat_3  
      
           cat==3 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         55       74.32       74.32
                1 |         19       25.68      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      
      -> tabulation of _Icat_4  
      
           cat==4 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         55       74.32       74.32
                1 |         19       25.68      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      
      -> tabulation of _Iforeign_0  
      
       foreign==0 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         22       29.73       29.73
                1 |         52       70.27      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      
      -> tabulation of _Iforeign_1  
      
       foreign==1 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         52       70.27       70.27
                1 |         22       29.73      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      
      -> tabulation of _IcatXfor_1_0  
      
         cat==1 & |
       foreign==0 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         59       79.73       79.73
                1 |         15       20.27      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      
      -> tabulation of _IcatXfor_1_1  
      
         cat==1 & |
       foreign==1 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         68       91.89       91.89
                1 |          6        8.11      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      
      -> tabulation of _IcatXfor_2_0  
      
         cat==2 & |
       foreign==0 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         63       85.14       85.14
                1 |         11       14.86      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      
      -> tabulation of _IcatXfor_2_1  
      
         cat==2 & |
       foreign==1 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         70       94.59       94.59
                1 |          4        5.41      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      
      -> tabulation of _IcatXfor_3_0  
      
         cat==3 & |
       foreign==0 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         62       83.78       83.78
                1 |         12       16.22      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      
      -> tabulation of _IcatXfor_3_1  
      
         cat==3 & |
       foreign==1 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         67       90.54       90.54
                1 |          7        9.46      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      
      -> tabulation of _IcatXfor_4_0  
      
         cat==4 & |
       foreign==0 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         60       81.08       81.08
                1 |         14       18.92      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      
      -> tabulation of _IcatXfor_4_1  
      
         cat==4 & |
       foreign==1 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         69       93.24       93.24
                1 |          5        6.76      100.00
      ------------+-----------------------------------
            Total |         74      100.00
      Last edited by Hemanshu Kumar; 16 May 2025, 21:01.

      Comment


      • #4
        I'd add a mention of groups from the Stata Journal, which could be used to list several cross-combinations.

        Code:
        . search st0496, entry
        
        Search of official help files, FAQs, Examples, and Stata Journals
        
        SJ-18-1 st0496_1  . . . . . . . . . . . . . . . . . Software update for groups
                (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                Q1/18   SJ 18(1):291
                groups exited with an error message if weights were specified;
                this has been corrected
        
        SJ-17-3 st0496  . . . . .  Speaking Stata: Tables as lists: The groups command
                (help groups if installed)  . . . . . . . . . . . . . . . .  N. J. Cox
                Q3/17   SJ 17(3):760--773
                presents command for listing group frequencies and percents and
                cumulations thereof; for various subsetting and ordering by
                frequencies, percents, and so on; for reordering of columns;
                and for saving tabulated data to new datasets

        Comment


        • #5
          Not the most preferable approach, but you could set up a simple wrapper for tab1 that accepts factor variables:
          Code:
          capture program drop tab1fv
          program tab1fv
              
              version 18
              
              syntax varlist(fv)
              
              fvexpand `varlist'
              local fvvarnames `r(varlist)'
              
              fvrevar `varlist'
              local fvvarlist `r(varlist)'
              
              local i 0
              foreach fv of local fvvarlist {
                  
                  local varlabel : word `++i' of `fvvarnames'
                  label variable `fv' "`varlabel'" 
                  
              }
              
              tab1 `fvvarlist'
              
          end
          Applied to the example:
          Code:
          . tab1fv i.cat##i.foreign
          
          -> tabulation of __000000  
          
               1b.cat |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |         74      100.00      100.00
          ------------+-----------------------------------
                Total |         74      100.00
          
          -> tabulation of __000001  
          
                2.cat |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |         59       79.73       79.73
                    1 |         15       20.27      100.00
          ------------+-----------------------------------
                Total |         74      100.00
          
          -> tabulation of __000002  
          
                3.cat |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |         55       74.32       74.32
                    1 |         19       25.68      100.00
          ------------+-----------------------------------
                Total |         74      100.00
          
          (output omitted)
          
          -> tabulation of __00000D  
          
          4.cat#1.for |
                 eign |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |         69       93.24       93.24
                    1 |          5        6.76      100.00
          ------------+-----------------------------------
                Total |         74      100.00

          Comment


          • #6
            If you have Stata 17 or newer, you can use collect with total to
            build custom tables of the cell frequencies from your factor variables and
            their interactions.

            Here are these commands in action using the example from the original post.
            Code:
            . sysuse auto, clear
            (1978 automobile data)
            
            . set seed 1492
            
            . gen cat=floor(runiform(1,5))
            
            . 
            . reg price weight i.cat##i.foreign
            
                  Source |       SS           df       MS      Number of obs   =        74
            -------------+----------------------------------   F(8, 65)        =      9.09
                   Model |   335322351         8  41915293.8   Prob > F        =    0.0000
                Residual |   299743045        65  4611431.47   R-squared       =    0.5280
            -------------+----------------------------------   Adj R-squared   =    0.4699
                   Total |   635065396        73  8699525.97   Root MSE        =    2147.4
            
            ------------------------------------------------------------------------------
                   price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                  weight |   3.259298   .4077505     7.99   0.000     2.444964    4.073632
                         |
                     cat |
                      2  |  -1124.669   857.4147    -1.31   0.194    -2837.044    587.7063
                      3  |  -1036.552      831.7    -1.25   0.217    -2697.572    624.4675
                      4  |  -263.8832   798.1131    -0.33   0.742    -1857.825    1330.059
                         |
                 foreign |
                Foreign  |   2951.217   1162.493     2.54   0.014     629.5584    5272.875
                         |
             cat#foreign |
              2#Foreign  |   2020.676   1646.526     1.23   0.224    -1267.663    5309.015
              3#Foreign  |   1301.853   1460.385     0.89   0.376    -1614.737    4218.442
              4#Foreign  |  -489.4531   1531.645    -0.32   0.750     -3548.36    2569.454
                         |
                   _cons |  -4190.884   1483.783    -2.82   0.006    -7154.204   -1227.564
            ------------------------------------------------------------------------------
            
            . 
            . * store these results, in case you need them later since -total- is an
            . * eclass command 
            . estimates store fit
            
            . 
            . * -total- has support for factor variables notation since Stata 16
            . collect : total weight i.cat##i.foreign
            
            Total estimation                            Number of obs = 74
            
            --------------------------------------------------------------
                         |      Total   Std. err.     [95% conf. interval]
            -------------+------------------------------------------------
                  weight |     223440   6685.672      210115.5    236764.5
                         |
                     cat |
                      1  |         21   3.904686      13.21797    28.78203
                      2  |         15   3.481851      8.060681    21.93932
                      3  |         19   3.783526      11.45944    26.54056
                      4  |         19   3.783526      11.45944    26.54056
                         |
                 foreign |
               Domestic  |         52   3.958691      44.11034    59.88966
                Foreign  |         22   3.958691      14.11034    29.88966
                         |
             cat#foreign |
             1#Domestic  |         15   3.481851      8.060681    21.93932
              1#Foreign  |          6   2.364115      1.288325    10.71167
             2#Domestic  |         11   3.081096      4.859385    17.14061
              2#Foreign  |          4   1.958473       .096769    7.903231
             3#Domestic  |         12   3.192457      5.637443    18.36256
              3#Foreign  |          7   2.534691      1.948368    12.05163
             4#Domestic  |         14   3.392175      7.239406    20.76059
              4#Foreign  |          5   2.173943      .6673382    9.332662
            --------------------------------------------------------------
            
            . 
            . * style and label as needed
            . collect style header cat foreign, title(label)
            
            . collect style column, dups(center)
            
            . collect style row stack, nobinder
            
            . collect label levels result _r_b "Total", modify
            
            . 
            . * arrange the tables you want
            . 
            . * basic estimation layout
            . collect layout (colname) (result[_r_b])
            
            Collection: default
                  Rows: colname
               Columns: result[_r_b]
               Table 1: 18 x 1
            
            -------------------------
                             |  Total
            -----------------+-------
            Weight (lbs.)    | 223440
            cat              |       
              1              |     21
              2              |     15
              3              |     19
              4              |     19
            Car origin       |       
              Domestic       |     52
              Foreign        |     22
            cat # Car origin |       
              1 # Domestic   |     15
              1 # Foreign    |      6
              2 # Domestic   |     11
              2 # Foreign    |      4
              3 # Domestic   |     12
              3 # Foreign    |      7
              4 # Domestic   |     14
              4 # Foreign    |      5
            -------------------------
            
            . 
            . * factor variable cell totals
            . collect layout (cat foreign) (result[_r_b])
            
            Collection: default
                  Rows: cat foreign
               Columns: result[_r_b]
               Table 1: 8 x 1
            
            ------------------
                       | Total
            -----------+------
            cat        |      
              1        |    21
              2        |    15
              3        |    19
              4        |    19
            Car origin |      
              Domestic |    52
              Foreign  |    22
            ------------------
            
            . 
            . * cross tabulation of the two factor variables
            . collect layout (cat) (foreign) (result[_r_b])
            
            Collection: default
                  Rows: cat
               Columns: foreign
                Tables: result[_r_b]
               Table 1: 5 x 2
            
            ----------------------
                |    Car origin   
                | Domestic Foreign
            ----+-----------------
            cat |                 
              1 |       15       6
              2 |       11       4
              3 |       12       7
              4 |       14       5
            ----------------------

            Comment


            • #7
              Thanks, all! You've given me several good options. I think #3 is the simplest that does exactly what I want in two lines of code. (I guess three lines if I drop the variables after the tabulations.) #6 could only be one line (total i.cat##i.foreign) since in the case of binary variables the total equals the count of 1's in the data, but for my purpose (a disclosure review) it might be nice to see the number of 0's explicitly, without having to do even the simplest bit of subtraction.

              Comment

              Working...
              X