  • Quickly getting means by group

    I have what I assume is a painfully simple question: is there a faster way to produce a table of averages by a grouping variable? For instance, if I had a continuous variable (var1) and binary variable (var2), and I wanted to get the mean of var1 for the whole dataset, for var2 == 0, and for var2 == 1, I could just do:

    sum var1
    sum var1 if var2==0
    sum var1 if var2==1
    Is there a way to get all of those from one line of code, in a single table? Something like the equivalent of -tab var1 var2, col-, but collapsed into an average of var1.

  • #2
    Perhaps tabstat can do that:

    tabstat var1, by(var2)
    There is an extra option stat() that allows us to choose what to show:

    tabstat var1, stat(mean sd n) by(var2)


    • #3
      Thanks Ken. I looked at tabstat earlier, but there were two things I couldn't get it to do: (1) report the mean of var1 for the whole sample as well as for each category of var2 (the more important issue), and (2) ideally, show the var2 categories as columns rather than rows (less important), so that the output is easier to copy into the table shell I need to use.


      • #4
        I do hope that what follows can help:
        . sysuse auto.dta
        (1978 automobile data)
        . tabstat price, stat(N mean sd p50 min max) by(foreign)
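        Summary for variables: price
        Group variable: foreign (Car origin)

         foreign |         N      Mean        SD       p50       Min       Max
        ---------+------------------------------------------------------------
        Domestic |        52  6072.423  3097.104    4782.5      3291     15906
         Foreign |        22  6384.682  2621.915      5759      3748     12990
        ---------+------------------------------------------------------------
           Total |        74  6165.257  2949.496    5006.5      3291     15906
        ----------------------------------------------------------------------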
        Kind regards,
        (StataNow 18.5)
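
        On the second request in #3 (the var2 categories as columns): as far as I know, tabstat always puts the by() groups in rows. A minimal sketch with the table command may help. It assumes Stata 17 or later (the syntax of table is different in older versions) and uses price and foreign from the auto data as stand-ins for var1 and var2:

        ```stata
        sysuse auto, clear
        * Assumes Stata 17+. The empty row specification () together with
        * foreign as the column variable should give one row of mean prices,
        * with a column per category of foreign plus an overall Total column.
        table () (foreign), statistic(mean price) nformat(%9.2f)
        ```

        The Total column appears by default; adding the nototals option should suppress it if only the group means are wanted.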


        • #5
          Thanks very much, Carlo.