  • Summary statistics for subgroups of a categorical variable

    Dear Statalist,

    I want to write code that gives me summary statistics (mean, min, and max) for each subgroup of a single categorical variable.

    Example: I want to display the mean, minimum, and maximum age within each age group. The categorical variable is called agegroup, and looks like this:

    Code:
    codebook agegroup
    
    ----------------------------------------------------------------------------------------
    agegroup                         Age strata within the cohort
    ----------------------------------------------------------------------------------------
    
                      type:  numeric (float)
                     label:  agegroup_lbl
    
                     range:  [1,6]                        units:  1
             unique values:  6                        missing .:  0/9,163
    
                tabulation:  Freq.   Numeric  Label
                             1,113         1  20.0-24.9
                             1,733         2  25.0-29.9
                             1,303         3  30.0-34.9
                             1,504         4  35.0-39.9
                             1,742         5  40.0-44.9
                             1,768         6  45.0-49.9
    I also have a variable for age called PartAge:

    Code:
    . codebook PartAge
    
    ----------------------------------------------------------------------------------------
    PartAge                                                                     (unlabeled)
    ----------------------------------------------------------------------------------------
    
                      type:  numeric (float)
    
                     range:  [20,49.9]                    units:  .1
             unique values:  300                      missing .:  0/9,163
    
                      mean:   35.9571
                  std. dev:   8.55075
    
               percentiles:        10%       25%       50%       75%       90%
                                  24.3      28.4      36.5      43.4      47.4
    I tried using this code:

    Code:
    egen agegroup_mean = mean(PartAge), by(agegroup)
    I did the same thing for min and max, but would it be possible to do it in a more elegant way, displaying the mean, min, and max in one table?

    Sigrid

  • #2
    The code above creates new variables; is that what you want? Or do you just want to see the results? If so, use the -table- command:
    Code:
    help table
    Example code might be:
    Code:
    table agegroup, c(mean PartAge min PartAge max PartAge)
    You could get overall totals with this by adding the "row" option.
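
    A sketch of the full command with "row" added, assuming Stata 16 or earlier, where -table- still accepts the c() (contents) option:
    Code:
    table agegroup, c(mean PartAge min PartAge max PartAge) row
    In Stata 17 and later, -table- was rewritten and c() is no longer accepted, so something like the following should give a similar table there; -tabstat- is an alternative that works across versions and shows a Total row by default:
    Code:
    * Stata 17+ syntax: one statistic() option per statistic
    table agegroup, statistic(mean PartAge) statistic(min PartAge) statistic(max PartAge)

    * version-independent alternative
    tabstat PartAge, by(agegroup) statistics(mean min max)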

    If you want the variables, then three statements like the one above certainly work, but I wonder what you want to use them for?
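
    For reference, the three statements would look roughly like this (agegroup_min and agegroup_max are illustrative names, not from the original post):
    Code:
    * one new variable per statistic, constant within each age group
    egen agegroup_mean = mean(PartAge), by(agegroup)
    egen agegroup_min  = min(PartAge), by(agegroup)
    egen agegroup_max  = max(PartAge), by(agegroup)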


  • #3
    Dear Rich,

    Thank you so much for the input. The last formula, adding "row", generated exactly what I wanted.

    Best regards,
    Sigrid