Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • second highest value in tabstat

    Hi,
    i need to export many summary tables to excel. In these summary tables I need to have the highest and second highest value of a variable grouped by another variable.
    For example:

    sysuse auto
    tabstat price, statistics(mean sd p25 p50 p75 p90 count max min) by(foreign)

    Unfortunately, there is no "statistic" in the tabstat option to specify that I need not only the max but also the second highest value.

    I know that there are ados like "extremes" but I really need summary statistics (mean, sd, quantiles and the two highest values) in one table (because I need to export >100 tables in total).

    Do you have any ideas how I can get a table such as with tabstat and the by option plus the two highest values?

    Thanks a lot in advance!
    Last edited by Susan Meier; 10 Nov 2022, 02:23.

  • #2
    If this was my problem I would do something like this. You get a reduced dataset, which you can export. The larger picture is to use frames or preserve and restore. in a loop over your reports.

    Code:
    . sysuse auto, clear 
    (1978 automobile data)
    
    . 
    . gen negprice = -price
    
    . bysort foreign (negprice) : gen second = -negprice[2]
    
    . 
    . collapse (max) max=price second=second (mean) mean=price (sd) sd=price, by(foreign)
    
    . 
    . list 
    
         +------------------------------------------------+
         |  foreign      max   second      mean        sd |
         |------------------------------------------------|
      1. | Domestic   15,906    14500   6,072.4   3,097.1 |
      2. |  Foreign   12,990    11995   6,384.7   2,621.9 |
         +------------------------------------------------+
    Two points, one more obvious than the other.

    1. Once the second largest value is a groupwise constant in a variable for a set of observations, there are several equally good ways to select it, as the mean or as any quantile, say.

    2. Sorting from smallest to largest sounds right, but then the second from the end will be a missing value if there are two or more missing values in any group. There are many work-arounds for that, such as segregating missing values. I negated values, found the second smallest, and then negated back. This will return missing if the number of non-missing values in a group is 0 or 1, which seems fair enough.

    Code:
    extremes
    is from SSC. You are absolutely right if you guessed that exporting to MS Excel was not a goal of its author.


    Comment


    • #3
      Here is another approach, using a custom Mata function and rangestat from SSC. You also need moremata from SSC.

      Code:
      webuse grunfeld, clear 
      
      mata : 
      
      mata clear
      
      real rowvector mystats(real colvector X) {
          X = select(X, (X :< .))
          _sort(X, 1)
          return(length(X), mean(X), sqrt(variance(X)), min(X), mm_quantile(X, 1, (0.25, 0.5, 0.75, 0.90)), X[length(X) - 1], max(X))
      } 
      
      end 
      
      rangestat (mystats) invest, interval(company 0 0) 
      
      rename (mystats*) (count mean sd min p25 p50 p75 p90 second max)
      
      tabdisp company, c(count mean sd min p25)
      tabdisp company, c(p50 p75 p90 second max)
      
      su invest if company == 1, detail
      To get a dataset to export, you just go

      Code:
      preserve 
      egen tag = tag(company) 
      keep if tag 
      keep company count-max 
      * export as desired 
      restore

      Comment

      Working...
      X