Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Top of the observations

    Hello, I need to create a top 20 of my observations, I try to use: for example
    tab x y, but there are too many values

    How can I resolve this?
    Thanks a lot

  • #2
    Define top 20: with respect to

    the current sort order?

    highest x?

    lowest x? (sort to top of dataset!)

    same question for y?

    something else?

    Comment


    • #3
      Code:
      gsort -x 
      tab x y in 1/20
      If there are missings they will get sorted to the bottom.

      Comment


      • #4
        Probably even better

        Code:
        gsort -x
        
        tab x if _n<.2*_N
        so that we do not have to calculate by hand what fraction is 20% of the total, e.g. here, 20 is 20% of 100, but not 20% of 900000.

        Another way would be

        Code:
        xtile quintiles = x, nq(5)
        
        tab x if quintiles==5

        Comment


        • #5
          Top 20 values or values that are in the top 20%? That % sign came out of nowhere.

          Comment


          • #6
            Originally posted by Nick Cox View Post
            Top 20 values or values that are in the top 20%? That % sign came out of nowhere.
            Indeed, this is in principle a trivial question if posed precisely. E.g., "I need to flag the top 20 observations of x" vs "I need to flag the top 20% of the observations of x"...

            But the way the question is posed now, it mostly tests the the intuition of the reader, i.e., it tests whether the force is strong with the reader

            Comment


            • #7
              I smuggled a Star Wars allusion into the Stata Journal.

              https://www.stata-journal.com/sjpdf....iclenum=pr0022

              After thinking up this small and corny joke, I Googled and found it already in many places. Oh well....

              Comment


              • #8
                I would take a look at Stata's xtile, pctile, and centile commands.

                You might also take a look at the following posts on Statalist:
                • How to calculate top(bottom) deciles, link
                • Top 20% based on existing data, link
                • How to create a database that individuals that belong to the 1st percentile, link
                Here are some examples using toy data:
                Code:
                dataex firm sales assets employees  // Data shared via -dataex-. To install: ssc install dataex
                clear
                input byte firm int(sales assets employees)
                 1 480 422 188
                 2  90  34 177
                 3 324 482  30
                 4 183 420  48
                 5 313 320  95
                 6 453 266 168
                 7 148 107 215
                 8 241 482  90
                 9  76 302 155
                10  23 335 164
                11 481 314 121
                12 136 359  89
                13 317 327 211
                14 253  26  75
                15 259  87  95
                16 210   8  17
                17 219 234  39
                18 223  22 163
                19 354 414  43
                20 176 283  15
                21 313 414  23
                22 244 272 211
                23 311 315 160
                24 135 440  14
                25  78 403 219
                26 401 250  99
                27 251 483 188
                28 158 284 108
                29 347 497  90
                30 360 452 207
                31 444 451  95
                32  45 467 121
                33  62 412  57
                34 242 412   8
                35 243  97  63
                36 257 311 154
                37 151  89 117
                38 124 237 109
                39 169 480 154
                40 283 273  67
                end
                ------------------ copy up to and including the previous line ------------------
                
                
                tabstat sales assets employees, stats(mean median min max p90 p95 p99) col(stats)
                
                    variable |      mean       p50       min       max       p90       p95       p99
                -------------+----------------------------------------------------------------------
                       sales |   239.425     242.5        23       481     422.5     466.5       481
                      assets |   307.075     317.5         8       497       481     482.5       497
                   employees |    111.55     103.5         8       219       209       213       219
                ------------------------------------------------------------------------------------
                
                gsort -sales
                egen rank_sales = rank(sales), field  // field puts highest==1, 2nd==2, and so on
                gen top_20 = (rank_sales <=20)
                centile sales, centile(10(10)90)
                
                
                                                                       -- Binom. Interp. --
                    Variable |       Obs  Percentile    Centile        [95% Conf. Interval]
                -------------+-------------------------------------------------------------
                       sales |        40         10        76.2         26.4223    135.6401
                             |                   20       135.2        73.63794    174.3575
                             |                   30       161.3        124.4324    230.2855
                             |                   40       213.6        149.4393    248.8958
                             |                   50       242.5        177.9108    276.4489
                             |                   60       255.4        228.4107    315.0809
                             |                   70       312.4        248.1668    353.7248
                             |                   80       342.4        289.5701    445.5185
                             |                   90       439.7        332.2785    480.8444
                
                xtile sales_decile = sales, nq(10)  // split sales into deciles
                list firm sales rank_sales top_20 sales_decile, sepby(sales_decile) noobs abbrev(12)
                * Note the tie at 11th place, where two firms have sales==313 
                
                  +---------------------------------------------------+
                  | firm   sales   rank_sales   top_20   sales_decile |
                  |---------------------------------------------------|
                  |   11     481            1        1             10 |
                  |    1     480            2        1             10 |
                  |    6     453            3        1             10 |
                  |   31     444            4        1             10 |
                  |---------------------------------------------------|
                  |   26     401            5        1              9 |
                  |   30     360            6        1              9 |
                  |   19     354            7        1              9 |
                  |   29     347            8        1              9 |
                  |---------------------------------------------------|
                  |    3     324            9        1              8 |
                  |   13     317           10        1              8 |
                  |    5     313           11        1              8 |
                  |   21     313           11        1              8 |
                  |---------------------------------------------------|
                  |   23     311           13        1              7 |
                  |   40     283           14        1              7 |
                  |   15     259           15        1              7 |
                  |   36     257           16        1              7 |
                  |---------------------------------------------------|
                  |   14     253           17        1              6 |
                  |   27     251           18        1              6 |
                  |   22     244           19        1              6 |
                  |   35     243           20        1              6 |
                  |---------------------------------------------------|
                  |   34     242           21        0              5 |
                  |    8     241           22        0              5 |
                  |   18     223           23        0              5 |
                  |   17     219           24        0              5 |
                  |---------------------------------------------------|
                  |   16     210           25        0              4 |
                  |    4     183           26        0              4 |
                  |   20     176           27        0              4 |
                  |   39     169           28        0              4 |
                  |---------------------------------------------------|
                  |   28     158           29        0              3 |
                  |   37     151           30        0              3 |
                  |    7     148           31        0              3 |
                  |   12     136           32        0              3 |
                  |---------------------------------------------------|
                  |   24     135           33        0              2 |
                  |   38     124           34        0              2 |
                  |    2      90           35        0              2 |
                  |   25      78           36        0              2 |
                  |---------------------------------------------------|
                  |    9      76           37        0              1 |
                  |   33      62           38        0              1 |
                  |   32      45           39        0              1 |
                  |   10      23           40        0              1 |
                  +---------------------------------------------------+
                .
                Last edited by David Benson; 16 Jan 2019, 15:00.

                Comment

                Working...
                X