
  • Average of Top and Bottom Half of Group

    Hello,

    Would appreciate some assistance on a calculation for 2 new variables please.

    Within each group (the group variable), I need the average value of the top-half-ranked and bottom-half-ranked cases of the value variable, returned in two new variables in columns 4 and 5.

    The problem I'm having is when there is an odd number of cases in the group, as in this example, which has 11.

    In that case I would like to sum the values of the top 5 ranked cases plus half the value of the 6th ranked case (as it sits in the middle), e.g. 23 + 45 + 56 + 2 + 4 + 46.5 (half of 93, rank 6) = 176.5, then divide by half the number of cases (11 / 2 = 5.5), giving an average top-half value of 32.09, returned in a new variable in the next column. The same calculation applies to the bottom half: (75 + 36 + 12 + 52 + 19 + 46.5) / 5.5 = 43.73.

    Thanks,
    Hans

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 groupvar byte(rank value) float(average_of_top_half_rank average_of_bottom_half_rank)
    "Friday"  1 23 32.09 43.73
    "Friday"  2 45 32.09 43.73
    "Friday"  3 56 32.09 43.73
    "Friday"  4  2 32.09 43.73
    "Friday"  5  4 32.09 43.73
    "Friday"  6 93 32.09 43.73
    "Friday"  7 75 32.09 43.73
    "Friday"  8 36 32.09 43.73
    "Friday"  9 12 32.09 43.73
    "Friday" 10 52 32.09 43.73
    "Friday" 11 19 32.09 43.73
    end
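
    [Editor's note] The half-weighted averaging described above can be sketched in Python for checking; `half_averages` is a hypothetical helper, not the Stata solution used later in this thread:

    ```python
    def half_averages(values):
        """Split a ranked list into top and bottom halves, giving the
        middle value half weight to each side when the count is odd,
        and return (top_half_mean, bottom_half_mean)."""
        n = len(values)
        half = n / 2      # each half has n/2 "cases", e.g. 5.5 for n = 11
        k = n // 2
        if n % 2 == 0:
            top, bottom = sum(values[:k]), sum(values[k:])
        else:
            mid = values[k] / 2   # half the middle value goes to each side
            top = sum(values[:k]) + mid
            bottom = sum(values[k + 1:]) + mid
        return top / half, bottom / half

    # Friday's values in rank order, reproducing the worked example
    vals = [23, 45, 56, 2, 4, 93, 75, 36, 12, 52, 19]
    top, bottom = half_averages(vals)
    print(round(top, 2), round(bottom, 2))  # 32.09 43.73
    ```
    
    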


  • #2
    There should be an easier way to do this.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 groupvar byte(rank value) float(average_of_top_half_rank average_of_bottom_half_rank)
    "Friday"  1 23 32.09 43.73
    "Friday"  2 45 32.09 43.73
    "Friday"  3 56 32.09 43.73
    "Friday"  4  2 32.09 43.73
    "Friday"  5  4 32.09 43.73
    "Friday"  6 93 32.09 43.73
    "Friday"  7 75 32.09 43.73
    "Friday"  8 36 32.09 43.73
    "Friday"  9 12 32.09 43.73
    "Friday" 10 52 32.09 43.73
    "Friday" 11 19 32.09 43.73
    "Saturday" 1 42 . . 
    "Saturday" 2 66 . . 
    end 
    
    gen ismissing = missing(value)
    
    bysort ismissing groupvar (rank) : gen islower = _n <= _N/2 if mod(_N, 2) == 0 
    by ismissing groupvar : replace islower = cond(_n < _N/2, 1, cond(_n == (_N + 1)/2, 0.5, 0)) if mod(_N, 2) == 1 
    gen ishigher = 1 - islower 
    
    by ismissing groupvar : egen lowernum = total(islower * value)
    by ismissing groupvar : egen lowerden = total(islower)
    by ismissing groupvar : egen highernum = total(ishigher * value)
    by ismissing groupvar : egen higherden = total(ishigher)
    gen wanted1 = lowernum/lowerden
    gen wanted2 = highernum/higherden 
    
    list, sepby(groupvar)



    • #3
      Thanks very much, Nick.



      • #4
        Good evening,

        Could this code be modified to produce the same numbers but on a rolling basis, excluding the current row?

        For example, row 5 would produce the average of rows 1 and 2 (23 and 45) against the average of rows 3 and 4 (56 and 2), and so on.

        Thank you.



        • #5
          I am not clear on what you want here -- what does "against" mean, precisely? But you can average the previous 2, the next 2, the previous 2 and the next 2, and so forth.

          Here I fake data with a time variable, but it could be rank or any other ordering. I used rangestat from SSC. As the example shows, you need a technique for incomplete windows, even if you ignore such results.


          Code:
          clear 
          set obs 20 
          gen group = 1 + (_n > 10)
          bysort group : gen time = _n
          gen y = _n 
          
          list, sepby(group)
          
          
          rangestat (count) p2c=y (mean) prev2=y, int(time -2 -1) by(group)
          rangestat (count) n2c=y (mean) next2=y, int(time 1 2) by(group)
          rangestat (count) p2n2c=y (mean) prev2next2=y, int(time -2 2) by(group) excludeself 
          
          list, sepby(group) abbrev(10)
          
               +--------------------------------------------------------------------+
               | group   time    y   p2c   prev2   n2c   next2   p2n2c   prev2next2 |
               |--------------------------------------------------------------------|
            1. |     1      1    1     .       .     2     2.5       2          2.5 |
            2. |     1      2    2     1       1     2     3.5       3    2.6666667 |
            3. |     1      3    3     2     1.5     2     4.5       4            3 |
            4. |     1      4    4     2     2.5     2     5.5       4            4 |
            5. |     1      5    5     2     3.5     2     6.5       4            5 |
            6. |     1      6    6     2     4.5     2     7.5       4            6 |
            7. |     1      7    7     2     5.5     2     8.5       4            7 |
            8. |     1      8    8     2     6.5     2     9.5       4            8 |
            9. |     1      9    9     2     7.5     1      10       3    8.3333333 |
           10. |     1     10   10     2     8.5     .       .       2          8.5 |
               |--------------------------------------------------------------------|
           11. |     2      1   11     .       .     2    12.5       2         12.5 |
           12. |     2      2   12     1      11     2    13.5       3    12.666667 |
           13. |     2      3   13     2    11.5     2    14.5       4           13 |
           14. |     2      4   14     2    12.5     2    15.5       4           14 |
           15. |     2      5   15     2    13.5     2    16.5       4           15 |
           16. |     2      6   16     2    14.5     2    17.5       4           16 |
           17. |     2      7   17     2    15.5     2    18.5       4           17 |
           18. |     2      8   18     2    16.5     2    19.5       4           18 |
           19. |     2      9   19     2    17.5     1      20       3    18.333333 |
           20. |     2     10   20     2    18.5     .       .       2         18.5 |
               +--------------------------------------------------------------------+
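
          [Editor's note] The windowed statistics that rangestat computes can be mimicked in plain Python to see how the incomplete windows at each group's edges behave; `window_mean` is a hypothetical illustration, not part of rangestat:

          ```python
          def window_mean(y, low, high, exclude_self=False):
              """For each position i, take the mean of y[j] over all j with
              low <= j - i <= high (like rangestat's interval()); optionally
              drop j == i (like excludeself). Returns (count, mean) pairs,
              with mean None when the window is empty."""
              out = []
              n = len(y)
              for i in range(n):
                  window = [y[j]
                            for j in range(max(0, i + low), min(n, i + high + 1))
                            if not (exclude_self and j == i)]
                  out.append((len(window),
                              sum(window) / len(window) if window else None))
              return out

          y = list(range(1, 11))          # group 1 of the faked data above
          prev2 = window_mean(y, -2, -1)  # like rangestat ... int(time -2 -1)
          print(prev2[0], prev2[2])       # (0, None) (2, 1.5)
          ```

          The first observation has no previous values, so its window is empty, matching the missing p2c and prev2 in the listing above.
          
          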



          • #6
            Good morning Nick,

            Your solution in #2 did the trick calculating what was needed for the entire sample. Looking back at it now, it needs to do the same but on a rolling basis. I have attached some more data here, with a more practical example of what is required. Apologies for #4, my requirements were admittedly not very clear.

            So I start with columns 1-3 and use

            bysort horse (ddate): gen run_no = _n

            ...to generate the run_no column

            In the example, the variable 'wanted_meanlowerhalf' gives the average 'performancerating' for horse 'Zone In' over the lower half of its previous runs (excluding the present row). It has had 4 previous runs, so the average performance rating of its first two runs is (63+38)/2 = 50.5.

            Similarly for the 'wanted_meanupperhalf' column: the average performancerating over the second half of its previous runs is (63+46)/2 = 54.5.

            It gets a little more complex if the number of previous runs is odd. In that scenario, for example with 5 previous runs, the average performancerating of the first 2.5 runs is returned, (63+38+31.5)/2.5 = 53, and of the last 2.5 runs, (46+65+31.5)/2.5 = 57.

            If a horse had 10 previous runs, wanted columns would calculate the average performance rating for its first five runs, and its second five runs.

            Basically your solution in #2 does all this, but for the group as a whole, not on a rolling basis, which is what I am looking for here.

            I hope that is clear enough, thanks.

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str7 horse float ddate byte(performancerating run_no) float(wanted_meanlowerhalf wanted_meanupperhalf)
            "Zone In" 20939 63 1    .    .
            "Zone In" 20951 38 2    .    .
            "Zone In" 20955 63 3    .    .
            "Zone In" 20979 46 4    .    .
            "Zone In" 21000 65 5 50.5 54.5
            "Zone In" 21011 42 6   53   57
            "Zone In" 21048 12 7    .    .
            "Zone In" 21249 35 8    .    .
            "Zonik"   22516 19 1    .    .
            "Zonik"   22575 64 2    .    .
            "Zonik"   22773 75 3    .    .
            "Zonik"   22798 56 4    .    .
            "Zonik"   22827 48 5    .    .
            "Zonik"   22838 61 6    .    .
            "Zonik"   22855 56 7    .    .
            "Zonik"   22888 38 8    .    .
            end
            format %td ddate
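
            [Editor's note] The rolling lower/upper-half averages described above can be sketched in Python to check the wanted values against the arithmetic; `rolling_half_means` and the `min_prev=4` cutoff (chosen only so that the filled rows in the dataex match) are hypothetical illustrations, not a Stata solution:

            ```python
            def rolling_half_means(ratings, min_prev=4):
                """For each run, average the lower and upper halves of the
                performance ratings of all previous runs, half-weighting the
                middle run when their count is odd. Returns (lower, upper)
                pairs, (None, None) when fewer than min_prev previous runs."""
                out = []
                for i in range(len(ratings)):
                    prev = ratings[:i]        # previous runs only
                    n = len(prev)
                    if n < min_prev:
                        out.append((None, None))
                        continue
                    k, half = n // 2, n / 2
                    if n % 2 == 0:
                        lower, upper = sum(prev[:k]), sum(prev[k:])
                    else:
                        mid = prev[k] / 2     # middle run split across halves
                        lower = sum(prev[:k]) + mid
                        upper = sum(prev[k + 1:]) + mid
                    out.append((lower / half, upper / half))
                return out

            # Zone In's performance ratings in run_no order
            zone_in = [63, 38, 63, 46, 65, 42, 12, 35]
            res = rolling_half_means(zone_in)
            print(res[4], res[5])  # (50.5, 54.5) (53.0, 57.0)
            ```
            
            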



            • #7
              Evidently you need something more complicated, but I can't quickly suggest what that is.
