
  • Average of Top and Bottom Half of Group

    Hello,

    Would appreciate some assistance on a calculation for 2 new variables please.

    Within each group (the group variable), I need the average value of the top-half-ranked and bottom-half-ranked cases of the value variable, returned in two new variables in columns 4 and 5.

    The problem I'm having is when there is an odd number of cases in the group, as in this example, which has 11.

    In that case I would like to sum the values of the top 5 ranked cases plus half the value of the 6th ranked case (as it sits in the middle), e.g. 23 + 45 + 56 + 2 + 4 + 46.5 (half of 93, rank 6) = 176.5, then divide by half the number of cases (11 / 2 = 5.5), giving an average top-half value of 32.09, returned in a new variable in the next column. The same calculation applies to the bottom half: (75 + 36 + 12 + 52 + 19 + 46.5) / 5.5 = 43.73.

    Thanks,
    Hans

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 groupvar byte(rank value) float(average_of_top_half_rank average_of_bottom_half_rank)
    "Friday"  1 23 32.09 43.73
    "Friday"  2 45 32.09 43.73
    "Friday"  3 56 32.09 43.73
    "Friday"  4  2 32.09 43.73
    "Friday"  5  4 32.09 43.73
    "Friday"  6 93 32.09 43.73
    "Friday"  7 75 32.09 43.73
    "Friday"  8 36 32.09 43.73
    "Friday"  9 12 32.09 43.73
    "Friday" 10 52 32.09 43.73
    "Friday" 11 19 32.09 43.73
    end
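
    [Editor's note] The half-weighted averaging described above can be sketched in Python for checking; `half_averages` is a hypothetical helper, not the Stata solution used later in this thread:

    ```python
    def half_averages(values):
        """Split a ranked list into top and bottom halves, giving the
        middle value half weight to each side when the count is odd,
        and return (top_half_mean, bottom_half_mean)."""
        n = len(values)
        half = n / 2      # each half has n/2 "cases", e.g. 5.5 for n = 11
        k = n // 2
        if n % 2 == 0:
            top, bottom = sum(values[:k]), sum(values[k:])
        else:
            mid = values[k] / 2   # half the middle value goes to each side
            top = sum(values[:k]) + mid
            bottom = sum(values[k + 1:]) + mid
        return top / half, bottom / half

    # Friday's values in rank order, reproducing the worked example
    vals = [23, 45, 56, 2, 4, 93, 75, 36, 12, 52, 19]
    top, bottom = half_averages(vals)
    print(round(top, 2), round(bottom, 2))  # 32.09 43.73
    ```
    
    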


  • #2
    There should be an easier way to do this.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str6 groupvar byte(rank value) float(average_of_top_half_rank average_of_bottom_half_rank)
    "Friday"  1 23 32.09 43.73
    "Friday"  2 45 32.09 43.73
    "Friday"  3 56 32.09 43.73
    "Friday"  4  2 32.09 43.73
    "Friday"  5  4 32.09 43.73
    "Friday"  6 93 32.09 43.73
    "Friday"  7 75 32.09 43.73
    "Friday"  8 36 32.09 43.73
    "Friday"  9 12 32.09 43.73
    "Friday" 10 52 32.09 43.73
    "Friday" 11 19 32.09 43.73
    "Saturday" 1 42 . . 
    "Saturday" 2 66 . . 
    end 
    
    gen ismissing = missing(value)
    
    bysort ismissing groupvar (rank) : gen islower = _n <= _N/2 if mod(_N, 2) == 0 
    by ismissing groupvar : replace islower = cond(_n < _N/2, 1, cond(_n == (_N + 1)/2, 0.5, 0)) if mod(_N, 2) == 1 
    gen ishigher = 1 - islower 
    
    by ismissing groupvar : egen lowernum = total(islower * value)
    by ismissing groupvar : egen lowerden = total(islower)
    by ismissing groupvar : egen highernum = total(ishigher * value)
    by ismissing groupvar : egen higherden = total(ishigher)
    gen wanted1 = lowernum/lowerden
    gen wanted2 = highernum/higherden 
    
    list, sepby(groupvar)



    • #3
      Thanks very much, Nick.



      • #4
        Good evening,

        Could this code be modified to produce the same numbers but on a rolling basis, excluding the current row?

        For example, row 5 would produce the average of rows 1 and 2 (23 and 45) against the average of rows 3 and 4 (56 and 2), and so on.

        Thank you.



        • #5
          I am not clear on what you want here -- what does "against" mean, precisely? But you can average the previous 2, the next 2, the previous 2 and the next 2, and so forth.

          Here I fake data with a time variable, but it could be rank or any other ordering. I used rangestat from SSC. As the example shows, you need a technique for incomplete windows, even if you ignore such results.


          Code:
          clear 
          set obs 20 
          gen group = 1 + (_n > 10)
          bysort group : gen time = _n
          gen y = _n 
          
          list, sepby(group)
          
          
          rangestat (count) p2c=y (mean) prev2=y, int(time -2 -1) by(group)
          rangestat (count) n2c=y (mean) next2=y, int(time 1 2) by(group)
          rangestat (count) p2n2c=y (mean) prev2next2=y, int(time -2 2) by(group) excludeself 
          
          list, sepby(group) abbrev(10)
          
               +--------------------------------------------------------------------+
               | group   time    y   p2c   prev2   n2c   next2   p2n2c   prev2next2 |
               |--------------------------------------------------------------------|
            1. |     1      1    1     .       .     2     2.5       2          2.5 |
            2. |     1      2    2     1       1     2     3.5       3    2.6666667 |
            3. |     1      3    3     2     1.5     2     4.5       4            3 |
            4. |     1      4    4     2     2.5     2     5.5       4            4 |
            5. |     1      5    5     2     3.5     2     6.5       4            5 |
            6. |     1      6    6     2     4.5     2     7.5       4            6 |
            7. |     1      7    7     2     5.5     2     8.5       4            7 |
            8. |     1      8    8     2     6.5     2     9.5       4            8 |
            9. |     1      9    9     2     7.5     1      10       3    8.3333333 |
           10. |     1     10   10     2     8.5     .       .       2          8.5 |
               |--------------------------------------------------------------------|
           11. |     2      1   11     .       .     2    12.5       2         12.5 |
           12. |     2      2   12     1      11     2    13.5       3    12.666667 |
           13. |     2      3   13     2    11.5     2    14.5       4           13 |
           14. |     2      4   14     2    12.5     2    15.5       4           14 |
           15. |     2      5   15     2    13.5     2    16.5       4           15 |
           16. |     2      6   16     2    14.5     2    17.5       4           16 |
           17. |     2      7   17     2    15.5     2    18.5       4           17 |
           18. |     2      8   18     2    16.5     2    19.5       4           18 |
           19. |     2      9   19     2    17.5     1      20       3    18.333333 |
           20. |     2     10   20     2    18.5     .       .       2         18.5 |
               +--------------------------------------------------------------------+
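
          [Editor's note] The windowed statistics that rangestat computes can be mimicked in plain Python to see how the incomplete windows at each group's edges behave; `window_mean` is a hypothetical illustration, not part of rangestat:

          ```python
          def window_mean(y, low, high, exclude_self=False):
              """For each position i, take the mean of y[j] over all j with
              low <= j - i <= high (like rangestat's interval()); optionally
              drop j == i (like excludeself). Returns (count, mean) pairs,
              with mean None when the window is empty."""
              out = []
              n = len(y)
              for i in range(n):
                  window = [y[j]
                            for j in range(max(0, i + low), min(n, i + high + 1))
                            if not (exclude_self and j == i)]
                  out.append((len(window),
                              sum(window) / len(window) if window else None))
              return out

          y = list(range(1, 11))          # group 1 of the faked data above
          prev2 = window_mean(y, -2, -1)  # like rangestat ... int(time -2 -1)
          print(prev2[0], prev2[2])       # (0, None) (2, 1.5)
          ```

          The first observation has no previous values, so its window is empty, matching the missing p2c and prev2 in the listing above.
          
          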



          • #6
            Good morning Nick,

            Your solution in #2 did the trick calculating what was needed for the entire sample. Looking back at it now, it needs to do the same but on a rolling basis. I have attached some more data here, with a more practical example of what is required. Apologies for #4, my requirements were admittedly not very clear.

            So I start with columns 1-3 and use

            bysort horse (ddate): gen run_no = _n

            ...to generate the run_no column

            In the example, the variable 'wanted_meanlowerhalf' gives the average 'performancerating' for horse 'Zone In' over the lower half of its previous runs (excluding the present row). It has had 4 previous runs, so the average performance rating of its first two runs is (63+38)/2 = 50.5.

            Similarly for the 'wanted_meanupperhalf' column: the average performancerating over the second half of its previous runs is (63+46)/2 = 54.5.

            It gets a little more complex if the number of previous runs is odd. In that scenario, for example with 5 previous runs, the average performancerating of the first 2.5 runs is returned, (63+38+31.5)/2.5 = 53, and of the last 2.5 runs, (46+65+31.5)/2.5 = 57.

            If a horse had 10 previous runs, wanted columns would calculate the average performance rating for its first five runs, and its second five runs.

            Basically your solution in #2 does all this, but for the group as a whole, not on a rolling basis, which is what I am looking for here.

            I hope that is clear enough, thanks.

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str7 horse float ddate byte(performancerating run_no) float(wanted_meanlowerhalf wanted_meanupperhalf)
            "Zone In" 20939 63 1    .    .
            "Zone In" 20951 38 2    .    .
            "Zone In" 20955 63 3    .    .
            "Zone In" 20979 46 4    .    .
            "Zone In" 21000 65 5 50.5 54.5
            "Zone In" 21011 42 6   53   57
            "Zone In" 21048 12 7    .    .
            "Zone In" 21249 35 8    .    .
            "Zonik"   22516 19 1    .    .
            "Zonik"   22575 64 2    .    .
            "Zonik"   22773 75 3    .    .
            "Zonik"   22798 56 4    .    .
            "Zonik"   22827 48 5    .    .
            "Zonik"   22838 61 6    .    .
            "Zonik"   22855 56 7    .    .
            "Zonik"   22888 38 8    .    .
            end
            format %td ddate
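
            [Editor's note] The rolling lower/upper-half averages described above can be sketched in Python to check the wanted values against the arithmetic; `rolling_half_means` and the `min_prev=4` cutoff (chosen only so that the filled rows in the dataex match) are hypothetical illustrations, not a Stata solution:

            ```python
            def rolling_half_means(ratings, min_prev=4):
                """For each run, average the lower and upper halves of the
                performance ratings of all previous runs, half-weighting the
                middle run when their count is odd. Returns (lower, upper)
                pairs, (None, None) when fewer than min_prev previous runs."""
                out = []
                for i in range(len(ratings)):
                    prev = ratings[:i]        # previous runs only
                    n = len(prev)
                    if n < min_prev:
                        out.append((None, None))
                        continue
                    k, half = n // 2, n / 2
                    if n % 2 == 0:
                        lower, upper = sum(prev[:k]), sum(prev[k:])
                    else:
                        mid = prev[k] / 2     # middle run split across halves
                        lower = sum(prev[:k]) + mid
                        upper = sum(prev[k + 1:]) + mid
                    out.append((lower / half, upper / half))
                return out

            # Zone In's performance ratings in run_no order
            zone_in = [63, 38, 63, 46, 65, 42, 12, 35]
            res = rolling_half_means(zone_in)
            print(res[4], res[5])  # (50.5, 54.5) (53.0, 57.0)
            ```
            
            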



            • #7
              Evidently you need something more complicated, but I can't quickly suggest what that is.
