New Variable for Average Value of First Half of Observations

Hans Schubert

Join Date: Jul 2021

Posts: 109
#1

New Variable for Average Value of First Half of Observations

20 Aug 2021, 02:14

Good morning,

I would like to generate a newvar that returns the average of the 'rating' variable, for the first half of the 'run_no'

So in the example provided the 'run_no' goes in ascending order of date from 1-7.
The first half of the run_no is therefore 3.5, so in this case with an odd number, the first half of the run_no can be rounded up to the first 4 'run_no' values.
Therefore in this case I need the average 'rating' for the first half of the 'run_no' which is the average of 76-75-70-65, giving an average of 71.5 which can be rounded down to 71.

If there were for example 36 'run_no', then the average 'rating' for the first 18 of these would be required etc...

Thank you.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str9 ddate byte(run_no rating) "01-Aug-21" 7 98 "26-Sep-20" 6 80 "16-Aug-20" 5 93 "31-Jul-20" 4 76 "21-Jul-20" 3 75 "30-Jun-20" 2 70 "12-Jun-20" 1 65 end
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35711
#2

20 Aug 2021, 02:59

In your case you can just go

Code:

su rating if run_no <= ceil(run_no/_N) , meanonly

and look for r(mean) and round down. But presumably you would not be asking this without some other grouping variable, something like

Code:

egen count = count(run), by(group) egen wanted = mean(cond(run <= ceil(count/2), run_no, .)), by(group) replace wanted = floor(wanted)

This code populates all observations with the new means. If you want to populate only the observations for which they are calculated the code could be

Code:

egen wanted = mean(run_no) if run <= ceil(count/2), by(group)

See also https://www.stata-journal.com/articl...article=dm0055 -- especially Sections 9 and 10;
Comment
Hans Schubert

Join Date: Jul 2021

Posts: 109
#3

20 Aug 2021, 03:15

Thanks very much Nick. I will take a look and provide feedback later today.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35711
#4

20 Aug 2021, 04:03

First line of code should be

Code:

su rating if run_no <= ceil(_N/2) , meanonly
Comment

Hans Schubert

Join Date: Jul 2021
Posts: 109

21 Aug 2021, 00:44

Hi Nick

Your correct, there is a grouping variable, I've included it here as var4.
Your code required one small modification in line 3 (run_no changed to rating) and is working as I need it now.

Many thanks for your help

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str9 ddate byte(run_no rating) str8 var4 float(count wanted)
"01-Aug-21" 7 98 "Paco Boy" 7 71
"26-Sep-20" 6 80 "Paco Boy" 7 71
"16-Aug-20" 5 93 "Paco Boy" 7 71
"31-Jul-20" 4 76 "Paco Boy" 7 71
"21-Jul-20" 3 75 "Paco Boy" 7 71
"30-Jun-20" 2 70 "Paco Boy" 7 71
"12-Jun-20" 1 65 "Paco Boy" 7 71
end

Code:

su rating if run_no <= ceil(_N/2) , meanonly
egen count = count(run_no), by(var4)
egen wanted = mean(cond(run_no <= ceil(count/2), rating, .)), by(var4)
replace wanted = floor(wanted)

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35711
#6

21 Aug 2021, 03:30

You are correct. Sorry about the silly error.
Comment
Hans Schubert

Join Date: Jul 2021

Posts: 109
#7

08 Sep 2021, 16:47

Good evening,

I wanted to make a small modification to this code...
At the moment the wanted column is returning the average of the first half of the entire 7 runs in all its cells...
In the example wanted returns 71 as the run_no is divided by 2 to give 3.5 then rounded up to 4. The average of the first 4 ratings are then returned and rounded down to 71 in the wanted column

What I would like to do now is make the wanted column cumulative....for example run_no 7 takes the average rating of the first half of previous runs and returns them in wanted 1 (eg run_no - 1 = 6 runs, so average rating of run_no 1/2/3 is returned in wanted 1 (average(75+70+65 = 70)
I hope that makes sense...

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str9 ddate byte(run_no rating) str8 var4 float(count wanted var7) "01-Aug-21" 7 98 "Paco Boy" 7 71 70 "26-Sep-20" 6 80 "Paco Boy" 7 71 70 "16-Aug-20" 5 93 "Paco Boy" 7 71 67.5 "31-Jul-20" 4 76 "Paco Boy" 7 71 67.5 "21-Jul-20" 3 75 "Paco Boy" 7 71 67.5 "30-Jun-20" 2 70 "Paco Boy" 7 71 65 "12-Jun-20" 1 65 "Paco Boy" 7 71 . end
Comment

Hans Schubert

Join Date: Jul 2021
Posts: 109

09 Sep 2021, 07:19

Good afternoon,

Perhaps this explanation will be more coherent....

Top Row: run_no1 =4
Wanted: returns the average rating for all run_no <=4

in this example --- average of 76/75/70/65 = 71.5

2nd row: wanted returns average rating for run_no 1/2/3 = 70
and so on down through the dataset

Thanks

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str9 ddate byte(run_no rating) str8 var4 float(run_no1 wanted)
"01-Aug-21" 7 98 "Paco Boy" 4 71.5
"26-Sep-20" 6 80 "Paco Boy" 3   70
"16-Aug-20" 5 93 "Paco Boy" 3   70
"31-Jul-20" 4 76 "Paco Boy" 2 67.5
"21-Jul-20" 3 75 "Paco Boy" 2 67.5
"30-Jun-20" 2 70 "Paco Boy" 1   65
"12-Jun-20" 1 65 "Paco Boy" 1    .
end

Announcement