Average within a group for each unique combination

Ernestina delPiero

Join Date: May 2016

Posts: 62
#1

Average within a group for each unique combination

06 Oct 2019, 18:52

I would like to take a simple average within each group but only over each unique iso and urb_cl combination.

For example, in the table below (a sample of my dataset is provided at the end) the average I want for GEO=YRP and URB_CL=LOW is 6.3 (=(11.305+1.391)/2), rather than 8.0 (=(11.305+11.305+1.391)/3).

I would like to have the correct average as a new variable, knowing that in this case 6.3 will be the value for all instances when GEO=YRP and URB_CL=LOW.
geo iso urb_cl urb curr

YRP ASA low 11.305 4502.119

YRP ASA 11.305

YRP ASA low 11.305 7127.425

YRP KWW 0

YRP REQ low 1.391 8219.178

YRP TRS high 66.793 2925.037

I can get the averages I don't want using:

Code:

table geo urb_cl if !missing(curr), c(n urb mean urb ) format(%14.1fc) cellwidth(10)

...and I can get the averages I do want by running this same command after collapsing the dataset, i.e.

Code:

preserve collapse urb urb_cl if !missing( curr ), by( iso geo ) sort geo iso urb table geo urb_cl, c(n urb mean urb ) format(%14.1fc) cellwidth(10) restore

Is there a more straightforward and elegant way to achieve what I want?

------------------
My dataset:

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input str3(geo iso) float urb_cl double urb float curr "PAC" "ASA" 2 82.892 1873.3757 "PAC" "ASA" 2 82.892 1252.2163 "PAC" "BGD" 2 68.775 3583.354 "PAC" "FJI" . 47.317 . "PAC" "KHM" 2 79.337 7045.824 "PAC" "VNM" 2 68.92 1321.158 "PAC" "VNM" 2 68.92 4199.222 "YRP" "ASA" 0 11.305 4502.1187 "YRP" "ASA" . 11.305 . "YRP" "ASA" 0 11.305 7127.425 "YRP" "KWW" . 0 . "YRP" "REQ" 0 1.391 8219.178 "YRP" "TRS" 2 66.793 2925.0366 end label values urb_cl label_n label def label_n 0 "low", modify label def label_n 2 "high", modify

Last edited by Ernestina delPiero; 06 Oct 2019, 18:57.
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#2

06 Oct 2019, 22:00

I am completely confused by your post.

You say you want means of something (what, exactly? from what comes later I guess curr) for each combination of iso and urb_cl. But then you do a table disaggregating by geo and urb_cl. So it's not surprising that doesn't get you what you want. If you do -table iso urb_cl, c(mean urb)- do you get what you want?

You say you do get what you want by running some code that includes -collapse urb urb_cl...- That command makes no sense to me. urb_cl is a 1/2 categorical variable, and taking its mean will gives you other values between 1 and 2 for urb_cl. So I really don't understand what you are trying to get at here.

When I look at

I would like to take a simple average within each group but only over each unique iso and urb_cl combination.

For example, in the table below (a sample of my dataset is provided at the end) the average I want for GEO=YRP and URB_CL=LOW is 6.3 (=(11.305+1.391)/2), rather than 8.0 (=(11.305+11.305+1.391)/3).

I think I understand what you want to do here. But there is a big problem with this. This calculation is workable only because the two observations with YRP and low have the same value of curr, 11.305. So it is easy enough to say you only want to count it once. But what if you had two observations with YRP and low and they had different values of curr. What would you do then?
Comment
Ernestina delPiero

Join Date: May 2016

Posts: 62
#3

07 Oct 2019, 08:05

I did a terrible job at explaining what I want. So you're right to be confused, Clyde.I would edit my post but no longer can.

I am actually after the mean value of 'urb' over each combination - rather than each instance - of geo (not iso) and urb_cl.

I run the collapse command simply because doing it allows me to have a dataset on which I run the - table geo urb_cl, c(n urb mean urb ) - command that gets me the results I am after. I don't take the mean of urb_cl.

Hopefully this explains better what I am after.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30065

07 Oct 2019, 09:55

OK, this is better. I think what you want is this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str3(geo iso) float urb_cl double urb float curr
"PAC" "ASA" 2 82.892 1873.3757
"PAC" "ASA" 2 82.892 1252.2163
"PAC" "BGD" 2 68.775 3583.354
"PAC" "FJI" . 47.317 .
"PAC" "KHM" 2 79.337 7045.824
"PAC" "VNM" 2 68.92 1321.158
"PAC" "VNM" 2 68.92 4199.222
"YRP" "ASA" 0 11.305 4502.1187
"YRP" "ASA" . 11.305 .
"YRP" "ASA" 0 11.305 7127.425
"YRP" "KWW" . 0 .
"YRP" "REQ" 0 1.391 8219.178
"YRP" "TRS" 2 66.793 2925.0366
end
label values urb_cl label_n
label def label_n 0 "low", modify
label def label_n 2 "high", modify

egen flag = tag(geo urb_cl urb) // SELECT ONE OCCURRENCE OF EACH VALUE OF URB
by geo urb_cl, sort: egen wanted_new_variable = mean(cond(flag, urb, .))

Comment

Ernestina delPiero

Join Date: May 2016

Posts: 62
#5

12 Oct 2019, 18:10

That is exactly what I wanted. Thank you.
Comment

geo	iso	urb_cl	urb	curr
YRP	ASA	low	11.305	4502.119
YRP	ASA		11.305
YRP	ASA	low	11.305	7127.425
YRP	KWW		0
YRP	REQ	low	1.391	8219.178
YRP	TRS	high	66.793	2925.037

Announcement

Average within a group for each unique combination

Comment

Comment

Comment

Comment