Choosing the highest value of a variable in each category corresponding to another variable

Oswa Shafei

Join Date: Nov 2019

Posts: 1
#1

08 Nov 2019, 17:30

Hello all,
I have Var1 (postal codes) and Var2 (dissemination area codes); each postal code has multiple dissemination areas assigned to it. I also have Var3, the total population of each dissemination area.

I would like to keep only the most populous dissemination area for each postal code. In other words, I'd like to remove repeated values in my postal code variable, keeping only the observation that pairs each postal code with its most populous dissemination area.

Note also that some dissemination areas are also repeated among different postal codes; the same dissemination area may be found to be the most populous for multiple postal codes.

Any help greatly appreciated, thank you.
Tags: choose, data, highest value, merge, repeated values
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

08 Nov 2019, 20:19

It sounds like you want:

Code:

by Var1 (Var3), sort: keep if _n == _N

Note: this assumes that Var3 never has missing values. If that's not the case in your data, first -drop if missing(Var3)-.
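
For illustration only, here is a minimal sketch of the same approach on a toy dataset. The postal codes, dissemination area codes, and populations below are made up, not taken from the original data. Missing values need handling because Stata sorts numeric missing values last, so an observation with missing Var3 would otherwise end up as the last (kept) observation in its group.

```stata
* Toy data: all values below are hypothetical
clear
input str6 Var1 long Var2 long Var3
"A1A1A1" 1001 500
"A1A1A1" 1002 750
"B2B2B2" 1002 750
"B2B2B2" 1003 .
end

* Numeric missing sorts after every non-missing value, so drop it first
drop if missing(Var3)

* Within each postal code, sort by population and keep the last
* (largest) observation
by Var1 (Var3), sort: keep if _n == _N

list
```

In this toy example dissemination area 1002 is kept for both postal codes, which is the situation described in #1 where one dissemination area is the most populous for several postal codes. If two dissemination areas within a postal code can tie on population, which one is kept is not determined by the sort above; one possible way to make the choice reproducible is to add a tie-breaker to the sort, e.g. -by Var1 (Var3 Var2), sort: keep if _n == _N-.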