  • Choosing the highest value of a variable in each category corresponding to another variable

    Hello all,
    I have Var1, which holds postal codes, and Var2, which holds dissemination area codes; multiple dissemination areas are assigned to each postal code. I also have Var3, which is the total population of each dissemination area.

    I would like to keep only the most populous dissemination area for each postal code. In other words, I'd like to remove repeated values in my postal code variable, keeping only the observation for each postal code that corresponds to its most populous dissemination area.

    Note also that some dissemination areas are also repeated among different postal codes; the same dissemination area may be found to be the most populous for multiple postal codes.

    Any help greatly appreciated, thank you.

  • #2
    It sounds like you want:
    Code:
    by Var1 (Var3), sort: keep if _n == _N
    Note: this assumes that Var3 never has missing values. If that's not the case in your data, first -drop if missing(Var3)-.
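
    To see what that command does, here is a minimal sketch on made-up toy data. The postal codes, dissemination area codes, and populations below are hypothetical values chosen for illustration; only the variable names Var1, Var2, and Var3 come from the question.

```stata
* Toy data (hypothetical values): postal code, dissemination area, population
clear
input str6 Var1 long Var2 long Var3
"A1A1A1" 1001 500
"A1A1A1" 1002 750
"B2B2B2" 1002 750
"B2B2B2" 1003 300
"B2B2B2" 1004 .
end

* Missing values sort last in Stata, so drop them before picking the maximum
drop if missing(Var3)

* Within each postal code, sort by population and keep the last (largest) row
by Var1 (Var3), sort: keep if _n == _N

list, noobs
```

    On this toy data, one observation per postal code should remain, and both postal codes keep dissemination area 1002, which matches the situation described in the question where the same area is the most populous for several postal codes. If two dissemination areas can tie for the largest population within a postal code, which one survives is arbitrary; one possible way to make the choice reproducible, assuming the lowest-to-highest order of Var2 is an acceptable tie-breaker, is to sort on it as well: -by Var1 (Var3 Var2), sort: keep if _n == _N-.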
