Removing duplicates based on a criterion

Khushbakht Jamal

Join Date: Dec 2022

Posts: 1
#1

07 Dec 2022, 22:41

Hello, I have a dataset with village names and their respective populations. Villages belong to different districts. The data are structured like this:
Village_name   District   pop
shahpur        A          10
ali pur        B          20
chak 15        B          30
chak 15        B          40
chak 20        C          50

I want a dataset with one row per village, so I need to drop the duplicate villages. When dropping, I want to keep the row with the higher population: in the example above, I want to keep chak 15 with 40 people, not the one with 30.

How can I do so? Please let me know.

Thanks.
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1376
#2

07 Dec 2022, 22:50

Welcome to Statalist, Khushbakht!

Something like this should do it:

Code:

* Within each district-village group, sort by pop (ascending) and keep the last row
bysort District Village_name (pop): keep if _n == _N

Warning: this code is untested, since the data example was not posted in a usable form. Please read the Statalist FAQ, especially section 12, to see how data examples should be posted (using the dataex command in Stata).
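For anyone who wants to try the idea, here is a minimal sketch that types in the five rows from post #1 with input and then keeps the largest pop per village. The storage types (str20, str1) are assumptions, and has_pop is a hypothetical helper variable that is not in the original data. It is there only because Stata sorts missing numeric values after all numbers, so without it a row with missing pop would be the one kept.

```stata
clear
* Example rows from post #1; the storage types are assumed
input str20 Village_name str1 District pop
"shahpur" "A" 10
"ali pur" "B" 20
"chak 15" "B" 30
"chak 15" "B" 40
"chak 20" "C" 50
end

* has_pop is a hypothetical helper: 0 when pop is missing, 1 otherwise
generate byte has_pop = !missing(pop)

* Sort each district-village group by has_pop, then pop, and keep the last row:
* the largest nonmissing pop, or a missing row only if the group has nothing else
bysort District Village_name (has_pop pop): keep if _n == _N
drop has_pop

list, clean noobs
```

If two rows for the same village have the same pop, one of them is kept, and which one is not determined by this sort. If the rows differ in other variables, add a tie-breaking variable inside the parentheses.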