  • Removing duplicates based on a criterion

    Hello, I have a dataset with village names and their respective populations. Villages belong to different districts. The data structure looks like this:
    Village_name   District   pop
    shahpur        A          10
    ali pur        B          20
    chak 15        B          30
    chak 15        B          40
    chak 20        C          50
    I want to get a dataset of unique villages by dropping the duplicates. When dropping, I want to keep the observation with the higher population: in the example above, I want to keep chak 15 with 40 people, not the one with 30.

    How can I do so? Please let me know.

    Thanks.

  • #2
    Welcome to Statalist, Khushbakht!

    Something like this should do it:
    Code:
    bysort District Village_name (pop): keep if _n == _N
    Warning: this code is untested, since a data example was not properly posted. Please go through the Statalist FAQ, especially section 12, to see how data examples should be posted (using the dataex command in Stata).
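
    For anyone who wants to try the one-liner, here is a minimal sketch on the five rows from post #1. The input block is a hypothetical reconstruction of that example, not the poster's actual data, and it assumes Village_name and District are string variables and pop is numeric.

```stata
* Hypothetical reconstruction of the example data from post #1
clear
input str10 Village_name str1 District pop
"shahpur" "A" 10
"ali pur" "B" 20
"chak 15" "B" 30
"chak 15" "B" 40
"chak 20" "C" 50
end

* Within each District/Village_name group, sort by pop (ascending)
* and keep only the last observation, i.e. the one with the largest pop
bysort District Village_name (pop): keep if _n == _N

list, noobs
```

    One thing to watch for: Stata sorts numeric missing values after all nonmissing values, so if pop is missing for one of the duplicates, that observation would be the one kept. If that can happen in the real data, it may be worth dropping or handling missing pop first.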
