  • Dropping duplicate observations based on criteria

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int a01 byte mid
    1 1
    1 1
    1 2
    1 3
    1 4
    2 1
    2 2
    2 3
    3 1
    3 2
    3 3
    3 3
    3 4
    4 1
    4 1
    4 2
    4 2
    4 3
    4 4
    end
    Here, a01 is the household ID and mid is the member ID.

    I want to keep each member only once per household, but some member IDs (mid) are repeated within a household. In the example above, I want to drop rows 2, 12, 15, and 17.

    My actual dataset is large; I have cut it down here to simplify the problem.

    What code would keep each member only once per household?


  • #2
    This problem has been solved.


    * Generate a unique identifier for each household
    egen household_id = group(a01)

    * Report how many household-member combinations are duplicated
    duplicates report household_id mid

    * Remove duplicate rows within each household
    duplicates drop household_id mid, force

    * Drop the household_id variable
    drop household_id

    list
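
    Since a01 already identifies households, the same result can probably be reached without the helper variable. The following is only a minimal sketch on a five-row toy dataset (not the data from #1): it keeps the first observation of each a01-mid pair and then checks that the pair uniquely identifies the remaining rows. It assumes that duplicated rows carry the same information, so it does not matter which copy survives.

    ```stata
    * Toy data for illustration only: a01 = household ID, mid = member ID
    clear
    input int a01 byte mid
    1 1
    1 1
    1 2
    2 1
    2 1
    end

    * Within each household-member pair, keep only the first observation
    bysort a01 mid: keep if _n == 1

    * Stops with an error if a01 and mid do not uniquely identify the rows
    isid a01 mid

    list, sepby(a01)
    ```

    If the copies differ on other variables, which one is kept is arbitrary here, so it would be worth inspecting them first (for example with duplicates tag a01 mid, generate(dup)) before dropping anything. Running duplicates drop a01 mid, force directly on the original variables should give the same set of pairs.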
