Dropping duplicate observations based on criteria

Md Anik Islam

Join Date: Jul 2022

Posts: 8
#1

21 Oct 2023, 06:49

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int a01 byte mid
1 1
1 1
1 2
1 3
1 4
2 1
2 2
2 3
3 1
3 2
3 3
3 3
3 4
4 1
4 1
4 2
4 2
4 3
4 4
end

Here, a01 is the household ID and mid is the member ID.

I want to keep each member's information only once per household, but some member IDs (mid) are duplicated within a household. In the example above, I want to remove rows 2, 12, 15, and 17.

My actual dataset is large; I have cut it down here to simplify the problem.

What code would keep each member only once per household?
Md Anik Islam

Join Date: Jul 2022

Posts: 8
#2

21 Oct 2023, 07:12

This problem has been solved.

* Generate a group identifier for each household
* (optional here, since a01 already identifies the household)
egen household_id = group(a01)

* Report how many (household, member) combinations are duplicated
duplicates report household_id mid

* Remove duplicate rows within each household
duplicates drop household_id mid, force

* Drop the household_id variable
drop household_id

list
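
For anyone finding this later: a shorter variant that skips the helper variable, offered as a sketch rather than a definitive answer. It assumes a01 on its own identifies the household (as in the example data) and that it does not matter which copy of a duplicated member is kept. The variable name dup is only illustrative.

* Flag the duplicated (household, member) pairs so they can be inspected first
duplicates tag a01 mid, generate(dup)
list a01 mid if dup > 0, sepby(a01)
drop dup

* Keep the first occurrence of each (household, member) pair.
* -force- is required whenever a varlist is given, as a reminder that
* rows differing on other variables will also be dropped.
duplicates drop a01 mid, force

* Confirm that a01 and mid now uniquely identify each observation
isid a01 mid
count

With the example data above this drops 4 observations and leaves 15. An equivalent approach is bysort a01 mid: keep if _n == 1. If the other variables can differ between duplicated rows, it is worth checking them before dropping, because the row that survives is simply the first one in the current sort order.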