How to drop entire observations with duplicates for a particular variable keeping the original (or the first observation

anisha arya

Join Date: Jul 2024

Posts: 28
#1

31 Jul 2024, 13:17

I have data from a survey where some participants opened the survey two, three, or more times. How do I drop the second and all later entries for each participant, regardless of whether any of those entries completed the survey in one go? I want to keep only the first observation. Participants are identified by an ID number, the variable id. For example:
id   surveyprogress%
1    23
1    12
1    56
2    13

In this case, I would want to drop the second and third rows because they share the same ID number as row 1.

Please let me know. Thank you so much in advance!

Last edited by anisha arya; 31 Jul 2024, 13:24.
Tags: drop, duplicates, observation
Clyde Schechter

Join Date: Apr 2014

Posts: 30063
#2

31 Jul 2024, 13:23

Assuming the dataset is already sorted in the correct order so that the chronologically first observation is also the physically first one, it's easy:

Code:

sort id, stable
by id: keep if _n == 1
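
As a quick check, the same two commands can be tried on the four example rows from post #1. This is only a sketch: the variable name progress is a stand-in chosen for illustration (surveyprogress% is not a legal Stata variable name), and it assumes the rows are already in chronological order within each id.

```stata
* Recreate the example rows from post #1.
* "progress" is a hypothetical name standing in for surveyprogress%.
clear
input id progress
1 23
1 12
1 56
2 13
end

* The stable option keeps the existing order of observations within each id.
sort id, stable

* Within each id, keep only the first observation.
by id: keep if _n == 1

* Two observations should remain: id 1 with progress 23, id 2 with progress 13.
list, noobs
```

If the data are not already in chronological order but do contain a timestamp, say a hypothetical variable starttime, then something like bysort id (starttime): keep if _n == 1 would order the entries within each id before keeping the first.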