Dropping duplicates by date

Jonathan Marin

Join Date: May 2017

Posts: 14
#1

20 Jun 2017, 04:14

Hello!

I have a data set with appointments at a dental clinic over a period of 2 months (1st of May to 1st of July). Some patients attend the clinic more than once during this period and thus occur several times in the data set. For each patient occurring more than once, I want to keep the first appointment and drop all other appointments of that patient. If a patient attends the clinic on the 1st, 2nd and 3rd of May, I want to drop the two later observations and keep the first one.

Using:

duplicates drop patient_id, force

will drop all duplicates, but I cannot tell whether it drops them in the manner I want.

If I sort the observations by date and then execute the above command, will I achieve the desired result?

best regards,
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17704
#2

20 Jun 2017, 04:25

Jonathan:
you may want to try something along the following lines:

Code:

bysort idcode (date): keep if _n==1
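
To see what this does, here is a minimal sketch on toy data. The variable names patient_id and visit_date and all the values are made up for illustration; substitute the names in your own data set, and note that the date variable is assumed to be a numeric daily date.

```stata
* Toy data: hypothetical names and values, for illustration only
clear
input patient_id str9 visit_str
1 "01may2017"
1 "02may2017"
1 "03may2017"
2 "15may2017"
3 "10jun2017"
3 "02jun2017"
end

* Convert the string to a numeric daily date so that it sorts chronologically
generate visit_date = date(visit_str, "DMY")
format visit_date %td

* Within each patient, sort by date and keep only the earliest observation
bysort patient_id (visit_date): keep if _n == 1

list patient_id visit_date
```

The variable in parentheses is used for sorting within each patient but does not define the groups, so _n == 1 picks out the earliest appointment of each patient. On the toy data this should leave one observation per patient.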

Kind regards,
Carlo
(Stata 19.0)
Anesh Panchal

Join Date: May 2017

Posts: 10
#3

20 Jun 2017, 04:29

Hi Jonathan

The best way to do this would be to use collapse (min) datevar, by(patient_id)

This will give you a data set with the minimum date for each patient.
Once you have run this, you may need to format the variable as a date again:
use format %td datevar

thanks

Anesh
Nick Cox

Join Date: Mar 2014

Posts: 35652
#4

20 Jun 2017, 06:04

Anesh's proposal, I think, misses a key point: it's entire observations that are to be kept.

Carlo's proposal is I think on target.

duplicates isn't the solution of choice here. You should prefer it to be explicit in your code that you keep the first visit for each person.
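
One way to make that explicit is to flag the first visit before dropping anything. This is only a sketch under assumptions: patient_id, visit_date and treatment are hypothetical names, and visit_date is taken to be a numeric daily date.

```stata
* Toy data: hypothetical names and values, for illustration only
clear
input patient_id str9 visit_str str10 treatment
1 "02may2017" "filling"
1 "01may2017" "checkup"
2 "15may2017" "cleaning"
2 "15may2017" "xray"
end
generate visit_date = date(visit_str, "DMY")
format visit_date %td

* Report same-day repeats: among tied dates the sort order is arbitrary
duplicates report patient_id visit_date

* Flag the earliest observation per patient, then keep entire observations
bysort patient_id (visit_date): generate byte first_visit = (_n == 1)
keep if first_visit
drop first_visit

list
```

All other variables of the kept observation survive, which is the point of preferring this to collapse. If a patient can have two appointments on the same date, as patient 2 does here, which of them is kept is arbitrary unless a further tie-breaking variable (an appointment time, say) is added inside the parentheses.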