duplicates drop - dropping more than the duplicates

Denise Vella

Join Date: Aug 2022

Posts: 187
#1

16 Oct 2023, 07:33

Hello, I am working on a remote dataset, as seen in the picture.

Do you happen to know what I'm doing wrong?
I tried -duplicates drop- on my personal computer with this dummy dataset:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float id str2 id2 float(pessary injection) byte test
12 "1A" 1 1 0
13 "2B" 1 9 1
13 "2B" 1 9 .
13 "2B" 1 4 .
end

Code:

duplicates tag (pessary injection id id2), gen(test2)
duplicates drop test, force

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float id str2 id2 float(pessary injection) byte(test test2)
12 "1A" 1 1 0 0
13 "2B" 1 9 1 1
13 "2B" 1 9 . 1
13 "2B" 1 4 . 0
end

As you can see, I would like to keep the bold ones, but instead it drops the last observation for id 13 (a unique value) and, as requested, one of the duplicates.
As you can see, I tried this so you can understand what I want, but of course & isn't allowed:

duplicates drop test, force & keep if test2 == 0

Do you know what I'm doing wrong?

I just want to keep the first row for each duplicate, thereby converting them to unique values. I can do this manually, but I am wondering, out of curiosity, what I'm doing wrong.

Last edited by Denise Vella; 16 Oct 2023, 07:45.
Girish Venkataraman

Join Date: Dec 2021

Posts: 281
#2

16 Oct 2023, 07:50

List and tag the duplicates before marking them for deletion. How will Stata know what counts as a duplicate? There needs to be an ID variable that identifies unique observations before you can determine whether there is a duplicate within the ID. Start with -duplicates list-, then -duplicates tag-.
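
For illustration, here is a minimal sketch of that inspection sequence on the dummy data from #1. It assumes that id, id2, pessary and injection together define what counts as a duplicate; the variable name dup is made up for this example.

```stata
clear
input float id str2 id2 float(pessary injection) byte test
12 "1A" 1 1 0
13 "2B" 1 9 1
13 "2B" 1 9 .
13 "2B" 1 4 .
end

* Tabulate how many observations share each combination of the key variables
duplicates report id id2 pessary injection

* Show only the observations that have at least one copy
duplicates list id id2 pessary injection

* dup = number of other observations with the same key values (0 means unique)
duplicates tag id id2 pessary injection, generate(dup)
list, sepby(id)
```

Nothing is deleted at this stage, so the tagged observations can be checked by eye before any -duplicates drop-.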
Denise Vella

Join Date: Aug 2022

Posts: 187
#3

16 Oct 2023, 07:53

I have tagged them, as you have seen in the code:
duplicates tag (pessary injection id id2), gen(test2)
daniel klein

Join Date: Mar 2014

Posts: 3885
#4

16 Oct 2023, 08:52

-duplicates drop- drops exactly what you ask for: duplicates in terms of scoresid. There are 281,097 observations with scoresid == 0; thus, 281,096 duplicates. Likewise, there are 18 observations with scoresid == 1; thus, 17 duplicates. Add those together to get 281,113 duplicates that get dropped.

If you want something else, you need to tell Stata (or duplicates) what it is you want.
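
The same arithmetic can be reproduced on a small scale. The sketch below uses toy data, not the remote dataset: it assumes a variable named scoresid with 7 zeros and 3 ones, and the helper variable surplus is hypothetical.

```stata
clear
set obs 10

* Toy grouping variable: observations 1-7 get 0, observations 8-10 get 1
generate byte scoresid = _n > 7

* Within each value of scoresid, everything after the first observation is surplus
bysort scoresid: generate byte surplus = _n > 1

* 6 surplus zeros + 2 surplus ones = 8
count if surplus

* Dropping duplicates in terms of scoresid alone leaves one row per distinct value
duplicates drop scoresid, force
count
```

With only scoresid in the varlist, all other variables are ignored when deciding what is a duplicate, which is why so many observations disappear.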
daniel klein

Join Date: Mar 2014

Posts: 3885
#5

16 Oct 2023, 09:10

Code:

duplicates drop id id2 pessary injection , force

does what you ask for your example data. Whether that is what you really want, I cannot tell for sure.
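
As a sketch of how this plays out on the dummy data from #1: the first part runs the command above, and the second part is an alternative that makes the choice of which row to keep explicit. The alternative assumes that, within each group, the row with a nonmissing test is the one wanted; that is a guess, not something stated in #1.

```stata
clear
input float id str2 id2 float(pessary injection) byte test
12 "1A" 1 1 0
13 "2B" 1 9 1
13 "2B" 1 9 .
13 "2B" 1 4 .
end

* Part 1: drop surplus copies defined by all four key variables
preserve
duplicates drop id id2 pessary injection, force
list
restore

* Part 2 (assumption): sort by test within each group; missing values sort
* last, so the first row of a group has a nonmissing test whenever one exists
bysort id id2 pessary injection (test): keep if _n == 1
list
```

Either way three observations remain: one for id 12 and two for id 13, one with injection == 9 and one with injection == 4.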