Tagging duplicates (ok) BUT only keeping one line of information

Rose Matthews

Join Date: Aug 2023
Posts: 154


15 Oct 2023, 05:37

Hello, I've made some progress over the course of the day and would like some help. The dataset is below.

For each id, I have tagged the rows that duplicate information on oks and on the components of another score (pain and scar).

For rows with duplicatescores > 0, I would like to keep only one row.

Therefore, for ID 13 I would like to keep just one of the duplicated rows, which would stay in the dataset alongside ID 12.

If I use

Code:

* Problem: drop one of these duplicate rows

drop if duplicatescores == 1 // drops all of the tagged rows

keep if duplicatescores == 1 // drops ID 12

Each of these drops or keeps all of the tagged rows together.

Is there an alternative that keeps just one row of data for ID 13 and leaves ID 12 in the dataset?

Dataset:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float id str2 id2 float(pain scar surgdate oks gender preop_date recpreop_date postop_date recpostop_date operationdate) byte(_merge duplicatescores)
12 "1A" 1 1 22555 40 1 22533 22289 22770 22771 22555 3 0
13 "2B" 1 9 11962  1 2 11933 11933 11963 11782 11962 3 0
13 "2B" 1 9 11962 48 2 11932 11951 11963 11782 11962 3 1
13 "2B" 1 9 11962 48 2 11932     .     .     .     . . 1
end
format %td surgdate
format %td preop_date
format %td recpreop_date
format %td postop_date
format %td recpostop_date
format %td operationdate
label values _merge _merge
label def _merge 3 "Matched (3)", modify


Rich Goldstein

Join Date: Mar 2014

Posts: 4490
#2

15 Oct 2023, 06:23

Code:

h duplicates

especially "duplicates drop"
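As a minimal sketch of that suggestion, assuming rows should count as duplicates only when id, pain, scar and oks all match (the original tag used oks, pain and scar without id), -duplicates drop- with a varlist keeps the first occurrence of each combination and leaves unique rows, such as ID 12 and the oks = 1 row of ID 13, untouched. The data are a cut-down copy of the posted example with only a few of its variables.

```stata
clear
input float(id pain scar oks postop_date)
12 1 1 40 22770
13 1 9  1 11963
13 1 9 48 11963
13 1 9 48     .
end

// Drop all but the first occurrence of each id/pain/scar/oks combination.
// -force- is required because the rows also differ on variables outside
// the varlist (here postop_date).
duplicates drop id pain scar oks, force
list
```

Note that -force- is needed whenever the varlist does not cover every variable, which is the usual case with merged data like this.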
Rose Matthews

Join Date: Aug 2023

Posts: 154
#3

15 Oct 2023, 08:58

Hmm! That didn't work the way I wanted.

Looking at the current dataset, I expected Stata to keep the unique rows and the first row of each set of duplicates. So I expected Stata to keep ID 12 and the rows marked with red boxes for ID 13.

When I use the code:

Code:

// For those with duplicates in the score components
duplicates tag oks pain scar, gen(duplicatescores)
duplicates drop duplicatescores, force

It drops the unique row of ID 13 (where duplicatescores = 0) and keeps the first row of the duplicates (which is what I want).

Is there a way to do both, i.e., keep the unique rows and also keep the first row of each set of duplicates?
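A minimal sketch of why this happens, using a cut-down copy of the posted data: when the varlist is duplicatescores itself, rows are compared only on the tag, so every row tagged 0 counts as a duplicate of every other row tagged 0, and only one row per tag value survives.

```stata
clear
input float(id pain scar oks) byte duplicatescores
12 1 1 40 0
13 1 9  1 0
13 1 9 48 1
13 1 9 48 1
end

// The varlist here is the tag, not the score variables, so Stata keeps
// one row with duplicatescores == 0 and one row with duplicatescores == 1.
duplicates drop duplicatescores, force
list
```

The fix is to name the variables that define a duplicate (for example id, pain, scar and oks) rather than the tag created from them.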

I tried this:

Code:

keep if duplicatescores == 0 & duplicates drop duplicatescores, force // This didn't work: -keep if- cannot be combined with a second command using &
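The two conditions can be combined in a single -keep if- by first flagging the first row of each group. A minimal sketch, assuming groups are defined by id, pain, scar and oks, and using hypothetical helper variables obsno and firstcopy (not in the original data) to record the original row order:

```stata
clear
input float(id pain scar oks postop_date) byte duplicatescores
12 1 1 40 22770 0
13 1 9  1 11963 0
13 1 9 48 11963 1
13 1 9 48     . 1
end

// Hypothetical helper: remember the current order so "first" means first as listed
generate long obsno = _n

// Hypothetical helper: flag the first row within each id/pain/scar/oks group
bysort id pain scar oks (obsno): generate byte firstcopy = (_n == 1)

// Keep unique rows (tag 0) and the first row of each duplicated group
keep if duplicatescores == 0 | firstcopy

// Restore the original order and remove the helpers
sort obsno
drop obsno firstcopy
list
```

Because a unique row is always the first row of its own group, `keep if firstcopy` alone gives the same result here; the `duplicatescores == 0` condition is kept only to mirror the intent of the attempt above.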