Hi all,
I am currently working on data quality, and I would like to create a dataset that keeps the duplicate observations and drops the non-duplicate ones.
Is there any code that can help me to achieve this goal?
I have shared the dataset with the link below. I'm trying to keep the duplicates based on these variables: facilityid pickupdate uniqueno description.
Based on the duplicates report, I should end up with 500 surplus observations (or 979 observations, 916 + 63, if every copy in a duplicated group is kept):
duplicates report  /* Duplicate is not case-sensitive */
--------------------------------------
   Copies | Observations       Surplus
----------+---------------------------
        1 |        69763             0
        2 |          916           458
        3 |           63            42
--------------------------------------
Is it possible to keep the duplicates only?
Thank you very much!
https://drive.google.com/file/d/1x8I...ew?usp=sharing
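One possible approach, sketched here under the assumption that duplicates are defined by the four variables named above (facilityid pickupdate uniqueno description), is to tag each observation with the number of other observations sharing its key and then keep only the tagged ones. The variable name dup is a hypothetical name chosen for illustration. If the report above was produced on the same four variables, this should leave the 916 + 63 = 979 observations that belong to duplicated groups.

```stata
* Count, for each observation, how many OTHER observations share the same key.
* dup is a hypothetical variable name; it must not already exist in the data.
duplicates tag facilityid pickupdate uniqueno description, generate(dup)

* dup == 0 marks a unique key combination; keep only the duplicated groups.
keep if dup > 0

* Optional: place the copies next to each other for inspection.
sort facilityid pickupdate uniqueno description
```

If only the surplus copies are wanted (500 according to the report), one option is to number the observations within each key group and drop the first of each, for example with `bysort facilityid pickupdate uniqueno description: keep if _n > 1`. Which copy counts as the "first" is arbitrary unless the sort order within groups is fixed beforehand.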