Hello all!
I am working on cleaning up a dataset, and I do not know how to go about it. The data set has approximately 400,000 observations, with duplicates based on an ID number. What I want to do is keep, for each ID, only the row with the highest outcome. So, for the following:
ID Number | Date | Outcome |
3 | 2/2/22 | 4 |
3 | 2/2/22 | 3 |
3 | 2/2/22 | 3 |
3 | 2/2/22 | 2 |
I want to keep only the first row because it has the highest outcome code. Some IDs have 5 corresponding values for outcomes; some have 2; I think one even has 10.
I have tried to google this, but got very confused by duplicates and dups. I'd really appreciate any suggestions anyone has.
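To make the goal concrete, here is a rough sketch of the result I am after, written in plain Python purely for illustration. The function name `keep_highest_outcome` and the field names `id`, `date`, and `outcome` are made up for this example, and it assumes outcomes are numeric and that, when two rows tie for the highest outcome, keeping the first one is fine.

```python
def keep_highest_outcome(rows):
    """Return one row per ID: the row with the largest outcome.

    Assumption: ties on the highest outcome keep the first row seen.
    """
    best = {}
    for row in rows:
        current = best.get(row["id"])
        # Replace the stored row only when this one has a strictly higher outcome.
        if current is None or row["outcome"] > current["outcome"]:
            best[row["id"]] = row
    return list(best.values())


# The four example rows from the table above.
rows = [
    {"id": 3, "date": "2/2/22", "outcome": 4},
    {"id": 3, "date": "2/2/22", "outcome": 3},
    {"id": 3, "date": "2/2/22", "outcome": 3},
    {"id": 3, "date": "2/2/22", "outcome": 2},
]

print(keep_highest_outcome(rows))
# prints [{'id': 3, 'date': '2/2/22', 'outcome': 4}]
```

This makes a single pass over the data and holds one row per ID in memory, so 400,000 observations should not be a problem. What I need is the equivalent of this in my actual software.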