I'm having trouble removing duplicates on address and retaining all unique values from the duplicates.
I'd like to take a dataset like this:
And convert it to this where I keep only the unique values for each address:
I can get this far, but I don't know how to remove the duplicate values across ID-ID4, Name-Name4, Phone-Phone4:
The number of duplicates on Address will vary as well as for the other variables. The duplicates will be in different orders too. For example, another set of duplicate values within the dataset might look like this:
Thanks!
I'd like to take a dataset like this:
ID | Name | Address | Phone |
35 | Blue | 13 Washington St | 888-555-1111 |
33 | Blue | 13 Washington St | 888-555-1111 |
32 | Green | 13 Washington St | 888-555-2222 |
33 | Red | 13 Washington St | 888-555-1111 |
And convert it to this where I keep only the unique values for each address:
ID | Name | Address | Phone | ID2 | ID3 | Name2 | Name3 | Phone2 |
35 | Blue | 13 Washington St | 888-555-1111 | 33 | 32 | Green | Red | 888-555-2222 |
I can get this far, but I don't know how to remove the duplicate values across ID-ID4, Name-Name4, Phone-Phone4:
ID | Name | Address | Phone | ID2 | ID3 | ID4 | Name2 | Name3 | Name4 | Phone2 | Phone3 | Phone4 |
35 | Blue | 13 Washington St | 888-555-1111 | 33 | 32 | 33 | Blue | Green | Red | 888-555-1111 | 888-555-2222 | 888-555-1111 |
The number of duplicates on Address will vary as well as for the other variables. The duplicates will be in different orders too. For example, another set of duplicate values within the dataset might look like this:
ID | Name | Address | Phone |
61 | Jack | 1 Franklin Ave | 999-444-333 |
61 | Mary | 1 Franklin Ave | 222-555-666 |
62 | Mary | 1 Franklin Ave | 777-333-222 |
64 | Donald | 1 Franklin Ave | 222-555-666 |
61 | Harry | 1 Franklin Ave | 222-555-666 |
63 | Harry | 1 Franklin Ave | 777-333-222 |
64 | Jack | 1 Franklin Ave | 777-333-222 |
Thanks!
Comment