Hi all,
I need some help reducing a dataset with multiple records per person to one record per unique person. In my dataset only 80% of persons have an id, and even when the id is present it may contain errors. To account for that, I need to rely on other identifiers, namely first name, last name and birth date, which may themselves contain errors. Below I illustrate my dataset and some of the issues with it.
I did the transformation in three steps. In the first step, among records with an id, I identified unique persons with the duplicates report and duplicates tag commands, using id, first name, last name and birth date. In the second step, among records without an id, I identified unique persons based on first name, last name and birth date only. In the last step, I repeated the second step on all records. Is there a more robust and shorter way to do this?
Your help is really appreciated,
id  last_name  first_name  birth_date
11  Red        Pencil      June-01-2004
11  Red        Penicl      June-01-2004
20  Empty      Box         May-27-1967
.   Empty      Box         May-27-1967
30  Beautiful  Day         April-4-1939
.   Beautiful  Day         April-3-1939
14  Red        Carpet      September-01-2001
41  Red        Carpet      September-01-2001
14  Red        Carpet      September-01-2001
Adriana
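
One possible way to collapse the three passes into a single one is to compare every pair of records on all four identifiers at once, link any pair that agrees on enough of them, and then treat each connected group of linked records as one person. The sketch below is in Python rather than Stata and only illustrates the idea on the example data. The function names (similar, agreement, link_records), the 0.8 name-similarity cutoff, the one-day tolerance on birth dates and the "at least 3 of 4 fields" rule are all assumptions made for illustration, not a recommended calibration.

```python
from datetime import date
from difflib import SequenceMatcher
from itertools import combinations

# Example records from the post: (id, last_name, first_name, birth_date).
# A missing id (".") is represented as None.
records = [
    (11, "Red", "Pencil", date(2004, 6, 1)),
    (11, "Red", "Penicl", date(2004, 6, 1)),
    (20, "Empty", "Box", date(1967, 5, 27)),
    (None, "Empty", "Box", date(1967, 5, 27)),
    (30, "Beautiful", "Day", date(1939, 4, 4)),
    (None, "Beautiful", "Day", date(1939, 4, 3)),
    (14, "Red", "Carpet", date(2001, 9, 1)),
    (41, "Red", "Carpet", date(2001, 9, 1)),
    (14, "Red", "Carpet", date(2001, 9, 1)),
]


def similar(a, b, threshold=0.8):
    """Case-insensitive fuzzy match; the threshold is an assumed cutoff."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def agreement(r1, r2):
    """Count how many of the four identifiers agree (exactly or nearly)."""
    id1, last1, first1, born1 = r1
    id2, last2, first2, born2 = r2
    score = 0
    if id1 is not None and id1 == id2:      # a missing id never counts as agreement
        score += 1
    if similar(last1, last2):
        score += 1
    if similar(first1, first2):
        score += 1
    if abs((born1 - born2).days) <= 1:      # assumed tolerance for date typos
        score += 1
    return score


def link_records(records, min_score=3):
    """Return one person label per record, using union-find over linked pairs."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if agreement(records[i], records[j]) >= min_score:
            parent[find(j)] = find(i)

    # Relabel group roots as consecutive person numbers 1, 2, 3, ...
    labels = {}
    return [labels.setdefault(find(i), len(labels) + 1) for i in range(len(records))]


labels = link_records(records)
print(labels)  # [1, 1, 2, 2, 3, 3, 4, 4, 4]
```

On the example data this groups the misspelled first name, the missing ids, the birth date that is off by one day and the transposed id (14 vs 41) into four persons. Comparing all pairs grows quadratically with the number of records, so on a large dataset one would probably compare only within blocks (for example, same birth year), and the thresholds would need checking against records whose true identity is known.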