Dear all,
I am trying to match observations from 2 data sets based on common strings within 2 string variables.
The data has the following structure:
Data Set 1:
Data Set 2:
I hope the example illustrates problem in an understandable manner - I would very much appreciate your support, thank you in advance!
I am trying to match observations from 2 data sets based on common strings within 2 string variables.
The data has the following structure:
Data Set 1:
Company | Number | Name |
A | 1 | Robert G., Diane S. |
A | 2 | Julia C. |
A | 3 | Robert G., Tim H. |
B | 1 | Brad D., Olaf E. |
B | 2 | Richard W., Albert C. |
Data Set 2:
Company | Description |
A | Robert G. became CEO [...] |
A | Diane S. was appointed [...] |
B | Richard W. died [...] |
B | Olaf E. was appointed [...] |
- Variable "Name" in data set 1 and variable "Description" in data set 2 are both strings
- First the observations can be matched based on the variable "Company"
- Then the goal is to find an identical string (one word) in the fields "Name" and "Description" based on which the two data sets can be merged in order to assign each name within variable "Description" in data set 2 to the correct "Number" from data set 1.
- This common string would be a new variable in the merged dataset called "MatchName"
- In Data Set 2, each name within "Description" only appears once for the respective company, but it may appear several times within "Name" in data set 1 but for different "Numbers" --> see example: Robert G. appears in data set 1 for Number 1 and 3; in data set 2 he only appears once, but in the merged data set he will appear twice (assigned to "Number" 1 and 3)
Company | Number | Name | Description | MatchName |
A | 1 | Robert G., Diane S. | Robert G. became CEO [...] | Robert G. |
A | 1 | Robert G., Diane S. | Diane S. was appointed [...] | Diane S. |
A | 3 | Robert G., Diane S. | Robert G. | |
B | 1 | Brad D., Olaf E. | Olaf E. was appointed [...] | Olaf E. |
B | 2 | Richard W., Albert C. | Richard W. died [...] | Richard W. |
I hope the example illustrates problem in an understandable manner - I would very much appreciate your support, thank you in advance!
Comment