I have a data set I need to merge with another by patient id & date of attendance. Currently I have the data in long format by ‘test’, this means there could be the same patient recording multiple tests on the same day, and as it does not count as a duplicate observation (I cannot remove it), it means I can no longer uniquely identify for merging based on ID and attendance date.
Current variables: pat_id; date_of_attendance; test
Example:
(dates are in 'DMY' format)
What I want to do:
Sort the data by patient ID & date of attendance, where there are multiple observations on the same day by the same patient create a new variable such as ‘Test_number: 0, 1, 2, 3’. This would then reset if the day changes and/or the patient ID changes.
i.e.
I would then use the reshape command by ‘test_no’ to change from long format to wide format so I get data recorded per date instead of per test.
The wide format would then consist of: Pat_ID; date_of_attendance; test1; test2; test3; test4; etc…
i.e.
Current variables: pat_id; date_of_attendance; test
Example:
| Pat_id | Date_of_attendance | test |
| 1 | 04/05/2022 | 1 |
| 1 | 04/05/2022 | 2 |
| 2 | 04/05/2022 | 5 |
| 3 | 06/05/2022 | 1 |
| 3 | 06/05/2022 | 3 |
| 3 | 09/05/2022 | 7 |
What I want to do:
Sort the data by patient ID & date of attendance, where there are multiple observations on the same day by the same patient create a new variable such as ‘Test_number: 0, 1, 2, 3’. This would then reset if the day changes and/or the patient ID changes.
i.e.
| Pat_id | Date_of_attendance | test | Test_no |
| 1 | 04/05/2022 | 1 | 0 |
| 1 | 04/05/2022 | 2 | 1 |
| 2 | 04/05/2022 | 5 | 0 |
| 3 | 06/05/2022 | 1 | 0 |
| 3 | 06/05/2022 | 3 | 1 |
| 3 | 06/05/2022 | 6 | 2 |
| 3 | 09/05/2022 | 7 | 0 |
The wide format would then consist of: Pat_ID; date_of_attendance; test1; test2; test3; test4; etc…
i.e.
| Pat_id | Date_of_attendance | Test0 | Test1 | Test2 |
| 1 | 04/05/2022 | 1 | 2 | . |
| 2 | 04/05/2022 | 5 | . | . |
| 3 | 06/05/2022 | 1 | 3 | 6 |
| 3 | 09/05/2022 | 7 | . | . |

Comment