Hi folks
Apologies if answered elsewhere, but I couldn't find it.
I have two sets of data (examples below). Both have multiple observations per person, and each observation is associated with a date and some other information. I'd like to merge from my using data on the basis of most recent past date (with respect to the data in memory).
For context, these are health service use data. I am trying to determine the activity in the first dataset according to the most recent 'outcome' that's recorded in the second dataset.
So...
The first:
The second:
What I'm hoping to achieve:
I'm sure there's a straightforward solution, but I just can't get my head around it!
Note that this is a large dataset (>1million obs) with many observations per person, so a solution in which the data had to be converted to wide format would not be ideal.
Thanks in advance.
Apologies if answered elsewhere, but I couldn't find it.
I have two sets of data (examples below). Both have multiple observations per person, and each observation is associated with a date and some other information. I'd like to merge from my using data on the basis of most recent past date (with respect to the data in memory).
For context, these are health service use data. I am trying to determine the activity in the first dataset according to the most recent 'outcome' that's recorded in the second dataset.
So...
The first:
ID | date | Var1 | Var2 |
1 | 16feb2007 | 4 | 346 |
1 | 26nov2007 | 34 | 34 |
1 | 02apr2007 | 78 | 346 |
1 | 29oct2008 | 325 | 8 |
2 | 03jan2008 | 13 | 57 |
3 | 23apr2008 | 26 | 586 |
3 | 30apr2008 | 3 | 1 |
4 | 25jun2008 | 6 | 976 |
5 | 31mar2008 | 23 | 5 |
The second:
ID | date | outcome |
1 | 04jan2007 | 0 |
1 | 21may2007 | 1 |
1 | 01sep2008 | 0 |
2 | 01jan2008 | 1 |
2 | 10feb2009 | 0 |
3 | 04may2008 | 0 |
4 | 17may2007 | 0 |
4 | 16jan2009 | 1 |
5 | 19feb2008 | 0 |
What I'm hoping to achieve:
ID | date | Var1 | Var2 | date_outcome | outcome |
1 | 16feb2007 | 4 | 346 | 04jan2007 | 0 |
1 | 26nov2007 | 34 | 34 | 21may2007 | 1 |
1 | 02apr2007 | 78 | 346 | 04jan2007 | 0 |
1 | 29oct2008 | 325 | 8 | 01sep2008 | 0 |
2 | 03jan2008 | 13 | 57 | 01jan2008 | 1 |
3 | 23apr2008 | 26 | 586 | . | . |
3 | 30apr2008 | 3 | 1 | . | . |
4 | 25jun2008 | 6 | 976 | 17may2007 | 0 |
5 | 31mar2008 | 23 | 5 | 19feb2008 | 0 |
I'm sure there's a straightforward solution, but I just can't get my head around it!
Note that this is a large dataset (>1million obs) with many observations per person, so a solution in which the data had to be converted to wide format would not be ideal.
Thanks in advance.
Comment