Hello All,
I was wondering whether someone might help me with understanding how to create a unique variable upon which to merge 2 panel datasets in which individuals can have more than one observation per time point?
Specifically, my 2 datasets, one pharmacy claims, one medical claims are in long format where one individual can use more than 1 drug or use more than one medical service per month. I’d like to produce a variable which would merge the pharmacy and medical claims based on the combo of individual id (“mbr_ckey”) and month of use (“cmonth”). The datasets look something like the following:
Pharmacy
Medical
I’ve tsset and tssfill, full both datasets so all individuals have the entire range of time observations (0-3 here), regardless if they’ve incurred a claim during a particular month.
What I’m thinking to do is to generate a new variable which would uniquely identify each mbr_ckey and cMonth combo by combing mbr_ckey and cMonth names for the observation and numbering mbr_ckey and cMonth combos sequentially within cmonths. For example, in the pharmacy dataset the new variable might look like:
Pharmacy
After doing so for both datasets, I could merge 1:1 on Mergekey.
To restate my question:
Karl
I was wondering whether someone might help me with understanding how to create a unique variable upon which to merge 2 panel datasets in which individuals can have more than one observation per time point?
Specifically, my 2 datasets, one pharmacy claims, one medical claims are in long format where one individual can use more than 1 drug or use more than one medical service per month. I’d like to produce a variable which would merge the pharmacy and medical claims based on the combo of individual id (“mbr_ckey”) and month of use (“cmonth”). The datasets look something like the following:
Pharmacy
mbr_ckey | cMonth | Drug name |
1 | 0 | Aspirin |
1 | 0 | Oxycodone |
1 | 1 | Aspirin |
1 | 2 | Aspirin |
1 | 3 | Aspirin |
2 | 0 | simvastatin |
2 | 1 | . |
2 | 2 | Simvastatin |
2 | 3 | Simvastatin |
2 | 3 | warfarin |
mbr_ckey | cMonth | CPT Code |
1 | 0 | . |
1 | 1 | 10021 |
1 | 2 | . |
1 | 3 | 99341 |
2 | 0 | . |
2 | 1 | . |
2 | 2 | . |
2 | 3 | . |
What I’m thinking to do is to generate a new variable which would uniquely identify each mbr_ckey and cMonth combo by combing mbr_ckey and cMonth names for the observation and numbering mbr_ckey and cMonth combos sequentially within cmonths. For example, in the pharmacy dataset the new variable might look like:
Pharmacy
mbr_ckey | cMonth | Drug name | Mergekey |
1 | 0 | Aspirin | 1-0-1 |
1 | 0 | Oxycodone | 1-0-2 |
1 | 1 | Aspirin | 1-1-1 |
1 | 2 | Aspirin | 1-2-1 |
1 | 3 | Aspirin | 1-3-1 |
2 | 0 | simvastatin | 2-0-1 |
2 | 1 | . | 2-1-1 |
2 | 2 | Simvastatin | 2-2-1 |
2 | 3 | Simvastatin | 2-3-1 |
2 | 3 | warfarin | 2-3-2 |
To restate my question:
- Is there an even better way to do what I am proposing?
- If not, what would be the Stata code to generate the Mergekey?
Karl
Comment