Hi everyone,
This is my first post to Statalist, so please forgive any errors. I have a large, registry based dataset (1 million+ observations) that has all information regarding pregnancies for a large European country. Separately, I have a dataset including all patient visits to a health practitioner, including diagnoses. I would like to combine these two datasets to have all diagnoses linked to each mother in the first dataset.
The problem that I am running into is that both datasets have multiple entries for the same ID number. The pregnancy registry has multiple pregnancies per woman, and the patient visit dataset includes all visits for each ID number. When I try to merge using 1:1, m:1, or 1:m using ID number I get a notification that the ID number does not uniquely identify observations (true for both datasets). Following the advice of others on Statalist, I used the joinby command, which seems to have linked my datasets properly, but now includes many more observations than I am looking for.
I am hoping to keep each pregnancy as a separate observation, but in this new set combined with joinby, there are multiple observations per ID number that correspond to separate patient visits. Please see the example below:
Dataset 1:
ID Number Year (of birth) Var1 Var 2 Var 3 Var4 ....
1 2008 1 1 1 2
2 2009 3 1 2 1
3 2008 2 1 3 1
4 2009 1 2 2 2
5 2007 2 3 1 1
5 2010 3 4 1 1
6 2007 2 2 2 2
Dataset 2:
ID Number Year (of diagnosis) Diagnosis
1 2008 1
1 2008 3
1 2008 2
2 2009 1
2 2009 2
3 2010 3
3 2010 2
I would like the data to look like this:
ID Number Year (of birth) Var1 Var 2 Var 3 Var4 Diagnosis 1 Diagnosis 2 Diagnosis 3...
1 2008 1 1 1 2 1 3 2
2 2009 3 1 2 1 1 2 .
3 2008 2 1 3 1 3 2 .
4 2009 1 2 2 2 . . .
5 2007 2 3 1 1 . . .
5 2010 3 4 1 1 . . .
6 2007 2 2 2 2 . . .
Please let me know if you have any advice!
Thank you, Nicole
This is my first post to Statalist, so please forgive any errors. I have a large, registry based dataset (1 million+ observations) that has all information regarding pregnancies for a large European country. Separately, I have a dataset including all patient visits to a health practitioner, including diagnoses. I would like to combine these two datasets to have all diagnoses linked to each mother in the first dataset.
The problem that I am running into is that both datasets have multiple entries for the same ID number. The pregnancy registry has multiple pregnancies per woman, and the patient visit dataset includes all visits for each ID number. When I try to merge using 1:1, m:1, or 1:m using ID number I get a notification that the ID number does not uniquely identify observations (true for both datasets). Following the advice of others on Statalist, I used the joinby command, which seems to have linked my datasets properly, but now includes many more observations than I am looking for.
I am hoping to keep each pregnancy as a separate observation, but in this new set combined with joinby, there are multiple observations per ID number that correspond to separate patient visits. Please see the example below:
Dataset 1:
ID Number Year (of birth) Var1 Var 2 Var 3 Var4 ....
1 2008 1 1 1 2
2 2009 3 1 2 1
3 2008 2 1 3 1
4 2009 1 2 2 2
5 2007 2 3 1 1
5 2010 3 4 1 1
6 2007 2 2 2 2
Dataset 2:
ID Number Year (of diagnosis) Diagnosis
1 2008 1
1 2008 3
1 2008 2
2 2009 1
2 2009 2
3 2010 3
3 2010 2
I would like the data to look like this:
ID Number Year (of birth) Var1 Var 2 Var 3 Var4 Diagnosis 1 Diagnosis 2 Diagnosis 3...
1 2008 1 1 1 2 1 3 2
2 2009 3 1 2 1 1 2 .
3 2008 2 1 3 1 3 2 .
4 2009 1 2 2 2 . . .
5 2007 2 3 1 1 . . .
5 2010 3 4 1 1 . . .
6 2007 2 2 2 2 . . .
Please let me know if you have any advice!
Thank you, Nicole
Comment