Hi All
I am looking to merge two datasets. Master dataset has hospital admission information of patients - all can be identified using unique patient id (PatID). the using dataset has procedure information of the patients which again all can be identified by using the same PatID. However, as patients can appear multiple times during the year in both datasets, the observations are not uniqely identified by PatID anymore. I have tried a 1:1 merge using PatID year and month which gives me approx 80% match across both datasets. I am missing some patients as some patients in master dataset admitted at the end of the month may appear in the procedure dataset in the following month. For example. Patient 1 admitted on 29th January in master dataset had a procedure on the 3rd of Feb in the using dataset.
I am interested to keep all observations as I want to study the patients who may be admitted multiple times during the years and have multiple procedures, Hence dont want to remove all duplicates
Any help Please?
below is my command and my data example
clear
input str11 PatID float(year month)
"65203" 2017 9
"72071" 2019 2
"46611" 2018 9
"48441" 2017 11
"44841" 2017 9
"47041" 2018 1
"45093" 2019 7
"46401" 2017 7
"47046" 2018 3
"45022" 2019 3
"45022" 2019 9
Sort PatID year month
merge 1:1 PatID year month using "test.dta"
Hi All
I am looking to merge two datasets. Master dataset has hospital admission information of patients - all can be identified using unique patient id (PatID). the using dataset has procedure information of the patients which again all can be identified by using the same PatID. However, as patients can appear multiple times during the year in both datasets, the observations are not uniqely identified by PatID anymore. I have tried a 1:1 merge using PatID year and month which gives me approx 80% match across both datasets. I am missing some patients as some patients in master dataset admitted at the end of the month may appear in the procedure dataset in the following month. For example. Patient 1 admitted on 29th January in master dataset had a procedure on the 3rd of Feb in the using dataset.
I am interested to keep all observations as I want to study the patients who may be admitted multiple times during the years and have multiple procedures, Hence dont want to remove all duplicates
Any help Please?
below is my command and my data example
clear
input str11 PatID float(year month)
"65203" 2017 9
"72071" 2019 2
"46611" 2018 9
"48441" 2017 11
"44841" 2017 9
"47041" 2018 1
"45093" 2019 7
"46401" 2017 7
"47046" 2018 3
"45022" 2019 3
"45022" 2019 9
Sort PatID year month
merge 1:1 PatID year month using "test.dta"
I am looking to merge two datasets. Master dataset has hospital admission information of patients - all can be identified using unique patient id (PatID). the using dataset has procedure information of the patients which again all can be identified by using the same PatID. However, as patients can appear multiple times during the year in both datasets, the observations are not uniqely identified by PatID anymore. I have tried a 1:1 merge using PatID year and month which gives me approx 80% match across both datasets. I am missing some patients as some patients in master dataset admitted at the end of the month may appear in the procedure dataset in the following month. For example. Patient 1 admitted on 29th January in master dataset had a procedure on the 3rd of Feb in the using dataset.
I am interested to keep all observations as I want to study the patients who may be admitted multiple times during the years and have multiple procedures, Hence dont want to remove all duplicates
Any help Please?
below is my command and my data example
clear
input str11 PatID float(year month)
"65203" 2017 9
"72071" 2019 2
"46611" 2018 9
"48441" 2017 11
"44841" 2017 9
"47041" 2018 1
"45093" 2019 7
"46401" 2017 7
"47046" 2018 3
"45022" 2019 3
"45022" 2019 9
Sort PatID year month
merge 1:1 PatID year month using "test.dta"
Hi All
I am looking to merge two datasets. Master dataset has hospital admission information of patients - all can be identified using unique patient id (PatID). the using dataset has procedure information of the patients which again all can be identified by using the same PatID. However, as patients can appear multiple times during the year in both datasets, the observations are not uniqely identified by PatID anymore. I have tried a 1:1 merge using PatID year and month which gives me approx 80% match across both datasets. I am missing some patients as some patients in master dataset admitted at the end of the month may appear in the procedure dataset in the following month. For example. Patient 1 admitted on 29th January in master dataset had a procedure on the 3rd of Feb in the using dataset.
I am interested to keep all observations as I want to study the patients who may be admitted multiple times during the years and have multiple procedures, Hence dont want to remove all duplicates
Any help Please?
below is my command and my data example
clear
input str11 PatID float(year month)
"65203" 2017 9
"72071" 2019 2
"46611" 2018 9
"48441" 2017 11
"44841" 2017 9
"47041" 2018 1
"45093" 2019 7
"46401" 2017 7
"47046" 2018 3
"45022" 2019 3
"45022" 2019 9
Sort PatID year month
merge 1:1 PatID year month using "test.dta"

Comment