Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • merging 1:1 using unique identifier for multiple admissions

    Hi All

    I am looking to merge two datasets. Master dataset has hospital admission information of patients - all can be identified using unique patient id (PatID). the using dataset has procedure information of the patients which again all can be identified by using the same PatID. However, as patients can appear multiple times during the year in both datasets, the observations are not uniqely identified by PatID anymore. I have tried a 1:1 merge using PatID year and month which gives me approx 80% match across both datasets. I am missing some patients as some patients in master dataset admitted at the end of the month may appear in the procedure dataset in the following month. For example. Patient 1 admitted on 29th January in master dataset had a procedure on the 3rd of Feb in the using dataset.
    I am interested to keep all observations as I want to study the patients who may be admitted multiple times during the years and have multiple procedures, Hence dont want to remove all duplicates
    Any help Please?
    below is my command and my data example
    clear
    input str11 PatID float(year month)

    "65203" 2017 9
    "72071" 2019 2
    "46611" 2018 9
    "48441" 2017 11
    "44841" 2017 9
    "47041" 2018 1
    "45093" 2019 7
    "46401" 2017 7
    "47046" 2018 3
    "45022" 2019 3
    "45022" 2019 9

    Sort PatID year month
    merge 1:1 PatID year month using "test.dta"
    Hi All

    I am looking to merge two datasets. Master dataset has hospital admission information of patients - all can be identified using unique patient id (PatID). the using dataset has procedure information of the patients which again all can be identified by using the same PatID. However, as patients can appear multiple times during the year in both datasets, the observations are not uniqely identified by PatID anymore. I have tried a 1:1 merge using PatID year and month which gives me approx 80% match across both datasets. I am missing some patients as some patients in master dataset admitted at the end of the month may appear in the procedure dataset in the following month. For example. Patient 1 admitted on 29th January in master dataset had a procedure on the 3rd of Feb in the using dataset.
    I am interested to keep all observations as I want to study the patients who may be admitted multiple times during the years and have multiple procedures, Hence dont want to remove all duplicates
    Any help Please?
    below is my command and my data example
    clear
    input str11 PatID float(year month)

    "65203" 2017 9
    "72071" 2019 2
    "46611" 2018 9
    "48441" 2017 11
    "44841" 2017 9
    "47041" 2018 1
    "45093" 2019 7
    "46401" 2017 7
    "47046" 2018 3
    "45022" 2019 3
    "45022" 2019 9

    Sort PatID year month
    merge 1:1 PatID year month using "test.dta"

  • #2
    I am not completely clear on what you want but first merge, with the master in memory, using something like -merge 1:m ... -; if you, for some reason, need to then get down to one observation per id, you probably want the -collapse- command; see
    Code:
    help merge
    help collapse

    Comment


    • #3
      I cannot use collapse as I am interested in merging repeated admissions across both datasets. For example, Person A appears in the master dataset twice in the master data in year 2019 and also appears twice in second data twice in 2019. I want to link both episodes across both datasets in time order.

      Comment


      • #4
        then use merge 1:m ... as I suggested in #2 and do not -collapse-; merge 1:1 will NOT do what you want

        Comment

        Working...
        X