  • Help with Merging the data

    Hello,

    I am trying to merge a patient dataset that has MB linkage and pregnancy IDs with the Mother-Baby link (MBL) dataset.

    I am merging on the patient ID variable, after sorting on it in both datasets, but there is a mismatch in the delivery dates of the offspring.

    For example, the master dataset has a patient with patient ID 'XX' who has 2 pregnancies, with pregnancy IDs 'AA' and 'BB'.
    The MBL dataset (the using dataset) has data for only pregnancy 'AA' for patient 'XX'.

    I have used the syntax:
    merge m:m patid using "mbl.dta"

    After merging, the results show the same delivery date for both pregnancies 'AA' and 'BB', even though the using dataset has no observation for pregnancy 'BB'.

    It would be great if I could get some help on this.

    Thanks
    Kritika

  • #2
    The most helpful thing I can say here is that -merge m:m- is, for practical purposes*, always wrong. It's a command that should not exist because what it does in almost all situations is create data salad. When you think you need -merge m:m- you either don't understand your data or your data are not even -merge-able.

    It isn't entirely clear to me from your description what you want the final result to look like. But my best guess is that the data sets contain, in addition to patid, a date variable that gives the date on which the birth took place--for illustration purposes let's call that variable bdate. If I have that right, the code you need is:
    Code:
    merge 1:m patid bdate using mbl
    If this does not run, or does not produce the desired results, please post back showing example data from both data sets that illustrates the difficulties you are having, as well as showing what you would like the results of the merge to look like. Be sure to use the -dataex- command for showing the example data. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    *While I would like to say that -merge m:m- is simply always wrong, this would be an overstatement. First, wherever -merge 1:m- or -merge m:1- can be used, -merge m:m- will produce the same result. Second, in 29 years of nearly daily use of Stata, I did on one occasion find a situation where -merge 1:m- and -merge m:1- were not possible, and -merge m:m- actually produced the desired results. However, I hasten to add that even in that situation, there was a better way to do that particular task. Since this kind of situation is so rare, it is better for you to make it a rule that you never use -merge m:m-.
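
    To illustrate what "data salad" can look like, here is a minimal sketch on toy data. The variable names patid, pregid, and deldate and all values are made up for illustration; they are not taken from the original poster's files. With several observations per patid on both sides, -merge m:m- pairs observations within patid by their position in the sort order, and reuses the last observation of the shorter group once it runs out, so a pregnancy can end up with a delivery date that was never linked to it.

    ```stata
    * Toy master data: one patient, three pregnancies (hypothetical names and values)
    clear
    input str2 patid str2 pregid
    "XX" "AA"
    "XX" "BB"
    "XX" "CC"
    end
    tempfile master
    save `master'

    * Toy using data: two delivery dates for the same patient
    clear
    input str2 patid str9 deldate
    "XX" "01jan2020"
    "XX" "15mar2022"
    end
    tempfile mbl
    save `mbl'

    * m:m pairs rows by position within patid, not by any real linkage
    use `master', clear
    merge m:m patid using `mbl'
    list, noobs
    ```

    Run this as a do-file so the temporary files stay defined. Every row comes back flagged as matched, which is why the result can look fine at a casual glance even though nothing connects a given pregid to the deldate it was paired with.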
    Last edited by Clyde Schechter; 25 May 2023, 10:38.

    • #3
      Thank you for the helpful response.

      The master dataset does not have a complete date of birth, just the year of birth (yob).

      The code did not work with 1:m because patid and yob together do not uniquely identify observations in the master dataset.
      However, to check whether the syntax would give the desired results, I used:
      merge m:m patid yob using mbl.dta

      which provided the desired results.

      • #4
        i used:
        merge m:m patid yob using mbl.dta

        which provided the desired results
        Most likely, it did not. Most likely it ran without an error message and the results, at a casual glance, were not obviously wrong. But if you scrutinize them, the probability is high you will find that they are wrong and that mismatches have occurred. I strongly encourage you to post back with example data from both data sets, using the -dataex- command. Be sure that the examples include at least some records that should match with each other.
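
        One way to start that scrutiny is to look at which key combinations are duplicated before merging at all. The sketch below uses toy data with hypothetical names (patid, yob, pregid) standing in for the real master file; -duplicates- and -isid- are official Stata commands.

        ```stata
        * Toy stand-in for the master data (hypothetical names and values)
        clear
        input str2 patid int yob str2 pregid
        "XX" 2019 "AA"
        "XX" 2019 "BB"
        "YY" 2020 "CC"
        end

        * How many observations share the same patid-yob combination?
        duplicates report patid yob

        * Flag the surplus copies and list the affected observations
        duplicates tag patid yob, generate(dup)
        list if dup > 0, sepby(patid yob) noobs

        * isid exits with an error when the key is not unique;
        * capture noisily shows the message without stopping the do-file
        capture noisily isid patid yob
        ```

        Running the same checks on the using dataset shows whether either side is unique on the key. If neither is, no 1:m or m:1 merge on that key is possible, and the key itself probably needs another variable, such as a pregnancy identifier if both files carry one.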

        • #5
          Kritika, I'll chime in briefly just to mention that - as a general alternative to merge m:m - you may want to consider joinby.

          Code:
          help joinby
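
          As a rough sketch of how that might look, here is -joinby- on toy data. The variable names patid, pregid, and deldate and all values are hypothetical. Unlike -merge m:m-, -joinby- forms every pairwise combination of observations within patid, so nothing is paired by accident of sort order; the unmatched(master) option keeps master observations that have no counterpart in the using data.

          ```stata
          * Toy master data (hypothetical names and values)
          clear
          input str2 patid str2 pregid
          "XX" "AA"
          "XX" "BB"
          "ZZ" "DD"
          end
          tempfile master
          save `master'

          * Toy using data: two deliveries for patient XX, none for ZZ
          clear
          input str2 patid str9 deldate
          "XX" "01jan2020"
          "XX" "15mar2022"
          end
          tempfile mbl
          save `mbl'

          * All pairwise combinations within patid; keep unmatched master rows
          use `master', clear
          joinby patid using `mbl', unmatched(master)
          sort patid pregid deldate
          list, sepby(patid) noobs
          ```

          The result has every pregnancy of XX paired with every delivery of XX, so a further step is still needed to keep only the correct pairs, for example by comparing dates or another identifier available in both files.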
