Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging datasets without uniquely identified observations

    Hello,

    I found a post on this subject but the solutions posted do not work for me.

    My master data set has 276 variables and 41,000 observations. Two of the variables are an MFIID and MFIName, however these are the same for multiple observations but with different fiscal years.
    My secondary dataset has 5 variables and 3115 observations. It also has the variables MFIID and MFIName but has a single observation for each one (to show the country and region of the institution).

    I am attempting to combine the two datasets so that each observation in my master dataset includes the corresponding data for country and region from my secondary dataset. I have tried merge 1:m, merge 1:1 and merge m:1 but I always receive the same error message: r(459); variables MFIID MFIName do not uniquely identify observations in the using data.

    Is there a way to merge datasets without having a unique identifier?

    Thank you


  • #2
    Hi Joe
    for your main question. Is there a way to merge dataset wiithout a unique identifier? Yes, you can use -joinby-.
    However, there is a more important question. Are you sure that MIFID and MIFName uniquely identify all observations in your "secondary dataset". Keep in mind that a unique identification should include cases where the are missing values. So, take a better look at your second dataset, perhaps drop observations where MIFID and MIFName are both missing. And try the merge 1:m or m:1 once again.
    HTH
    Fernando

    Comment


    • #3
      Try running
      Code:
      isid MFIID MFIName
      in your secondary dataset to confirm that those two variables uniquely identify observations in that dataset as you assert.

      If the isid command succeeds, then what you want is
      Code:
      use master
      merge m:1 MFIID MFIName using secondary
      If the isid command fails, then you need to learn why your secondary dataset has more than one observation for some combination(s) of MFIID and MFIName. The duplicates command is the tool you need.
      Code:
      use secondary
      help duplicates
      duplicates report MFIID MFIName
      will be a good starting point.

      Comment

      Working...
      X