Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to merge two data sets with non-unique observations?

    I apologize if this question seems elemental, but I could not figure out how to do this after reading the related documentations.

    I want to merge the master dataset with two datasets like so:

    A:
    Household Person
    1 1
    1 2
    B:
    Household Loan of Household (No.)
    1 1
    1 2
    Into:
    Household Person Loan of Household
    1 1 1
    1 2 1
    1 1 2
    1 2 2
    If I merge the master dataset (with household demographics) with A, Household doesn't uniquely identify observations when trying to merge with B. If I merge with B first, the dataset doesn't uniquely identify with A. I could add "Loan of Household (No.)" into A as missing values and then do a merge with varlist Household Loan, but is there a more rigid way of doing it? TIA.
    Last edited by Mark Xu; 10 Jul 2019, 10:34.

  • #2
    Looks like you want to merge by group so joinby may be helpful, see help joinby.

    Comment


    • #3
      Originally posted by Wouter Wakker View Post
      Looks like you want to merge by group so joinby may be helpful, see help joinby.
      Thank you! joinby... unmatched(both) worked.* I am curious as to know why 1:m merge did not work, even though I feel like it works similarly to joinby... unmatched(both). Do you mind explaining it a little bit?

      *: I used:
      use "master.dta"
      joinby household using A, unmatched(both)
      drop _merge
      joinby household using B, unmatched(both)

      Comment


      • #4
        Glad it worked.

        Joinby and merge do different things really. Merge is not made for creating extra observations. It merges existing observations to join variables for the same observations. For that, it needs to uniquely identify observations, but in your case 'household' does not uniquely identify observations because you have multiple observations with household id '1' in both datasets. 1:m and m:1 will therefore not work. m:m will, but doesn't get you what you want, as it never does really.

        What you want is to form pairwise combinations of person and loan within the household. In the case above, you go from 2 initial observations in both datasets to 2x2 = 4 observations in the merged dataset. Joinby does exactly that.

        Comment


        • #5
          Originally posted by Wouter Wakker View Post
          Glad it worked.

          Joinby and merge do different things really. Merge is not made for creating extra observations. It merges existing observations to join variables for the same observations. For that, it needs to uniquely identify observations, but in your case 'household' does not uniquely identify observations because you have multiple observations with household id '1' in both datasets. 1:m and m:1 will therefore not work. m:m will, but doesn't get you what you want, as it never does really.

          What you want is to form pairwise combinations of person and loan within the household. In the case above, you go from 2 initial observations in both datasets to 2x2 = 4 observations in the merged dataset. Joinby does exactly that.
          Thanks! I truly appreciate your help.

          Comment

          Working...
          X