How to merge two data sets with non-unique observations?

Mark Xu

Join Date: Jul 2019

Posts: 11
#1

How to merge two data sets with non-unique observations?

10 Jul 2019, 10:31

I apologize if this question seems elemental, but I could not figure out how to do this after reading the related documentations.

I want to merge the master dataset with two datasets like so:

A:
Household Person

1 1

1 2

B:
Household Loan of Household (No.)

1 1

1 2

Into:
Household Person Loan of Household

1 1 1

1 2 1

1 1 2

1 2 2

If I merge the master dataset (with household demographics) with A, Household doesn't uniquely identify observations when trying to merge with B. If I merge with B first, the dataset doesn't uniquely identify with A. I could add "Loan of Household (No.)" into A as missing values and then do a merge with varlist Household Loan, but is there a more rigid way of doing it? TIA.

Last edited by Mark Xu; 10 Jul 2019, 10:34.
Tags: None
Wouter Wakker

Join Date: Nov 2018

Posts: 621
#2

10 Jul 2019, 11:23

Looks like you want to merge by group so joinby may be helpful, see help joinby.
1 like
Comment
Mark Xu

Join Date: Jul 2019

Posts: 11
#3

10 Jul 2019, 12:09

Originally posted by Wouter Wakker View Post

Looks like you want to merge by group so joinby may be helpful, see help joinby.

Thank you! joinby... unmatched(both) worked.* I am curious as to know why 1:m merge did not work, even though I feel like it works similarly to joinby... unmatched(both). Do you mind explaining it a little bit?

*: I used:
use "master.dta"
joinby household using A, unmatched(both)
drop _merge
joinby household using B, unmatched(both)
Comment
Wouter Wakker

Join Date: Nov 2018

Posts: 621
#4

10 Jul 2019, 12:49

Glad it worked.

Joinby and merge do different things really. Merge is not made for creating extra observations. It merges existing observations to join variables for the same observations. For that, it needs to uniquely identify observations, but in your case 'household' does not uniquely identify observations because you have multiple observations with household id '1' in both datasets. 1:m and m:1 will therefore not work. m:m will, but doesn't get you what you want, as it never does really.

What you want is to form pairwise combinations of person and loan within the household. In the case above, you go from 2 initial observations in both datasets to 2x2 = 4 observations in the merged dataset. Joinby does exactly that.
Comment
Mark Xu

Join Date: Jul 2019

Posts: 11
#5

10 Jul 2019, 13:00

Originally posted by Wouter Wakker View Post

Glad it worked.

Joinby and merge do different things really. Merge is not made for creating extra observations. It merges existing observations to join variables for the same observations. For that, it needs to uniquely identify observations, but in your case 'household' does not uniquely identify observations because you have multiple observations with household id '1' in both datasets. 1:m and m:1 will therefore not work. m:m will, but doesn't get you what you want, as it never does really.

What you want is to form pairwise combinations of person and loan within the household. In the case above, you go from 2 initial observations in both datasets to 2x2 = 4 observations in the merged dataset. Joinby does exactly that.

Thanks! I truly appreciate your help.
Comment

Household	Person
1	1
1	2

Household	Loan of Household (No.)
1	1
1	2

Household	Person	Loan of Household
1	1	1
1	2	1
1	1	2
1	2	2

Announcement

How to merge two data sets with non-unique observations?

Comment

Comment

Comment

Comment