Left join involving two datafiles

Dave Velthoff

Join Date: Feb 2018

Posts: 4
#1

Left join involving two datafiles

01 Mar 2018, 06:37

Hello,

I would like to perform a left join that involves two datafiles. Because I am, due to contractual reasons, not allowed to discuss the actual datasets, I will use imaginary datafiles to illustrate my question.

Datafile 1: Parents.dta
Datafile 2: Children.dta

Note that each parent only can have one child, or none, but not multiple. Similarly, a child can only have one or none parents, but not multiple.

This yields the following datasets:

Parents.dta:
- ID (unique identifier)
- Name (String)
- Child ID (unique identifier for exactly 1 child, or no ID if no child)

Children.dta:
- ID (unique identifier for child, Parents.dta refers to this ID in Child ID)
- Child name (String)

Examples of data:

Parents.dta:
ID Name Child ID
1 Richard .
2 Simon 1

Children.dta:
ID Child name
1 Vanessa
2 Tim

I want to join Child ID in Parents.dta to the ID in Children.dta, but only keep the original records in Parents.dta. Then I want to include Child Name in the new datafile.

This should result in the following datafile:
ID Name Child ID Child Name
1 Richard . .
2 Simon 1 Tim

Note that child Vanessa is not included here.

How could I perform such Left Join ? Do you perhaps have any syntax that can help me understand the reasoning?

Thank you very much!
Tags: None
Mike Lacy

Join Date: Apr 2014

Posts: 2411
#2

01 Mar 2018, 07:13

This would involve doing a 1:1 -merge- to match parents and children on the child's id, and then inspecting the results of the _merge result variable to drop the cases that fail the match. Note that in Stata, the key variable must have the same name in the master file and in the file to be merged (the "using" file in Stata lingo.) Stata handles the 1:1 vs. 1:m vs. m:1 matching problem (by default) in a way different than you might be used to from the database world, on which subject -help merge- and a look at the discussion of the _merge variable.

However, I'm afraid I'm not understanding your example: I understood you to mean that the key variable defining the match is child's id. So, I'm seeing that the child id for the parent Simon is "1,", but in your result example, Simon is matched to child Tim, whose child id is "2." I'd suggest you correct me and clarify this.
Comment

Announcement

Left join involving two datafiles

Comment