Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Left join involving two datafiles

    Hello,

    I would like to perform a left join that involves two datafiles. Because I am, due to contractual reasons, not allowed to discuss the actual datasets, I will use imaginary datafiles to illustrate my question.

    Datafile 1: Parents.dta
    Datafile 2: Children.dta

    Note that each parent only can have one child, or none, but not multiple. Similarly, a child can only have one or none parents, but not multiple.

    This yields the following datasets:

    Parents.dta:
    - ID (unique identifier)
    - Name (String)
    - Child ID (unique identifier for exactly 1 child, or no ID if no child)

    Children.dta:
    - ID (unique identifier for child, Parents.dta refers to this ID in Child ID)
    - Child name (String)

    Examples of data:

    Parents.dta:
    ID Name Child ID
    1 Richard .
    2 Simon 1

    Children.dta:
    ID Child name
    1 Vanessa
    2 Tim

    I want to join Child ID in Parents.dta to the ID in Children.dta, but only keep the original records in Parents.dta. Then I want to include Child Name in the new datafile.

    This should result in the following datafile:
    ID Name Child ID Child Name
    1 Richard . .
    2 Simon 1 Tim

    Note that child Vanessa is not included here.

    How could I perform such Left Join ? Do you perhaps have any syntax that can help me understand the reasoning?

    Thank you very much!



  • #2
    This would involve doing a 1:1 -merge- to match parents and children on the child's id, and then inspecting the results of the _merge result variable to drop the cases that fail the match. Note that in Stata, the key variable must have the same name in the master file and in the file to be merged (the "using" file in Stata lingo.) Stata handles the 1:1 vs. 1:m vs. m:1 matching problem (by default) in a way different than you might be used to from the database world, on which subject -help merge- and a look at the discussion of the _merge variable.

    However, I'm afraid I'm not understanding your example: I understood you to mean that the key variable defining the match is child's id. So, I'm seeing that the child id for the parent Simon is "1,", but in your result example, Simon is matched to child Tim, whose child id is "2." I'd suggest you correct me and clarify this.

    Comment

    Working...
    X