  • Merging datasets not uniquely identified

    Hello Statalist,
    I'm currently trying to merge two datasets that share several variables, but each is missing one key variable that the other contains. My data is organized as follows:

    DATA 1       DATA 2       DESIRED DATA
    Education    Education    Education
    Job          Job          Job
    Age          Age          Age
    Gender       Gender       Gender
    City         City         City
    Type         -            Type
    -            Wage         Wage
    My datasets are not the same size, and the common variables are not enough to uniquely identify the matches. For this reason the merge command has been ineffective. DATA 1 is much bigger than DATA 2, so I want to keep only as many matches as can be made with DATA 2.

    I thought about using the command "joinby using DATA2, unmatched(using)", but this returns the error "_merge already defined", even though the variable does not exist in the dataset and I have not run any other merges before. Is it just impossible to do what I want given my data?

    Thank you.

  • #2
    One of the "fuzzy match" add-on programs might be helpful. See -ssc describe matchit- or -ssc describe reclink-
    I suspect the error message from -joinby- is not actually wrong, but in any case I don't think -joinby- will help you here. It would form every pair of observations that match on the common variables, and you would then have to work out for yourself which pairs are the right ones. The "fuzzy match" programs, as I understand them, form such pairs and also give you guidance in deciding which are the best matches.
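
    To get a feel for how far the common variables fall short of identifying observations, something like the following may help. It is only a sketch: the file name DATA2.dta and the new variable name n_same are assumptions, and the variable names are taken from the layout in the original post.

    ```stata
    * Assumption: the smaller dataset is saved as DATA2.dta in the working directory
    use DATA2, clear

    * Report how many observations share the same values of the common variables
    duplicates report Education Job Age Gender City

    * n_same (hypothetical name) = number of observations in each group of
    * identical values; any value above 1 means the group is not uniquely identified
    bysort Education Job Age Gender City: generate n_same = _N
    tabulate n_same
    ```

    Running the same check on the larger dataset shows how many candidate pairs -joinby- would produce for each group, since it pairs every matching observation in one file with every matching observation in the other.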


    • #3
      Welcome to Statalist.

      I am sorry to tell you that if Stata says the variable _merge is already defined in one of your datasets, then it certainly is. You say "the variable does not exist in the dataset", but joinby takes two datasets as input. So if you checked your master dataset (which would have been what was in memory after the joinby failed) and did not find _merge there, then it must exist in your using dataset.
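
      One way to check and work around this is sketched below. The file names DATA1.dta, DATA2.dta, and DATA2_clean.dta are assumptions, as is the choice to list the five common variables explicitly; adjust them to your own files.

      ```stata
      * Assumption: master is DATA1.dta and using is DATA2.dta in the working directory
      * List the variables stored in the using file without loading it
      describe using DATA2

      * Drop a leftover _merge from the using file, if there is one,
      * and save a cleaned copy under a hypothetical new name
      use DATA2, clear
      capture drop _merge
      save DATA2_clean, replace

      * Do the same for the master, then form all pairs that agree on the common variables
      use DATA1, clear
      capture drop _merge
      joinby Education Job Age Gender City using DATA2_clean, unmatched(using)

      * With unmatched() specified, joinby creates _merge to mark the source of each observation
      tabulate _merge
      ```

      The -capture- prefix suppresses the error that -drop- would otherwise raise when _merge is not present, so the same lines can be run on either dataset.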
