Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Data set matching

    I have 2 data sets with 75% similar names and 25% unique names. Values don't repeat within the data set ie there are no duplicate values within data set.
    I want to compare both data sets, and delete unique values from each data set, so that both data sets have names of same group of individuals only. They will remain 2 separate data sets, I don't want to merge or append them.

    How should I go about doing it?

    Data set Before
    Dataset 1 Dataset 2
    A B
    B G
    C A
    D E
    F C
    L

    Data set After
    Dataset 1 Dataset 2
    A B
    B A
    C C




  • #2
    Maybe consider something like this:
    Code:
    use dataset1, clear
    contract name, freq(discard)
    merge 1:m name using dataset2, keep(match) nogenerate noreport
    contract name
    drop _freq
    preserve
    merge 1:m name using dataset1, keep(match) nogenerate noreport
    save dataset1Revised
    restore
    merge 1:m name using dataset2, keep(match) nogenerate noreport
    save dataset2Revised

    Comment


    • #3
      Similar solution with Joseph's, just different type of coding.
      Code:
      use dataset2, clear
      keep name
      joinby using dataset1
      save dataset1_revised, replace
      keep name
      joinby using dataset2
      save dataset2_revised, replace

      Comment


      • #4
        Thanks! What if the variables are same, but differ by upper/lower case? For example, A - Celine Peter in Dataset 1 and CELINE PETER in Dataset 2? The code only matches variables which are exactly the same in terms of upper/lower case.

        Comment


        • #5
          Got it, thanks! It's gen lname = lower(name) and repeat the above steps!

          Comment

          Working...
          X