Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging two datasets based on first and last names

    Hi! I'm working on an analysis of student performance. I have 2 lists- list A contains the names of the students that are a part of the analysis (n=150). List B is all the students who were ever enrolled in the program (n=5150). While list A only have the first and last names of these students, list B has first name, last name and performance variables. How can I merge the 2 data sets such that I have a final list C of all those students in list A and their detailed information from list B? Essentially, based on first name and last name variables, I want to identify all the 150 list A students in list B and get their details that are in list B . Here are some data extracts:
    List A:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str18 firstname str20 lastname
    "Abel"     "Morse"           
    "Aysha"       "Abdou"           
    "Alan"       "Alcindor"        
    "Perez"        "Reyes"           
    "Alejandro"   "Stuart"  
    "Dhanish"      "Bell"            
    "Amely"       "Lopes"           
    "Amilcar"     "Minueza"         
    "Ana"         "Fumero"          
    "Andre"       "Donald"       
    "Andrea"      "Logada"          
    "Anjanique"   "Alison"         
    "Anllelina"   "Salanger"               
    end

    List B:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str18 firstname str20 lastname int points str3 gov_assist str24 track
    "Worlf"     "Jeane"   125 ""    "Fin Operations"    
    "Aon"   "Fernandes"       200 ""    "IT"  
    "Key"     "Carlo"        200 ""    "SD"    
    "Kevin"    "D"           260 "No"  "IT"  
    "Wyne"   "Jean Paul"       440 "No"  "IT"  
    "Maine"  "O'bryant"         80 ""    "--None--"                
    "Ebony"   "Myers"           180 ""    "IT"  
    "Antony"   "Santana"         230 ""    "Fin Ops"    
    "Gie"     "Villafane"       330 "No"  "Information Technology"  
    "Jonah"  "Bogard"          290 "No"  "Information Technology"  
    "Ada"    "Pintos"       500 "No"  "Software Development"    
    "Karl"      "Chery"            45 ""    "Information Technology"  
    "Bianna"    "Bautista"        130 ""    "Financial Operations"    
    "Ikechi"    "Amaefule"        340 ""    "Software Development"    
    "Amanda"    "Caban"           260 "No"  "Information Technology"  
    "Yingying"  "Huang"           215 "No"  "Software Development"    
    "Neiko"     "Fortes"           70 ""    "Financial Operations"    
    "Grecia"    "Cartagena"       185 ""    "Financial Operations"    
    "wilders"   "Pierre"          140 ""    "Information Technology"  
    "Angelique" "Velazquez"       455 "Yes" "Financial Operations"    
    "Rawle"     "Howard"           10 "No"  "Information Technology"  
    "Gabriel"   "Rodriguez"       130 ""    "Financial Operations"      
    end
    Thank you in advance!

  • #2
    it appears that List B has the first two variable names reversed and the first variable should be "lastname" which the second should be "firstname"; however, even after correcting this, and sorting each file
    Code:
    sort lastname firstname
    save, replace
    the following -merge- command (with list B in memory) finds no matches - and none are obvious in your dataex examples above; so, treat this as possible code:
    Code:
    merge 1:1 lastname firstname using file1
    note that I called your "List A" file1.dta

    Comment


    • #3
      Thanks Rich, this was very helpful!

      Comment

      Working...
      X