
  • Matching

    Hi,

    I have one dataset with 3,000 surnames and want to find the county of origin of each surname by matching it to a bigger Census dataset, in which each surname appears in more than one county. For example, the surname Murphy appears 1,000 times in the Census: 24% of the time in County Cork, 18% in Kerry, 16% in Down, and so on (there are 32 counties). I want each surname in my small dataset to be matched to the county where that surname appears most often in the Census, which would be Cork in the example above.

    Do you know how I can do this?
    Many thanks,
    Ciara

  • #2
    Or, to put it another way: how can I keep the surnames in the Census dataset only for the county where they occur most often? In my previous example, I would keep the surname Murphy only with the county Cork, where it appears most often. I guess the matching will be easy after that. Many thanks in advance!



    • #3
      I interpreted your question to mean, "Create a dataset where each row has a county name and the most common surname in that county." If so, maybe something like this? Note that in counties with a tie for the most common surname, like County Cook (which has one person named Jones and one person named Brown), this code will arbitrarily select the last surname.

      Code:
      clear
      input str5 county str6 surname
      "Cork"  "Murphy"
      "Cork"  "Murphy"
      "Cork"  "Murphy"
      "Cork"  "Murphy"
      "Cork"  "Murphy"
      "Kerry" "Murphy"
      "Kerry" "Murphy"
      "Kerry" "Murphy"
      "Down"  "Murphy"
      "Down"  "Murphy"
      "Down"  "Murphy"
      "Down"  "Murphy"
      "Clare" "Murphy"
      "Clare" "Murphy"
      "Clare" "Murphy"
      "Clare" "Murphy"
      "Clare" "Murphy"
      "Clare" "Murphy"
      "Clare" "Murphy"
      "Cork"  "Smith" 
      "Cork"  "Smith" 
      "Cork"  "Smith" 
      "Cork"  "Smith" 
      "Cork"  "Smith" 
      "Kerry" "Smith" 
      "Kerry" "Smith" 
      "Kerry" "Smith" 
      "Kerry" "Smith" 
      "Clare" "Smith" 
      "Clare" "Smith" 
      "Clare" "Smith" 
      "Clare" "Smith" 
      "Clare" "Smith" 
      "Clare" "Smith" 
      "Cork"  "Jones" 
      "Cork"  "Jones" 
      "Down"  "Jones" 
      "Down"  "Jones" 
      "Down"  "Jones" 
      "Down"  "Jones" 
      "Down"  "Jones" 
      "Clare" "Jones" 
      "Cook" "Jones" 
      "Cook" "Brown" 
      end
      
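      contract county surname   // one row per county-surname pair; _freq counts persons
      list
      
      bysort county (_freq): keep if _n==_N   // within each county, keep the surname with the largest _freq
      list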
      David Radwin
      Senior Researcher, California Competes
      californiacompetes.org
      Pronouns: He/Him
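The code above keeps the most common surname within each county. The question in #1 and #2 asks for the reverse: for each surname, the county where it appears most often, attached to the small surname list. A minimal sketch of that, assuming the Census file has one row per person with string variables county and surname as in the example above, swaps the grouping variable in bysort and then uses merge m:1 to attach the modal county. The toy Census rows and the small dataset (including the surname Ryan, which is deliberately absent from the Census rows) are made up for illustration; in practice you would load your own files instead of using input. As above, ties are broken arbitrarily.

```stata
* Hypothetical toy Census data: one row per person
clear
input str5 county str6 surname
"Cork"  "Murphy"
"Cork"  "Murphy"
"Kerry" "Murphy"
"Cork"  "Jones"
"Down"  "Jones"
"Down"  "Jones"
end

* One row per surname-county pair; _freq = number of persons
contract surname county

* Within each surname, keep the county with the largest _freq (ties arbitrary)
bysort surname (_freq): keep if _n == _N
list

* Save the surname -> modal county lookup to a temporary file
tempfile modal
save `modal'

* Hypothetical small dataset of surnames to be matched
clear
input str6 surname
"Murphy"
"Jones"
"Ryan"
end

* m:1 also works if the small dataset repeats surnames;
* keep(master match) retains surnames not found in the Census, with county empty
merge m:1 surname using `modal', keep(master match) nogenerate
list
```

With this toy data, Murphy is assigned Cork, Jones is assigned Down, and Ryan keeps an empty county because it does not occur in the Census rows.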



      • #4
        Ciara, I am concerned that, with half the population of Ireland living in the five largest counties, the procedure you propose will overrepresent the largest counties in the results.
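One possible way to address this concern, sketched here as an assumption rather than anything proposed in the thread, is to compare each surname's share of a county's residents instead of its raw count, so that a large county does not win simply because it has more people. The sketch assumes the Census file covers every resident, so the county total computed from the same file stands in for county population; the toy data are made up. With raw counts Murphy would be assigned to Cork (3 persons vs 2), but as a share of residents Kerry is higher (2 of 3 vs 3 of 8).

```stata
* Hypothetical toy Census data: one row per person
clear
input str5 county str6 surname
"Cork"  "Murphy"
"Cork"  "Murphy"
"Cork"  "Murphy"
"Cork"  "Smith"
"Cork"  "Smith"
"Cork"  "Smith"
"Cork"  "Jones"
"Cork"  "Jones"
"Kerry" "Murphy"
"Kerry" "Murphy"
"Kerry" "Smith"
end

* One row per county-surname pair; _freq = number of persons
contract county surname

* Total persons per county (assumed to equal county population)
bysort county: egen county_total = total(_freq)

* Share of each county's residents carrying the surname
generate share = _freq / county_total

* Within each surname, keep the county with the highest share (ties arbitrary)
bysort surname (share): keep if _n == _N
list
```

Whether a population share is the right yardstick depends on what "county of origin" should mean for the analysis; it is only one of several possible adjustments.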



        • #5
          Thanks all for the replies! I think I found the solution.

