Hello stata users,
I am trying to merge two datasets (a master dataset and a university dataset), and I would like to know whether there is a method for merging them when the key variables are similar but not identical.
For instance, suppose a university is recorded as "Johns Hopkins U." in my university data and as "Johns Hop. U." in my master data.
I would like to match such observations when the names are similar (or almost the same).
My university data looks like this:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str94 uni_name int year long funding
"A. T. Still U." 1971 1
"A. T. Still U." 1972 1
"A. T. Still U." 1973 1
"A. T. Still U." 1974 1
"A. T. Still U." 1975 1
"A. T. Still U." 1976 1
"A. T. Still U." 1977 1
"A. T. Still U." 1978 1
"A. T. Still U." 1979 1
"A. T. Still U." 1980 1
"A. T. Still U." 1981 1
"A. T. Still U." 1982 1
"A. T. Still U." 1983 1
"A. T. Still U." 1984 1
"A. T. Still U." 1985 1
"A. T. Still U." 1986 1
"A. T. Still U." 1987 1
"A. T. Still U." 1988 1
"A. T. Still U." 1989 1
"A. T. Still U." 1990 1
"A. T. Still U." 1991 1
"A. T. Still U." 1992 1
"A. T. Still U." 1993 1
"A. T. Still U." 1994 1
"A. T. Still U." 1995 1
"A. T. Still U." 1996 1
"A. T. Still U." 1997 1
"A. T. Still U." 1998 1
"A. T. Still U." 1999 1
"A. T. Still U." 2000 1
"A. T. Still U." 2001 1
"A. T. Still U." 2002 1
"A. T. Still U." 2003 1
"A. T. Still U." 2004 1
"A. T. Still U." 2005 1
"A. T. Still U." 2006 1
"A. T. Still U." 2007 1
"A. T. Still U." 2008 1
"A. T. Still U." 2009 1
"A. T. Still U." 2010 1
"A. T. Still U." 2011 1
"A. T. Still U." 2012 1
"A. T. Still U." 2013 1
"A. T. Still U." 2014 0
"A. T. Still U." 2015 0
"A. T. Still U." 2016 0
"A. T. Still U." 2017 1
"A. T. Still U." 2018 1
"A. T. Still U." 2019 0
"AIB C. of Business" 1971 0
"AIB C. of Business" 1972 0
"AIB C. of Business" 1973 0
"AIB C. of Business" 1974 0
"AIB C. of Business" 1975 0
"AIB C. of Business" 1976 0
"AIB C. of Business" 1977 0
"AIB C. of Business" 1978 0
"AIB C. of Business" 1979 0
"AIB C. of Business" 1980 0
"AIB C. of Business" 1981 0
"AIB C. of Business" 1982 0
"AIB C. of Business" 1983 0
"AIB C. of Business" 1984 0
"AIB C. of Business" 1985 0
"AIB C. of Business" 1986 0
"AIB C. of Business" 1987 0
"AIB C. of Business" 1988 0
"AIB C. of Business" 1989 0
"AIB C. of Business" 1990 0
"AIB C. of Business" 1991 0
"AIB C. of Business" 1992 0
"AIB C. of Business" 1993 0
"AIB C. of Business" 1994 0
"AIB C. of Business" 1995 0
"AIB C. of Business" 1996 0
"AIB C. of Business" 1997 0
"AIB C. of Business" 1998 0
"AIB C. of Business" 1999 0
"AIB C. of Business" 2000 0
"AIB C. of Business" 2001 0
"AIB C. of Business" 2002 0
"AIB C. of Business" 2003 0
"AIB C. of Business" 2004 0
"AIB C. of Business" 2005 0
"AIB C. of Business" 2006 0
"AIB C. of Business" 2007 0
"AIB C. of Business" 2008 0
"AIB C. of Business" 2009 0
"AIB C. of Business" 2010 1
"AIB C. of Business" 2011 1
"AIB C. of Business" 2012 1
"AIB C. of Business" 2013 1
"AIB C. of Business" 2014 1
"AIB C. of Business" 2015 0
"AIB C. of Business" 2016 0
"AIB C. of Business" 2017 0
"AIB C. of Business" 2018 0
"AIB C. of Business" 2019 0
"AUI National Radio Astronomy Observatory" 1971 1
"AUI National Radio Astronomy Observatory" 1972 1
end
And my master data looks like:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input float uni_id str244 org_name double year
. "" 1974
. "" 1975
. "" 1976
230 "Regents of the University of California and BP Amoco Corporation" 1981
534 "The University Hospital" 1975
346 "Regents of the University of Minnesota" 1977
1224 "University of Health Sciences/The Chicago Medical School" 1978
549 "Long Island University" 1976
. "" 1963
. "" 1964
. "" 1965
479 "Virginia State University" 1950
912 "College of William and Mary" 1960
1216 "The University of Chicago Development Corporation" 1991
173 "Texas Wesleyan University, Inc." 1992
. "" 1988
. "" 1989
. "" 1990
. "" 1991
. "" 1992
. "" 1993
. "" 1994
. "" 1995
. "" 1996
66 "Polytechnic University" 1980
28 "Nova University" 1967
240 "The University of California Los Angeles" 1986
957 "Erskine College" 1994
419 "The University of South Carolina" 2001
794 "Clemson University Research Foundation" 2006
794 "Clemson University Research Foundation" 2008
. "" 1999
. "" 2000
. "" 2001
. "" 2002
. "" 2003
. "" 2004
. "" 2005
. "" 2006
. "" 2007
. "" 2008
. "" 2009
1220 "Northerwestern University" 1950
245 "The University of Chicago" 1982
544 "Rosalind Franklin University of Medicine and Science" 1992
1224 "University of Health Sciences/The Chicago Medical School" 1995
1032 "Loyola University of Chicago" 1998
1224 "University of Health Sciences/The Chicago Medical School" 2003
837 "Northwestern University Medical School" 2007
1032 "Loyola University of Chicago" 2015
. "" 1972
. "" 1973
. "" 1974
38 "The Ohio State University Research Foundation" 1967
934 "USB Corporation" 1968
602 "Hoover Universal, Inc." 1986
509 "California Institute of Technology" 1972
146 "Research Foundation of the State University of New York" 1983
327 "Trustees of Boston University" 1973
611 "The Reents of the University of California" 1990
941 "Duke University" 1952
547 "Long Island University" 1962
411 "Board of Regents for Education of the State of Rhode Island" 1965
1123 "All American University, Incorporated" 1973
. "" 1950
. "" 1951
. "" 1952
. "" 1953
. "" 1954
. "" 1955
. "" 1956
. "" 1957
. "" 1958
. "" 1959
. "" 1960
. "" 1961
. "" 1962
. "" 1963
. "" 1964
. "" 1965
. "" 1966
. "" 1968
. "" 1969
. "" 1970
. "" 1971
. "" 1976
. "" 1977
. "" 1978
7 "North Dakota State University" 1984
. "" 1960
. "" 1961
. "" 1963
. "" 1964
. "" 1965
. "" 1966
513 "Wright State University" 1962
246 "University of Cincinnati" 1964
514 "Wright State University" 1966
956 "Emory University" 1967
254 "The University of Dayton" 1970
end
I searched the FAQ and earlier posts, but I have not yet found a solution.
Is there a command or algorithm that does this?
Thanks!