Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to merge based on a variable and then, if said variable does not produce a merge, merge on another variable instead?

    Hi everyone,

    I have two datasets I want to merge. The master dataset has 2 variables, A and B, that are candidates to merge with the using dataset's C. A and B do not uniquely identify observations, but C does. A and C are never missing, but B sometimes is. Ideally, I want to merge firstly using A, and then when _merge = 2, try to use B to merge those observations in.

    If this all seems a bit bizarre, in my datasets A, B, and C are all company names following slightly different formats, and while I am most confident A is best at matching with C, A is current name rather than historical unlike B and C so only B can provide a match for some older observations.

  • #2
    Also forgot to say I am using the old merge command at my supervisor's request, but am aware that is an odd choice and open to using the new one.

    Comment


    • #3
      Try this:
      Code:
      use C, clear
      merge 1:m firm_name_variable using A, keep(match master) nogenerate
      merge 1:m firm_name_variable using B, update
      drop if _merge == 2
      If that does not work, please postback and include example data from all three data sets, including some potential matches across them. Use the -dataex- command to do that. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      Added: Crossed with #2. The old and new -merge- commands do all the same things; only the syntax has changed. But the new syntax was introduced a very long time ago. I don't remember the details of the old syntax, and I don't have an old enough version of Stata on my computer to try to refresh my memory of it.
      Last edited by Clyde Schechter; 03 Oct 2023, 19:03. Reason: Add -nogenerate- to first -merge- command. The code will break without it.

      Comment


      • #4
        Sorry I may not have been clear- I have 2 datasets, the master and the using, and A and B are both company name variables in the master dataset, while C is in the using.

        Just to give more details about my datasets in case I was too vague before:

        Here is an example of the master dataset:


        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input long gvkey double fyear str30 conm_cut str32 conmhist_cut
        1000 1961 "a e plastik pak"   ""                 
        1000 1962 "a e plastik pak"   ""                 
        1000 1963 "a e plastik pak"   ""                 
        1000 1964 "a e plastik pak"   ""                 
        1000 1965 "a e plastik pak"   ""                 
        1000 1966 "a e plastik pak"   ""                 
        1000 1967 "a e plastik pak"   ""                 
        1000 1968 "a e plastik pak"   ""                 
        1000 1969 "a e plastik pak"   ""                 
        1000 1970 "a e plastik pak"   ""                 
        1000 1971 "a e plastik pak"   "a e plastik pak"  
        1000 1972 "a e plastik pak"   "a e plastik pak"  
        1000 1973 "a e plastik pak"   "a e plastik pak"  
        1000 1974 "a e plastik pak"   "a e plastik pak"  
        1000 1975 "a e plastik pak"   "a e plastik pak"  
        1000 1976 "a e plastik pak"   "a e plastik pak"  
        1000 1977 "a e plastik pak"   "a e plastik pak"  
        1001 1978 "a m food services" ""                 
        1001 1979 "a m food services" ""                 
        1001 1980 "a m food services" ""                 
        1001 1981 "a m food services" ""                 
        1001 1982 "a m food services" ""                 
        1001 1983 "a m food services" "a m food services"
        1001 1984 "a m food services" "a m food services"
        1001 1985 "a m food services" "a m food services"
        1002 1960 "aai"               ""                 
        1002 1961 "aai"               ""                 
        1002 1962 "aai"               ""                 
        1002 1963 "aai"               ""                 
        1002 1964 "aai"               ""                 
        1002 1965 "aai"               ""                 
        1002 1966 "aai"               ""                 
        1002 1967 "aai"               ""                 
        1002 1968 "aai"               ""                 
        1002 1969 "aai"               ""                 
        1002 1970 "aai"               ""                 
        1002 1971 "aai"               ""                 
        1002 1972 "aai"               ""                 
        1003 1980 "aa importing"      ""                 
        1003 1981 "aa importing"      ""                 
        1003 1982 "aa importing"      ""                 
        1003 1983 "aa importing"      ""                 
        1003 1984 "aa importing"      "a a importing"    
        1003 1985 "aa importing"      "a a importing"    
        1003 1986 "aa importing"      "a a importing"    
        1003 1987 "aa importing"      "a a importing"    
        1003 1988 "aa importing"      "a a importing"    
        1003 1989 "aa importing"      "a a importing"    
        1004 1965 "aar"               ""                 
        1004 1966 "aar"               ""                 
        1004 1967 "aar"               ""                 
        1004 1968 "aar"               ""                 
        1004 1969 "aar"               ""                 
        1004 1970 "aar"               ""                 
        1004 1971 "aar"               ""                 
        1004 1972 "aar"               "a a r"            
        1004 1973 "aar"               "a a r"            
        1004 1974 "aar"               "a a r"            
        1004 1975 "aar"               "a a r"            
        1004 1976 "aar"               "a a r"            
        1004 1977 "aar"               "a a r"            
        1004 1978 "aar"               "a a r"            
        1004 1979 "aar"               "a a r"            
        1004 1980 "aar"               "a a r"            
        1004 1981 "aar"               "a a r"            
        1004 1982 "aar"               "a a r"            
        1004 1983 "aar"               "a a r"            
        1004 1984 "aar"               "a a r"            
        1004 1985 "aar"               "a a r"            
        1004 1986 "aar"               "a a r"            
        1004 1987 "aar"               "a a r"            
        1004 1988 "aar"               "a a r"            
        1004 1989 "aar"               "a a r"            
        1004 1990 "aar"               "a a r"            
        1004 1991 "aar"               "a a r"            
        1004 1992 "aar"               "a a r"            
        1004 1993 "aar"               "a a r"            
        1004 1994 "aar"               "a a r"            
        1004 1995 "aar"               "a a r"            
        1004 1996 "aar"               "a a r"            
        1004 1997 "aar"               "a a r"            
        1004 1998 "aar"               "a a r"            
        1004 1999 "aar"               "a a r"            
        1004 2000 "aar"               "a a r"            
        1004 2001 "aar"               "a a r"            
        1004 2002 "aar"               "a a r"            
        1004 2003 "aar"               "a a r"            
        1004 2004 "aar"               "a a r"            
        1004 2005 "aar"               "a a r"            
        1004 2006 "aar"               "a a r"            
        1004 2007 "aar"               "a a r"            
        1004 2008 "aar"               "a a r"            
        1004 2009 "aar"               "a a r"            
        1004 2010 "aar"               "a a r"            
        1004 2011 "aar"               "a a r"            
        1004 2012 "aar"               "a a r"            
        1004 2013 "aar"               "a a r"            
        1004 2014 "aar"               "a a r"            
        1004 2015 "aar"               "a a r"            
        1004 2016 "aar"               "a a r"            
        end

        conm_cut is A and conmhist_cut is B. The unique identifier is the gvkey-fyear pair, not the merging variables. The above looks quite trivial as where conmhist_cut is defined it is functionally equal to conm_cut, but that is not always true and I am confident conmhist_cut often meaningfully differs. To be more clear than in my initial post, I am actually merging on conm_cut/conmhist_cut AND fyear. For the using:

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str122 conm_cut float fyear double(federal_action_obligation rnd_dollars)
        "0 base design"                 2017           99964.47 99964.46875
        "0 base design"                 2019             149947      149947
        "0 dbding"                      2004             195574      195574
        "00 enterprise"                 2003               9934           0
        "007 electronics network"       2003              50320           0
        "01db"                          2005              28976           0
        "03sl industries"               2017          199411.94           0
        "03sl industries"               2018                  0           0
        "0691734 bc"                    2004               9320           0
        "0691734 bc"                    2005               2020           0
        "0691734 bc"                    2006               3460           0
        "071777718 duluth clinic"       2002              17472           0
        "09clean"                       2014 1113476.8800000001           0
        "09clean"                       2015  2672735.139438476           0
        "09clean"                       2016    1203381.3896875           0
        "09clean"                       2017 2166392.0174267576           0
        "09clean"                       2018 2028107.6456225587           0
        "09clean"                       2019  2334114.149929199           0
        "1 acquisitions"                2018          339584.44           0
        "1 acquisitions"                2019           -4970.99           0
        "1 better"                      2017              44112           0
        "1 better"                      2018              15400           0
        "1 better"                      2019              15400           0
        "1 beyond"                      2003              29842           0
        "1 beyond"                      2004             628602           0
        "1 beyond"                      2005             262126           0
        "1 beyond"                      2006             300420           0
        "1 beyond"                      2007              25110           0
        "1 beyond"                      2008            57337.9           0
        "1 beyond"                      2009           46888.67           0
        "1 beyond"                      2010              75142           0
        "1 beyond"                      2011                  0           0
        "1 beyond"                      2012            21232.5           0
        "1 beyond"                      2014               4000           0
        "1 beyond"                      2018                  0           0
        "1 call septic service"         2018          260386.19           0
        "1 call septic service"         2019          265600.08           0
        "1 choice technology group"     2010              15488           0
        "1 choice technology group"     2011     103864.3203125           0
        "1 choice technology group"     2012           55479.04           0
        "1 choice technology group"     2013           43772.64           0
        "1 choice technology group"     2014           24086.84           0
        "1 choice technology group"     2015               3272           0
        "1 choice technology group"     2016              -3272           0
        "1 fc kaiserslautern ev"        2012           29368.58           0
        "1 industrial source"           2010     5724.919921875           0
        "1 industrial source"           2013            20486.6           0
        "1 man under pressure"          2011              54950           0
        "1 nation electronics"          2003             716030           0
        "1 nation electronics"          2004            1977854           0
        "1 nation electronics"          2005              50585           0
        "1 nation technology"           2002             330898           0
        "1 nation technology"           2003             181160           0
        "1 nation technology"           2004              91336           0
        "1 nation technology"           2005             163920           0
        "1 nation technology"           2006              95490           0
        "1 nation technology"           2007               3450           0
        "1 nation technology"           2008                  0           0
        "1 nation technology"           2010              -1738           0
        "1 network"                     2002              17666           0
        "1 network"                     2005               2476           0
        "1 network"                     2007              26631           0
        "1 network"                     2008              14592           0
        "1 network"                     2009              17140           0
        "1 network"                     2010     26023.69921875           0
        "1 network"                     2011             -17140           0
        "1 network"                     2012           15565.59           0
        "1 on 1 services group"         2006              88470           0
        "1 on 1 services group"         2009               2501           0
        "1 on 1 services group"         2013                  0           0
        "1 on 1 services group"         2014                  0           0
        "1 on 1 services group"         2015                  0           0
        "1 on 1 services group"         2016                  0           0
        "1 on 1 services group"         2017                  0           0
        "1 on 1 services group"         2018                  0           0
        "1 on 1 services group"         2019                  0           0
        "1 point procurement solutions" 2013           90919.68           0
        "1 point procurement solutions" 2014                  0           0
        "1 point procurement solutions" 2015             125.96           0
        "1 point procurement solutions" 2016             106520           0
        "1 point procurement solutions" 2017            14255.3           0
        "1 prospect technologies"       2016                  0           0
        "1 prospect technologies"       2017                  0           0
        "1 prospect technologies"       2018                  0           0
        "1 prospect technologies"       2019                  0           0
        "1 source"                      2010              16234           0
        "1 source"                      2011  178008.1820336914           0
        "1 source"                      2012             3581.9           0
        "1 source material handling"    2006              48578           0
        "1 source material handling"    2007              91257           0
        "1 source material handling"    2008           55593.62           0
        "1 source office interiors"     2005              41268           0
        "1 source office interiors"     2006              91481           0
        "1 source office interiors"     2007              45145           0
        "1 source office interiors"     2008              13237           0
        "1 source services of sc"       2009              54400           0
        "1 source services of sc"       2010                  0           0
        "1 source solutions"            2009           77219.14           0
        "1 source solutions"            2011              84150           0
        "1 source solutions"            2012 353217.83999999997           0
        end

        Ideally, I would first merge by conm_cut fyear from the master onto conm_cut fyear in the using, and then conmhist_cut fyear from the master onto conm_cut fyear in the using for unmatched using observations, and finally drop all with _merge == 2.

        Comment


        • #5
          There are no matches in your example datasets, but you want something like:

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str122 conm_cut float fyear double(federal_action_obligation rnd_dollars)
          "0 base design"                 2017           99964.47 99964.46875
          "0 base design"                 2019             149947      149947
          "0 dbding"                      2004             195574      195574
          "00 enterprise"                 2003               9934           0
          "007 electronics network"       2003              50320           0
          "01db"                          2005              28976           0
          "03sl industries"               2017          199411.94           0
          "03sl industries"               2018                  0           0
          "0691734 bc"                    2004               9320           0
          "0691734 bc"                    2005               2020           0
          "0691734 bc"                    2006               3460           0
          "071777718 duluth clinic"       2002              17472           0
          "09clean"                       2014 1113476.8800000001           0
          "09clean"                       2015  2672735.139438476           0
          "09clean"                       2016    1203381.3896875           0
          "09clean"                       2017 2166392.0174267576           0
          "09clean"                       2018 2028107.6456225587           0
          "09clean"                       2019  2334114.149929199           0
          "1 acquisitions"                2018          339584.44           0
          "1 acquisitions"                2019           -4970.99           0
          "1 better"                      2017              44112           0
          "1 better"                      2018              15400           0
          "1 better"                      2019              15400           0
          "1 beyond"                      2003              29842           0
          "1 beyond"                      2004             628602           0
          "1 beyond"                      2005             262126           0
          "1 beyond"                      2006             300420           0
          "1 beyond"                      2007              25110           0
          "1 beyond"                      2008            57337.9           0
          "1 beyond"                      2009           46888.67           0
          "1 beyond"                      2010              75142           0
          "1 beyond"                      2011                  0           0
          "1 beyond"                      2012            21232.5           0
          "1 beyond"                      2014               4000           0
          "1 beyond"                      2018                  0           0
          "1 call septic service"         2018          260386.19           0
          "1 call septic service"         2019          265600.08           0
          "1 choice technology group"     2010              15488           0
          "1 choice technology group"     2011     103864.3203125           0
          "1 choice technology group"     2012           55479.04           0
          "1 choice technology group"     2013           43772.64           0
          "1 choice technology group"     2014           24086.84           0
          "1 choice technology group"     2015               3272           0
          "1 choice technology group"     2016              -3272           0
          "1 fc kaiserslautern ev"        2012           29368.58           0
          "1 industrial source"           2010     5724.919921875           0
          "1 industrial source"           2013            20486.6           0
          "1 man under pressure"          2011              54950           0
          "1 nation electronics"          2003             716030           0
          "1 nation electronics"          2004            1977854           0
          "1 nation electronics"          2005              50585           0
          "1 nation technology"           2002             330898           0
          "1 nation technology"           2003             181160           0
          "1 nation technology"           2004              91336           0
          "1 nation technology"           2005             163920           0
          "1 nation technology"           2006              95490           0
          "1 nation technology"           2007               3450           0
          "1 nation technology"           2008                  0           0
          "1 nation technology"           2010              -1738           0
          "1 network"                     2002              17666           0
          "1 network"                     2005               2476           0
          "1 network"                     2007              26631           0
          "1 network"                     2008              14592           0
          "1 network"                     2009              17140           0
          "1 network"                     2010     26023.69921875           0
          "1 network"                     2011             -17140           0
          "1 network"                     2012           15565.59           0
          "1 on 1 services group"         2006              88470           0
          "1 on 1 services group"         2009               2501           0
          "1 on 1 services group"         2013                  0           0
          "1 on 1 services group"         2014                  0           0
          "1 on 1 services group"         2015                  0           0
          "1 on 1 services group"         2016                  0           0
          "1 on 1 services group"         2017                  0           0
          "1 on 1 services group"         2018                  0           0
          "1 on 1 services group"         2019                  0           0
          "1 point procurement solutions" 2013           90919.68           0
          "1 point procurement solutions" 2014                  0           0
          "1 point procurement solutions" 2015             125.96           0
          "1 point procurement solutions" 2016             106520           0
          "1 point procurement solutions" 2017            14255.3           0
          "1 prospect technologies"       2016                  0           0
          "1 prospect technologies"       2017                  0           0
          "1 prospect technologies"       2018                  0           0
          "1 prospect technologies"       2019                  0           0
          "1 source"                      2010              16234           0
          "1 source"                      2011  178008.1820336914           0
          "1 source"                      2012             3581.9           0
          "1 source material handling"    2006              48578           0
          "1 source material handling"    2007              91257           0
          "1 source material handling"    2008           55593.62           0
          "1 source office interiors"     2005              41268           0
          "1 source office interiors"     2006              91481           0
          "1 source office interiors"     2007              45145           0
          "1 source office interiors"     2008              13237           0
          "1 source services of sc"       2009              54400           0
          "1 source services of sc"       2010                  0           0
          "1 source solutions"            2009           77219.14           0
          "1 source solutions"            2011              84150           0
          "1 source solutions"            2012 353217.83999999997           0
          end
          
          tempfile using
          qui ds
          local usingvars `r(varlist)'
          save `using', replace
          
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input long gvkey double fyear str30 conm_cut str32 conmhist_cut
          1000 1961 "a e plastik pak"   ""                 
          1000 1962 "a e plastik pak"   ""                 
          1000 1963 "a e plastik pak"   ""                 
          1000 1964 "a e plastik pak"   ""                 
          1000 1965 "a e plastik pak"   ""                 
          1000 1966 "a e plastik pak"   ""                 
          1000 1967 "a e plastik pak"   ""                 
          1000 1968 "a e plastik pak"   ""                 
          1000 1969 "a e plastik pak"   ""                 
          1000 1970 "a e plastik pak"   ""                 
          1000 1971 "a e plastik pak"   "a e plastik pak"  
          1000 1972 "a e plastik pak"   "a e plastik pak"  
          1000 1973 "a e plastik pak"   "a e plastik pak"  
          1000 1974 "a e plastik pak"   "a e plastik pak"  
          1000 1975 "a e plastik pak"   "a e plastik pak"  
          1000 1976 "a e plastik pak"   "a e plastik pak"  
          1000 1977 "a e plastik pak"   "a e plastik pak"  
          1001 1978 "a m food services" ""                 
          1001 1979 "a m food services" ""                 
          1001 1980 "a m food services" ""                 
          1001 1981 "a m food services" ""                 
          1001 1982 "a m food services" ""                 
          1001 1983 "a m food services" "a m food services"
          1001 1984 "a m food services" "a m food services"
          1001 1985 "a m food services" "a m food services"
          1002 1960 "aai"               ""                 
          1002 1961 "aai"               ""                 
          1002 1962 "aai"               ""                 
          1002 1963 "aai"               ""                 
          1002 1964 "aai"               ""                 
          1002 1965 "aai"               ""                 
          1002 1966 "aai"               ""                 
          1002 1967 "aai"               ""                 
          1002 1968 "aai"               ""                 
          1002 1969 "aai"               ""                 
          1002 1970 "aai"               ""                 
          1002 1971 "aai"               ""                 
          1002 1972 "aai"               ""                 
          1003 1980 "aa importing"      ""                 
          1003 1981 "aa importing"      ""                 
          1003 1982 "aa importing"      ""                 
          1003 1983 "aa importing"      ""                 
          1003 1984 "aa importing"      "a a importing"    
          1003 1985 "aa importing"      "a a importing"    
          1003 1986 "aa importing"      "a a importing"    
          1003 1987 "aa importing"      "a a importing"    
          1003 1988 "aa importing"      "a a importing"    
          1003 1989 "aa importing"      "a a importing"    
          1004 1965 "aar"               ""                 
          1004 1966 "aar"               ""                 
          1004 1967 "aar"               ""                 
          1004 1968 "aar"               ""                 
          1004 1969 "aar"               ""                 
          1004 1970 "aar"               ""                 
          1004 1971 "aar"               ""                 
          1004 1972 "aar"               "a a r"            
          1004 1973 "aar"               "a a r"            
          1004 1974 "aar"               "a a r"            
          1004 1975 "aar"               "a a r"            
          1004 1976 "aar"               "a a r"            
          1004 1977 "aar"               "a a r"            
          1004 1978 "aar"               "a a r"            
          1004 1979 "aar"               "a a r"            
          1004 1980 "aar"               "a a r"            
          1004 1981 "aar"               "a a r"            
          1004 1982 "aar"               "a a r"            
          1004 1983 "aar"               "a a r"            
          1004 1984 "aar"               "a a r"            
          1004 1985 "aar"               "a a r"            
          1004 1986 "aar"               "a a r"            
          1004 1987 "aar"               "a a r"            
          1004 1988 "aar"               "a a r"            
          1004 1989 "aar"               "a a r"            
          1004 1990 "aar"               "a a r"            
          1004 1991 "aar"               "a a r"            
          1004 1992 "aar"               "a a r"            
          1004 1993 "aar"               "a a r"            
          1004 1994 "aar"               "a a r"            
          1004 1995 "aar"               "a a r"            
          1004 1996 "aar"               "a a r"            
          1004 1997 "aar"               "a a r"            
          1004 1998 "aar"               "a a r"            
          1004 1999 "aar"               "a a r"            
          1004 2000 "aar"               "a a r"            
          1004 2001 "aar"               "a a r"            
          1004 2002 "aar"               "a a r"            
          1004 2003 "aar"               "a a r"            
          1004 2004 "aar"               "a a r"            
          1004 2005 "aar"               "a a r"            
          1004 2006 "aar"               "a a r"            
          1004 2007 "aar"               "a a r"            
          1004 2008 "aar"               "a a r"            
          1004 2009 "aar"               "a a r"            
          1004 2010 "aar"               "a a r"            
          1004 2011 "aar"               "a a r"            
          1004 2012 "aar"               "a a r"            
          1004 2013 "aar"               "a a r"            
          1004 2014 "aar"               "a a r"            
          1004 2015 "aar"               "a a r"            
          1004 2016 "aar"               "a a r"            
          end
          tempfile master
          save `master', replace
          
          merge 1:1 conm_cut fyear using `using', keep(using)
          keep `usingvars'
          rename conm_cut conmhist_cut2
          tempfile using2
          save `using2', replace
          
          use `master', clear
          merge 1:1 conm_cut fyear using `using', keep(master match) nogen
          gen conmhist_cut2= cond(missing(conmhist_cut),conm_cut, conmhist_cut)
          merge 1:1 conmhist_cut2 fyear using `using2', keep(master match)

          Comment


          • #6
            Sorry I may not have been clear- I have 2 datasets, the master and the using, and A and B are both company name variables in the master dataset, while C is in the using.
            What is master and what is using is, to a great extent, arbitrary.
            Code:
            use dataset1, clear
            merge i:j varlist using dataset2 // i, j being 1 or m as the case may be
            
            // IS EQUIVALENT TO
            
            use dataset2, clear
            merge j:i varlist using dataset1 // note reversal of order of i, j
            The resulting data sets will be the same, except for which observations have _merge == 1 and which have _merge == 2: those will be interchanged. (Which is why my code ends by dropping -if _merge = 1-.

            There is a circumstance, however, where it matters which is which. And that is the reason I proposed in #3 to have C be the master data set. When you specify an update merge, non-missing values in the master data set are always left as they are, but missing values in the master set are replaced by non-missing values from observations in the using data set. This is asymmetrical, and I was capitalizing on that in the second -merge- command.

            With a little bit of extra code to rename conumhist_cut to conumhist in data set B, and making the merge key conumhist_cut fyear instead of just the company name variable, you will find that the approach in #3 does what you want.

            Comment


            • #7
              Thanks so much for the help

              Comment

              Working...
              X