Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • string match function on multiple variables

    Hello friends,

    I am trying to match strings with whole words/names that occur in two or more selected variables as shown in the table. I know commands like the regexm, strmatch and strops could match within a variable. matching across variable seems confusing to me. Thanks.
    Seller Buyer Price Year
    Salamander John Drummond Patricia 581,300 1999
    Drummond Patricia Davies & Sons 89,560 2005
    Drexwell Peter Fowler Peter 255,200 2003
    Faraday Louise Hardley Guy 165,240 1997
    Peterside Ltd Arundel James 98,200 2008
    Arundel James Boxer Jasmyn 354,000 2006

  • #2
    This seems incomplete. Please explain what you want and show the code you tried.

    Comment


    • #3
      Many thanks Nick. I want to match say 'Drummond Patricia' and many other names that appear in both the seller and buyer. I'm afraid I do not know any code that would help me with matching across variables like the ISNA function in excel. In this situation, I cannot specify a particular string as there are numerous strings to match.

      Comment


      • #4
        Sorry, but in turn I am lost. I don't use Excel routinely and I don't think you have explained what matching means here.

        Code:
         
        list if strpos(X, "frog") & strpos(Y, "toad")
        would select observations containing "frog" within X and "toad" within Y, if that helps.

        Comment


        • #5
          Thanks Nick. I think what I need here is the use of 'duplicates' command that shows me the duplicated names across the two variables/columns

          Comment


          • #6
            I understand you to want: "Identify all observations in which a string in one variable (Seller) also occurs in another variable (Buyer)." Presuming that as correct, I'd suggest putting both the Buyer and Seller variables into one variable, and checking for duplicates. One way to do that involves using -stack-:
            Code:
            clear
            input str25 Seller str25 Buyer Price Year
               "Salamander John"  "Drummond Patricia"  581300  1999
               "Drummond Patricia"  "Davies & Sons"  89560  2005
               "Drexwell Peter"  "Fowler Peter"  255200  2003
               "Faraday Louise"  "Hardley Guy"  165240  1997
               "Peterside Ltd"  "Arundel James"  98200  2008
               "Arundel James"  "Boxer Jasmyn"  354000  2006
            end
            //
            preserve
            stack Buyer Seller, into(BuyerOrSeller) clear
            bysort BuyerOrSeller: keep if (_N > 1) & (!missing(Seller)) //Sellers who also appear as buyers
            keep Seller
            gen byte hasdupe = 1
            tempfile temp
            save `temp'
            restore
            //
            merge 1:1 Seller using `temp', keepusing(hasdupe) // keep(3) nogen
            list
            Regards, Mike

            Comment


            • #7
              Assuming Mike has correctly interpreted what is wanted, another approach is:

              Code:
              levelsof Buyer, local(buyers)
              levelsof Seller, local(sellers)
              
              local both_buyer_and_seller: list buyers & sellers
              
              display `"`both_buyer_and_seller'"'
              Now, that code won't put any new variables into your data set. It only generates a list of names that occur as both buyer and seller and displays it on the screen. But if you want, for example, to create a new variable, call it cross_match, that indicates any observation where the buyer or seller also occurs in the opposite role, you could follow-up the above with:

              Code:
              gen byte cross_match = 0
              foreach x of local both_buyer_and_seller {
                  replace cross_match = 1 if inlist(`x', Buyer, Seller)
              }

              Comment


              • #8
                Eben,

                Since your buyer and seller names have spaces, you may want to have double quotes around x. Clyde's updated code would be:

                Code:
                clear
                input str25 Seller str25 Buyer Price Year
                   "Salamander John"  "Drummond Patricia"  581300  1999
                   "Drummond Patricia"  "Davies & Sons"  89560  2005
                   "Drexwell Peter"  "Fowler Peter"  255200  2003
                   "Faraday Louise"  "Hardley Guy"  165240  1997
                   "Peterside Ltd"  "Arundel James"  98200  2008
                   "Arundel James"  "Boxer Jasmyn"  354000  2006
                end
                
                levelsof Buyer, local(buyers)
                levelsof Seller, local(sellers)
                
                local both_buyer_and_seller: list buyers & sellers
                
                display `"`both_buyer_and_seller'"'
                
                gen byte cross_match = 0
                foreach x of local both_buyer_and_seller {
                    replace cross_match = 1 if inlist("`x'", Buyer, Seller)
                }

                Comment

                Working...
                X