  • Efficient ways to identify similar string variables

    Hello,
    I am trying to identify and consolidate addresses that refer to the same location but are written in different ways.

    For example:
    '0105 Edwards Village Center A203'
    '0105 Edwards Village Blvd.'
    '0105 Edwards Village Blvd Suite A203'
    '0105 Edwards Village Ctr A-203'
    '0105 Edwards Village Ctr Ste A-203'

    These all refer to the same address but differ as strings.
    I need to collapse them to a single form, such as '0105 Edwards Village Blvd', and do the same for the other addresses in my data.

    Is there a way to identify values of a string variable that share several components?

    Thank you.

  • #2
    There is no perfect solution for this kind of thing, but fuzzy matching can help. Search the forum for examples and implementations of this.
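A minimal sketch of the fuzzy-matching idea, using Python's standard-library difflib module as a stand-in (this is not a Stata command). SequenceMatcher.ratio() returns a similarity score between 0 and 1. The grouping rule, the 0.8 threshold, and the names similarity and group_similar are assumptions made for illustration; any real threshold has to be tuned against your own data.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def group_similar(addresses, threshold=0.8):
    """Hypothetical greedy grouping: each address joins the first existing group
    whose first member is at least `threshold` similar; otherwise it starts a
    new group. The result depends on the input order."""
    groups = []
    for addr in addresses:
        for group in groups:
            if similarity(addr, group[0]) >= threshold:
                group.append(addr)
                break
        else:
            # No sufficiently similar group was found.
            groups.append([addr])
    return groups

addresses = [
    "0105 Edwards Village Center A203",
    "0105 Edwards Village Blvd.",
    "0105 Edwards Village Blvd Suite A203",
    "0105 Edwards Village Ctr A-203",
    "0105 Edwards Village Ctr Ste A-203",
]

for group in group_similar(addresses, threshold=0.8):
    print(group)
```

Comparing each address only against the first member of a group keeps the sketch short, but it is order-dependent and compares every address with every group, so on large files it is usually worth blocking first (for example, only comparing addresses that share a house number).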

    • #3
      The set of tools you will find useful will depend on the precise types of variations you have to consolidate. Fuzzy matching, as suggested in #2, is one way to go. From the example you have shown, I think you will also get quite far by using some basic cleaning techniques:
      • convert strings to lower case with lower(), and use trim() and itrim() to remove unnecessary spaces
      • use subinstr() to remove things like hyphens and periods
      • standardize abbreviations: loop over subinstr() statements to standardize Ste/Suite, Ctr/Center, and so on
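The three cleaning steps above can be sketched roughly as follows. This is a Python translation for illustration only; in Stata the same logic would use lower(), trim()/itrim(), and subinstr(), or subinword(), which replaces whole words only. The ABBREVIATIONS map and the clean_address name are hypothetical and would need to be extended to the variants in your data.

```python
# Hypothetical abbreviation map: long form -> standard short form.
ABBREVIATIONS = {
    "center": "ctr",
    "centre": "ctr",
    "suite": "ste",
    "boulevard": "blvd",
}

def clean_address(s: str) -> str:
    """Normalize one address string before comparing it with others."""
    s = s.lower()                     # like Stata's lower()
    for ch in ("-", "."):             # like subinstr(s, "-", "", .) and subinstr(s, ".", "", .)
        s = s.replace(ch, "")
    # split() drops leading/trailing whitespace and repeated inner spaces,
    # similar in effect to trim() followed by itrim().
    words = s.split()
    # Replace whole words only, so e.g. "centerville" is left untouched.
    words = [ABBREVIATIONS.get(w, w) for w in words]
    return " ".join(words)

addresses = [
    "0105 Edwards Village Center A203",
    "0105 Edwards Village Blvd.",
    "0105 Edwards Village Blvd Suite A203",
    "0105 Edwards Village Ctr A-203",
    "0105 Edwards Village Ctr Ste A-203",
]

for a in addresses:
    print(clean_address(a))
```

On the five example addresses, this maps the first and fourth to the same string, "0105 edwards village ctr a203". The others still differ (Blvd versus Ctr, with or without a suite), so cleaning is typically followed by fuzzy matching or a rule for which form to keep.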
      Last edited by Hemanshu Kumar; 05 Dec 2022, 02:21.

      • #4
        Excellent! Thank you both for your suggestions.
