
  • Problem replacing string values before matchit

    Hello,

    I'm cleaning data in preparation for a fuzzy merge/matchit. I will be matching based on police academy names, stored in the variable 'Academy'. In one datafile, the academy name ends with either the full word "academy" or an abbreviated "acad". I'm trying to change "acad" to "academy", but my code is resulting in "academy" also changing to "academyemy". Could someone please take a look at my code below and let me know what the issue is? Thanks very much for your time.


    replace Academy = subinstr(Academy, " acad", " academy", .) if strpos((Academy), "academy") != 1

  • #2
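    Well, whenever Academy contains " academy", it also contains the substring " acad". Your condition, strpos(Academy, "academy") != 1, only excludes values that begin with "academy"; it is true for every other value, including those where "academy" appears later in the string. Consequently the -replace- is carried out there too, and " academy" becomes " academyemy".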

    Now, I do not doubt that somebody who is fluent in regular expressions can find a one-line solution to your problem. But here's how I, a person who simply does not grasp regular expression syntax, would solve your problem. Leave your command as is, and just follow it with:
    Code:
    replace Academy = subinstr(Academy, "academyemy", "academy", .)
    Sometimes it is simpler to overdo something and then undo the excess.
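
A minimal sketch of the overdo-then-undo approach described above, run on made-up data. The three academy names are hypothetical and are not from the original poster's file; the helper variable pos is only there to show what strpos() returns.

```stata
* Hypothetical toy data, invented for illustration only
clear
input str30 Academy
"central police academy"
"northern police acad"
"academy of the east"
end

* strpos() returns the position of the first match (0 if none), so the
* condition != 1 only skips values that START with "academy"
generate pos = strpos(Academy, "academy")
list, clean noobs

* Original command: " acad" is also found inside " academy",
* so observation 1 becomes "central police academyemy"
replace Academy = subinstr(Academy, " acad", " academy", .) if strpos(Academy, "academy") != 1

* Undo the excess, as suggested in #2
replace Academy = subinstr(Academy, "academyemy", "academy", .)
list, clean noobs
```

One caveat with this approach: a different word that begins with "acad", such as "academic", would still be turned into "academyemic", and the undo step would not catch it.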



    • #3
      Regex might as well be written in Sanskrit



      • #4
        Clyde Schechter, great point. Thanks a lot for your response.



        • #5
          I messed around and fixed the issue. Now it's working.

          replace Academy = subinstr(Academy, " acad", " academy", .) if strpos(Academy, "academy") == 0



          • #6
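            For the regular expression fans reading this, here is a comparison of the fix from #5 (school1) with a word-boundary regular expression (school2). Note the last observation, where the #5 approach still changes "academic" to "academyemic".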
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input str25 school
            "dog academy"          
            "cat acad"
            "cow acad of cowarado"            
            "lacadaisical adademy"
            "the academic acad"
            end
            clonevar school1 = school
            replace school1 = subinstr(school1, " acad", " academy", .) if strpos(school1, "academy") == 0
            clonevar school2 = school
            replace school2 = ustrregexra(school2,"\bacad\b","academy")
            list, clean noobs
            Code:
            . list, clean noobs
            
                               school                   school1                   school2  
                          dog academy               dog academy               dog academy  
                             cat acad               cat academy               cat academy  
                 cow acad of cowarado   cow academy of cowarado   cow academy of cowarado  
                 lacadaisical adademy      lacadaisical adademy      lacadaisical adademy  
                    the academic acad   the academyemic academy      the academic academy
            Last edited by William Lisowski; 10 Jul 2022, 13:48.
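
A small check of why the word-boundary pattern behaves as shown in the listing, assuming a Stata version that provides the Unicode regular expression functions. The variable s and its single test value are hypothetical. In the pattern, \b matches only at the edge of a word, so "acad" is replaced when it stands alone and is left untouched inside longer words such as "academic" or "academy".

```stata
* Hypothetical one-observation check
clear
set obs 1
generate str40 s = "the academic acad"

* Only the standalone word "acad" matches \bacad\b
replace s = ustrregexra(s, "\bacad\b", "academy")
assert s == "the academic academy"

* Running the same command again changes nothing, because "academy"
* does not contain "acad" as a whole word
replace s = ustrregexra(s, "\bacad\b", "academy")
assert s == "the academic academy"
```

Because the replacement can be repeated safely, no if condition is needed to protect values that already end in "academy".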

