
  • Generating regular expression that would replace a specific character located in a specific place

    Hi,

    I have a data set corresponding to the extract below.
    Code:
    clear
    input str9 var1
    "A11OAB"
    "B22OBD"
    "CC21OAA "
    end
    I'm looking for a regular expression that would replace O with 0 in the observations that meet all of the following criteria:
    • Are 6 characters long
    • Start with a letter
    • End with two letters
    • Counting from the left, O is the fourth character
    Kind regards,
    Konrad
    Version: Stata/IC 13.1

  • #2
    Code:
    generate byte flag = ///
        regexm(var1, "^[a-zA-Z]..[O][a-zA-Z][a-zA-Z]$")
    Best
    Daniel


    • #3
      Thank you very much.
      Kind regards,
      Konrad
      Version: Stata/IC 13.1


      • #4
        How do you replace the O with 0? Like this?

        Code:
        gen newvar1 = substr(var1,1,3)+"0"+substr(var1,5,2) if flag
        In SAS you could write
        Code:
        substr(var1,4,1) = "0"
        Is there a similar solution in Stata? I know that I could use subinstr(), but it could replace other O's coming before and after the fourth position.

        Moreover, it does not seem possible to request repeating patterns with Stata's regex functions. I would like to write something like

        Code:
        generate byte flag = ///
            regexm(var1, "^[a-zA-Z]..[O][a-zA-Z]{2}$")
        It should be possible if Stata follows the POSIX.2 standard, shouldn't it?


        • #5
          Your first syntax is what I had in mind. I am not aware of an official Stata command or function that matches your SAS code.
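
          For completeness, here is a minimal sketch of that substr() approach applied in place with replace, so that no new variable is needed. It assumes the example data from #1 and the flag variable from #2; only the fourth character of flagged observations changes.

          Code:
          clear
          input str9 var1
          "A11OAB"
          "B22OBD"
          "CC21OAA "
          end

          * flag values that are 6 characters long: letter, any two
          * characters, O in position 4, then two letters
          generate byte flag = ///
              regexm(var1, "^[a-zA-Z]..O[a-zA-Z][a-zA-Z]$")

          * overwrite var1 itself; positions 1-3 and 5-6 are copied unchanged
          replace var1 = substr(var1,1,3) + "0" + substr(var1,5,2) if flag

          list
          Unlike subinstr(), this cannot touch an O anywhere other than the fourth position.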

          Regarding the last point, the apparent lack of proper documentation of Stata's regex machinery has been discussed on the list repeatedly. It seems there is simply no complete answer. With Stata 14's Unicode features, what you want seems to be possible; at least there is an example in the respective help file.
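
          Without the Unicode functions, one possible workaround is to spell out the repeated character class and pick up the pieces with regexs(). This is only a sketch, assuming the example data from #1; it relies on regexs(n) returning the n-th parenthesized subexpression of the preceding regexm() match.

          Code:
          clear
          input str9 var1
          "A11OAB"
          "B22OBD"
          "CC21OAA "
          end

          * group 1: leading letter plus any two characters
          * group 2: the two trailing letters (class written out twice,
          *          since {2} is not assumed to be supported here)
          replace var1 = regexs(1) + "0" + regexs(2) if ///
              regexm(var1, "^([a-zA-Z]..)O([a-zA-Z][a-zA-Z])$")

          list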

          Best
          Daniel


          • #6
            I don't like the new names of the Unicode regex functions. Here's an example that shows a repeating pattern:

            Code:
            clear
            input str9 var1
            "A11OAB"
            "B22OBD"
            "CC21OAA "
            end
            
            clonevar vcopy = var1
            replace var1 = ustrregexs(1) + "0" + ustrregexs(2) if ///
                ustrregexm(var1,"^([a-zA-Z]..)O([a-zA-Z]{2})$")
                
            list if vcopy != var1
