
  • Regex: What does " [^ ]+$" mean?

    I read a message on the old Stata listserver saying there was a way to remove the last word of a string using a regular expression (the words are separated by spaces). The code for this is:
    Code:
    regexr(string, " [^ ]+$", "")
    What does the pattern " [^ ]+$" mean, and how does it pick out the last word of a string? In particular, I don't understand what it means for ^, which I know as an anchor, to appear inside the square brackets, which denote the set of allowable characters.

  • #2
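    The pattern " [^ ]+$" matches a single space followed by one or more non-space characters, with the match running all the way to the end of the string. Piece by piece: the pattern starts with a literal space; "[^ ]" is a character class that matches any one character other than a space (a caret placed first inside brackets negates the class instead of anchoring); "+" means one or more of those non-space characters; and "$" requires the match to end at the end of the string. Together this matches the last word plus the space in front of it, so regexr(), which replaces the first match with "", deletes both. If the string contains no space (a single word), or if it ends with a space, there is no match and the string is returned unchanged.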
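    To see what the pattern actually matches, here is a small sketch in Python rather than Stata. It assumes that Python's re module treats this particular pattern the same way Stata does, which as far as I know it does, and uses re.sub with count=1 as a stand-in for regexr(), since regexr() replaces only the first match. The names LAST_WORD and drop_last_word are hypothetical.

```python
import re

# Same pattern as in the Stata call: a literal space, then one or more
# non-space characters, anchored to the end of the string.
LAST_WORD = re.compile(r" [^ ]+$")


def drop_last_word(s: str) -> str:
    """Stand-in for regexr(s, " [^ ]+$", ""): replace the first match with ""."""
    return LAST_WORD.sub("", s, count=1)


# The match includes the space before the last word.
print(repr(LAST_WORD.search("the quick brown fox").group()))  # ' fox'

for s in ["the quick brown fox", "hello", "ends with space "]:
    print(repr(s), "->", repr(drop_last_word(s)))
# 'the quick brown fox' -> 'the quick brown'
# 'hello' -> 'hello'                    (no space, so no match)
# 'ends with space ' -> 'ends with space '  (last char is a space, so no match)
```

    Because the leading space is part of the match, removing the match leaves no trailing space behind.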



    • #3
      The choice of the caret (^) for negation within brackets in regular expressions stems from the desire to use a symbol that wasn't already widely used for other purposes, as the exclamation point and NOT were.

      Overloaded notation occurs when the same symbol stands for different things depending on context. Here the caret is an anchor (start of string) when it appears outside brackets, and a negation when it is the first character inside brackets, as in [^ ].
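      A small sketch of the caret's roles, again using Python's re module only as a stand-in for Stata's regex functions (the pattern names are hypothetical). It also shows a third case: a caret that is not the first character inside brackets is just a literal caret.

```python
import re

# Outside brackets, ^ anchors the match to the start of the string.
starts_with_a = re.compile(r"^a")
# As the first character inside brackets, ^ negates the class:
# [^ ] matches any single character except a space.
not_space = re.compile(r"[^ ]")
# Anywhere else inside brackets, ^ is a literal caret.
caret_or_x = re.compile(r"[x^]")

print(bool(starts_with_a.search("abc")))  # True: "a" is at the start
print(bool(starts_with_a.search("bac")))  # False: "a" is not at the start
print(not_space.findall("a b"))           # ['a', 'b']
print(caret_or_x.findall("2^3 x"))        # ['^', 'x']
```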



      • #4
        Thanks! This is very helpful to know. For some reason, this doesn't seem to be documented on the Stata website.

