
  • Regex Command not working; "regexp: ?+* follows nothing"

    Hi,
    I have started cleaning company names in the dataset I am currently working on, and I am trying to get rid of the legal suffixes at the end of the string.
    For instance:
    1ST Biotherapeutics, Inc.
    I used

    replace name = regexr(name, "(?<=\,\s)(\w*.*)", "")
    However, I get the error
    regexp: ?+* follows nothing

    If I test the same pattern on https://regex101.com/, it correctly selects "Inc."

    Can someone point out where I am wrong here?

    Thanks,

    Marcel
    Last edited by Marcel Wieting; 30 Oct 2020, 16:56.

  • #2
    Here is an example that demonstrates your problem and a solution.
    Code:
    . set obs 1
    number of observations (_N) was 0, now 1
    
    . generate str30 name1 = "1ST Biotherapeutics, Inc."
    
    . generate str30 name2 =  regexr(name1, "(?<=\,\s)(\w*.*)", "")
    regexp: ?+* follows nothing
    
    . generate str30 name3 =  ustrregexra(name1, "(?<=\,\s)(\w*.*)", "")
    
    . list, clean noobs
    
                            name1                       name2                   name3  
        1ST Biotherapeutics, Inc.   1ST Biotherapeutics, Inc.   1ST Biotherapeutics,
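    Stata's original regular expression functions, of which regexr() is an example, implement an older, more limited version of regular expression syntax. That syntax has no lookbehind, so the "?" in "(?<=" is read as a quantifier with nothing in front of it, which is what the error message is complaining about.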

    The Unicode regular expression functions introduced in Stata 14, of which ustrregexra() is an example, have a much more powerful definition of regular expressions than the non-Unicode functions. To the best of my knowledge, only in the Statalist post linked here is it documented that Stata's Unicode regular expression parser is the ICU regular expression engine documented at http://userguide.icu-project.org/strings/regexp. A comprehensive discussion of regular expressions can be found at https://www.regular-expressions.info/unicode.html.
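
Note that name3 above still ends with the comma, because the lookbehind keeps ", " out of the match. If the goal is to drop the comma as well, a minimal sketch along the following lines may be enough. It assumes the suffix always starts at the first comma in the name, which will not hold for names that contain commas elsewhere. The variable names name4 and name5 are made up for this illustration.

```stata
clear
set obs 1
generate str30 name1 = "1ST Biotherapeutics, Inc."

* Old-style regexr(): no lookbehind, \s or \w needed.
* Match from the first comma to the end of the string and delete it.
generate str30 name4 = regexr(name1, ",.*$", "")

* Unicode version: same idea, with the comma included in the match.
generate str30 name5 = ustrregexra(name1, ",\s*.*$", "")

list, clean noobs
```

With the example name, both name4 and name5 should come out as "1ST Biotherapeutics" with no trailing comma. Because the pattern avoids lookbehind entirely, the first variant should also run on versions of Stata that predate the Unicode functions.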



    • #3
      Excellent, William. Spot on. Thanks!
