  • resource for learning regular expressions

    Hi all,

    One thing I've just had to pick up by doing is regular expression matching. However, it'd be great to review a resource with a more systematic explanation of the possible inputs.

    E.g. [0-9] and [A-Z] are quite self-explanatory; \(.+?\) is not. I hope this question is clear.


  • #2
    Search the web for "regular expression tutorial" - there are numerous sites with reasonable introductions.


    • #3
      Thanks. I was looking for something Stata-specific. I know regular expression syntax can differ across languages, so I figured Stata must've put something together (especially as many of their users do not in fact code in other languages). If not, then I'll just do as you say.


      • #4
        This is a nice one: http://www.stata.com/support/faqs/da...r-expressions/

        As is this: https://stats.idre.ucla.edu/stata/fa...r-expressions/

        I would've thought something like the first could be found via help regular_expressions


        • #5
          Ah, well, Stata's implementation of regular expressions has always been notoriously under-documented.

          It is complicated by the fact that Stata 14 introduced additional regular expression functions that can handle Unicode character strings and that also provide an expanded regular expression syntax. But StataCorp did not expand the documentation.

          The link below is to post #16 in a thread that largely deals with an arcane comparison between Java and Stata 14 implementations of regular expressions. In post #16, Hua Peng from StataCorp weighs in with a valiant attempt to briefly explain the differences. And he links to a Wikipedia page that compares regular expression engines, including the engine used by Stata's Unicode regular expression functions, so you can see what Stata's newest functions have and lack.

          http://www.statalist.org/forums/foru...79#post1327779

          Note that the two FAQs you cite in post #4 above pertain to the more limited pre-Stata 14 regular expressions; the Unicode-capable functions support a richer syntax.

          I will also note that in post #14 in the aforecited thread, Robert Picard did me the great favor of reassuring me that for ASCII characters in the range of 0-127 (base 10) the Unicode-capable functions will work as intended and provide the additional expression capabilities.

          http://www.statalist.org/forums/foru...73#post1327773
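
          As a quick illustration of that point, here is a minimal sketch that applies the Unicode-capable functions to plain ASCII input. The sample strings are made up by me, and it assumes Stata 14 or later:

          ```stata
          * Hypothetical ASCII input, made up for illustration.
          * [0-9]+ matches one or more consecutive digits.
          display ustrregexm("abc123", "[0-9]+")
          * prints 1 (a match was found)

          * ustrregexra() replaces every match; each run of digits becomes "#".
          display ustrregexra("a1b22c333", "[0-9]+", "#")
          * prints a#b#c#
          ```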

          Finally, my web search found the following site, to which Robert had also referred in an earlier post, that is a good source of regular expression tutorial information. Reading through it, with Stata open for the trial-and-error experimentation the site assumes you will do to follow along, might supply the approach you sought in post #1 above.

          http://www.regular-expressions.info


          • #6
            Bjarte Aagnes has just provided, in another topic, the following link to definitive, comprehensive documentation for the ICU regular expression syntax, on which Stata's Unicode-capable regular expression functions discussed above are based.

            http://userguide.icu-project.org/strings/regexp

            In general, if you're new to regular expressions and are using Stata 14 or later, I strongly recommend bypassing the older regular expression functions, whose syntax is not documented in Stata, and using the Unicode-capable regular expression functions instead. Their syntax is well documented at the link given, although, as of Stata 15.0, it is still not documented in Stata itself.
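
            To make that concrete for the pattern asked about in post #1, here is a rough sketch using the Unicode-capable functions. The sample string is invented for illustration, and the code assumes Stata 14 or later. Reading \(.+?\) piece by piece: \( is a literal opening parenthesis, .+? is one or more of any character matched lazily (as few as possible), and \) is a literal closing parenthesis.

            ```stata
            * Hypothetical sample text, made up for illustration.
            * Lazy version: stops at the first closing parenthesis.
            if ustrregexm("total (in USD) and count (n)", "\(.+?\)") {
                display ustrregexs(0)
            }
            * prints (in USD)

            * Greedy version (no ?): runs on to the last closing parenthesis.
            if ustrregexm("total (in USD) and count (n)", "\(.+\)") {
                display ustrregexs(0)
            }
            * prints (in USD) and count (n)
            ```

            Here ustrregexs(0) returns the whole matched text from the most recent ustrregexm() call, which makes it a handy way to check what a pattern actually captured while experimenting.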


            • #7
              Thanks Will. This looks great -- thanks for remembering my thread!
