resource for learning regular expressions

Chris Larkin

Join Date: Apr 2016

Posts: 296
#1

17 Jun 2017, 10:14

Hi all,

One thing I've just had to pick up by doing is regular expression matching. However, it'd be great to review a resource with a more systematic explanation of the possible inputs.

E.g., [0-9] and [A-Z] are quite self-explanatory; \(.+?\) is not. I hope this question is clear.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

17 Jun 2017, 16:23

Search the web for "regular expression tutorial" - there are numerous sites with reasonable introductions.
Chris Larkin

Join Date: Apr 2016

Posts: 296
#3

17 Jun 2017, 19:52

Thanks. I was looking for something Stata-specific. I know regular expression syntax can differ across languages, so I figured Stata must've put something together (especially as many of their users do not in fact code in other languages). If not, then I'll just do as you say.
Chris Larkin

Join Date: Apr 2016

Posts: 296
#4

17 Jun 2017, 19:56

This is a nice one: http://www.stata.com/support/faqs/da...r-expressions/

As is this: https://stats.idre.ucla.edu/stata/fa...r-expressions/

I would've thought something like the first could be found via help regular_expressions
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

18 Jun 2017, 11:05

Ah, well, Stata's implementation of regular expressions has always been notoriously under-documented.

It is complicated by the fact that Stata 14 introduced additional regular expression functions that can handle Unicode character strings, and they also provide an expanded regular expression syntax. But StataCorp did not expand the documentation.

The link below is to post #16 in a thread that largely deals with an arcane comparison between Java and Stata 14 implementations of regular expressions. In post #16, Hua Peng from StataCorp weighs in with a valiant attempt to briefly explain the differences. And he links to a Wikipedia page that compares regular expression engines, including the engine used by Stata's Unicode regular expression functions, so you can see what Stata's newest functions have and lack.

http://www.statalist.org/forums/foru...79#post1327779

Note that the two FAQs you cite in post #4 above pertain to the more limited pre-Stata 14 regular expression functions; the Unicode-capable functions support a richer syntax.

I will also note that in post #14 in the aforecited thread, Robert Picard did me the great favor of reassuring me that for ASCII characters in the range of 0-127 (base 10) the Unicode-capable functions will work as intended and provide the additional expression capabilities.

http://www.statalist.org/forums/foru...73#post1327773

Finally, my web search found the following site, to which Robert had also referred in an earlier post, that is a good source of regular expression tutorial information. Reading through it, with Stata open for the trial-and-error experimentation the site assumes you will do to follow along, might supply the approach you sought in post #1 above.

http://www.regular-expressions.info
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

30 Sep 2017, 10:11

Bjarte Aagnes has provided just now, in another topic, the following link to definitive, comprehensive documentation for the ICU regular expression syntax, upon which Stata's Unicode-capable regular expression functions discussed above are based.

http://userguide.icu-project.org/strings/regexp

In general, if you're new to regular expressions and are using Stata 14 or later, I strongly recommend bypassing the older regular expression functions, whose syntax is not documented in Stata, and instead using the Unicode-capable regular expression functions. Their syntax is well documented at the link given, although, as of Stata 15.0, it is still not documented in Stata itself.
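
To make this concrete, here is a minimal sketch of the pattern asked about in post #1, \(.+?\), run through the Unicode-capable functions ustrregexm(), ustrregexs(), and ustrregexra(). It is only an illustration, not taken from the Stata or ICU documentation: it assumes Stata 14 or later, and the example string is made up.

```stata
* Assumes Stata 14 or later; the string below is a made-up example.
local s "Stata (release 14) added Unicode (ICU) functions"

* \(   a literal opening parenthesis (escaped, since ( normally starts a group)
* .+?  one or more of any character, as few as possible ("lazy")
* \)   a literal closing parenthesis
if ustrregexm("`s'", "\(.+?\)") display ustrregexs(0)
* should display: (release 14)

* Without the ?, .+ is "greedy" and runs to the last closing parenthesis
if ustrregexm("`s'", "\(.+\)") display ustrregexs(0)
* should display: (release 14) added Unicode (ICU)

* ustrregexra() replaces every match; here each parenthesized chunk is removed
display ustrregexra("`s'", " \(.+?\)", "")
* should display: Stata added Unicode functions
```

Trying small variations of a pattern on a literal string like this, before applying it to a variable with generate or replace, is probably the quickest way to check your reading of the ICU syntax.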
Chris Larkin

Join Date: Apr 2016

Posts: 296
#7

03 Oct 2017, 13:24

Thanks Will. This looks great -- thanks for remembering my thread!
