  • Find and Replace Regular Expression -- How to Reference "Found" Selection in Replace Field?

    Hi guys.

    Not many people seem to use regular expressions in Stata, particularly in the Do-file Editor's find and replace.

    Stata's flavor of regular expressions is not entirely standard, which can be a little frustrating. One thing I can't find in the documentation is how Find+Replace references the found selection. Please note: I am NOT talking about the regex*() functions that exist for dataset filtering. I mean find and replace in the Do-file Editor, where you check the "Regular expression" box and can write more advanced search patterns against program code. Allow me to demonstrate with an example.

    Let's say the contents of the dofile is:

    Code:
    foo(test 123)
    bar(456 test)
    I use the regular expression code in "Find what" field:

    Code:
    foo([^)]*?)
    This finds every instance of foo(...), where ... is any run of characters other than a closing parenthesis (letters, digits, spaces).
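For readers testing the same pattern outside Stata, here is a minimal sketch in Python, assuming its standard `re` module as the comparison engine. Judging from the behaviour described in this post, the editor treats the bare parentheses as literal characters; Python instead treats them as a capture group, so they have to be escaped to get the same match. The variable names are only illustrative.

```python
import re

line = "foo(test 123)"

# Bare parentheses form a capture group in Python's re, so the lazy
# group matches as little as possible (nothing) and only "foo" is found.
bare = re.search(r"foo([^)]*?)", line)

# Escaped parentheses match the literal "(" and ")" characters.
escaped = re.search(r"foo\([^)]*?\)", line)

print(repr(bare.group(0)))
print(repr(escaped.group(0)))
```

This only describes Python's engine; it says nothing definitive about how the Do-file Editor parses the pattern.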

    In other implementations of RegEx find and replace (Notepad++ for example) you can use:

    Code:
    $0
    to represent whatever is "found" in the find+replace dialog. So if you were to set the "Replace with" field to:

    Code:
    /* $0 */
    the dofile's contents would then look like:

    Code:
    /* foo(test 123) */
    bar(456 test)
    But Stata's find+replace regex does not recognize $0.

    I can't find anything in the documentation on how to reference the found text. Does anyone know if this is even possible in Stata?

  • #2
    I may be missing the point, but is there a reason you would not want to put
    Code:
    /* foo([^]*?) */
    in the "Replace with:" field?

    • #3
      Originally posted by Eric Haavind-Berman
      I may be missing the point, but is there a reason you would not want to put
      Code:
      /* foo([^]*?) */
      in the "Replace with:" field?
      Yes: it doesn't work. Doing so replaces the selection with the literal text "/* foo([^]*?) */", not with /* <selection found> */.

      • #4
        I figured it out by trial and error, trying the documented regex find-and-replace syntax from other languages and programs.

        Depending on the program, the found selection is referenced as $0, ^0, or \0.

        In the case of Stata Find+Replace regular expressions, you need to use "\0" to reference the found selection.

        Thus when you do

        Find what: foo([^)]*?)
        Replace with: /* \0 */

        then it will actually replace with the found selection rather than the literal code.
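To illustrate that the whole-match reference is spelled differently from engine to engine, here is a minimal sketch of the same replacement in Python, assuming its standard `re` module. The parentheses are escaped because Python treats bare parentheses as a group, and Python writes the whole match as `\g<0>` in the replacement string, which plays the role that \0 plays in the Do-file Editor. The variable names are only illustrative.

```python
import re

# Do-file contents from the example in the first post.
source = "foo(test 123)\nbar(456 test)"

# Literal parentheses are escaped for Python's engine.
pattern = re.compile(r"foo\([^)]*?\)")

# \g<0> inserts the entire matched text into the replacement.
commented = pattern.sub(r"/* \g<0> */", source)

print(commented)
```

Only the foo(...) line is wrapped in a comment; the bar(...) line is left alone, which is the result the first post was after.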

        It is very frustrating that Stata does not properly document this. Hopefully others will benefit from this thread.

        • #5
          Thank you, Chris, for closing the loop. If it's any consolation, I raised my concerns with Stata's documentation of their regular expression capabilities back when my count of posts was similar to your current count of posts.

          Let me add that this feature of the Do-File editor is apparently available in Stata for Windows (the reference to Notepad++ suggests that's the environment you're on) but not in Stata for Mac (the environment I'm on). (That's OK, though, because I do all my serious do-file editing in BBEdit.)

          It is possible that the regular expression engine behind the Do-File editor is the enhanced one (used by e.g. uregexm) that deals with Unicode character sets (and thus with ASCII as a proper subset) and which presents a more complete regular expression syntax than the engine behind regexm. My memory tells me that with the introduction of the Unicode string functions in Stata 14 I saw more complete documentation of the regular expression syntax, or perhaps a link to a description external to Stata, but nowhere can I find it at the moment. While I evangelize regular expressions on Statalist, perhaps one of the actual experts here will look in and help us out.

          • #6
            Got it. Here's post #16 in an earlier topic with info about the regular expression engine used with the Unicode functions.

            http://www.statalist.org/forums/foru...79#post1327779

            • #7
              It may be that the do-file editor is not using Stata's regular expression implementations. The do-file editor uses portions of Scintilla and SciTE, possibly including Scintilla's implementation of regular expressions:

              http://www.scintilla.org/ScintillaDoc.html
              http://www.scintilla.org/SciTERegEx.html
