
  • Errors with regexr and regexm

    When I attempt the following:
    replace rating = regexr(rating,"(","")

    I get the following error:
    regexp: unterminated ()

    This error appears to have been reported twice before, in 2009 and 2011. In 2009 the advice was to contact Stata. In 2011, the thought was that the coder's expression was too long.

    I get the same error if I try to use regexm.

    Any thoughts? Thanks in advance.

  • #2
    I can't see your references and don't recall the posts in question but I have a different answer. You should try compound quotation marks.

    Code:
    replace rating = regexr(rating,`"(",""')
    However, I would not overwrite the raw material. If you get the syntax wrong, you may well have to read in the data again. Instead, generate a new variable and check that it is what you want.

    Whether the regular expression is right I can't comment on.

    EDIT: The suggestion of compound quotes and the ensuing code are pure unadulterated garbage. The rest is anodyne but correct. Please see #5.
    Last edited by Nick Cox; 02 Oct 2018, 10:10.



    • #3
      Thank you for the suggestion. Unfortunately I get the same error.

      (Point taken on not overwriting. I generate a copy before this command. Inefficient, yes.)

      2011:
      https://www.stata.com/statalist/arch.../msg00235.html
      2009:
      http://statalist.1588530.n2.nabble.c...td2638391.html



      • #4

        Parentheses have special meaning in a regular expression: they group part of the expression and create a numbered capturing group. An unescaped opening parenthesis therefore cannot stand for a literal character. You can escape a "(" with a backslash, "\(", or put it inside a character class, "[(]".

        To match a literal "(", use:

        Code:
        di regexr("te(xt","[(]","")
        di regexr("te(xt","\(","")
        Last edited by Bjarte Aagnes; 02 Oct 2018, 10:03.
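
The same escaping can be applied to a string variable rather than a literal. What follows is only a minimal sketch: the variable name rating comes from the original post, while the toy data values and the new variable name rating_clean are assumptions made for illustration. It writes to a new variable, in line with the advice in #2 not to overwrite the raw data.

```stata
* Toy data: the values below are invented for illustration only
clear
input str12 rating
"A(1)"
"B+"
"C"
end

* "[(]" matches a literal "(", so regexr() no longer sees an unterminated group
generate rating_clean = regexr(rating, "[(]", "")

* regexr() changes only the first match in each value
assert rating_clean[1] == "A1)"
assert rating_clean[2] == "B+"
list
```

Values without a "(" pass through unchanged, so the result can be compared against the original variable before anything is replaced.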



        • #5
          My code in #2 is complete nonsense. Sorry about that. It seems that the problem is zapping left parentheses. If so,

          Code:
          gen new = subinstr(old, "(", "", .)
          is a direct way to do it.
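
One practical difference between the two approaches is worth a quick check: regexr() replaces only the first match in a string, whereas subinstr() with "." as its last argument replaces every occurrence. A small sketch on a made-up literal (the string "a(b(c" is an assumption, not data from the thread):

```stata
* regexr() stops after the first literal "("
display regexr("a(b(c", "[(]", "")
assert regexr("a(b(c", "[(]", "") == "ab(c"

* subinstr() with . as the count removes every "("
display subinstr("a(b(c", "(", "", .)
assert subinstr("a(b(c", "(", "", .) == "abc"
```

Because subinstr() does plain substring replacement, "(" needs no escaping there.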



          • #6
            Thank you both for the suggestions. Nick, your method will get me where I need to be. Not as efficient, but I've already shown the inefficiencies in my code!

            Bjarte, it may not be a parentheses issue.

            Code:
            replace rating = regexr(rating,"+"," ")
            Generates:
            regexp: ?+* follows nothing



            • #7
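              This is a humorous reply. Both "(" and "+" are special characters in a regular expression, so each must be escaped to be matched literally. A "+" is a quantifier that needs something before it to repeat, which is why a bare "+" produces "?+* follows nothing".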
              Code:
              di regexr("te(xt","[(]","")
              di regexr("te(xt","\(","")
              di regexr("te+xt","[+]","")
              di regexr("te+xt","\+","")
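
If both characters need removing from the same string, one character class can list them together, or nested subinstr() calls can avoid regular expressions altogether. This is a sketch on a made-up literal ("te(x+t" is an assumption); Stata 14 and later also provide ustrregexra(), which replaces all matches, but it is not used here.

```stata
* Inside [...] both "(" and "+" are literal; regexr() removes only the first hit
display regexr("te(x+t", "[(+]", "")
assert regexr("te(x+t", "[(+]", "") == "tex+t"

* Nested subinstr() removes every "(" and every "+" with no escaping needed
display subinstr(subinstr("te(x+t", "(", "", .), "+", "", .)
assert subinstr(subinstr("te(x+t", "(", "", .), "+", "", .) == "text"
```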

