Dear Statalisters,
I am trying to use regular expressions to remove certain words/phrases (always in brackets) from long strings. Here is some example data.
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str20 speaker str152 statement
"Speaker 1" "[Speaker 1 stands] The Secretary of State knows that the cost of food has gotten much higher under the current government. [Clapping] What will you do? "
"Speaker 2" "As our Secretary has said, the next meeting will focus specifically on the issue of food security. [Speaker of the House ends the meeting]"
"Speaker of the House" "We will reconvene tomorrow morning. [Speaker of the House stands]"
end
I've used code I found on this site to remove all text contained in brackets:
Code:
gen clean = ustrregexra(statement,"\[.+?\]","")
However, I'd only like to remove some of the bracketed text. In the example data, I would like to remove "[xxx stands]" (e.g. "[Speaker 1 stands]") and "[Clapping]", but keep "[Speaker of the House ends the meeting]".
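A side note on the all-brackets code: the `?` in `.+?` makes the quantifier lazy, so each match stops at the first `]` after its `[`. The sketch below uses a made-up one-observation string (hypothetical, not from the example data) to contrast it with the greedy `.+`, which runs from the first `[` to the last `]` and deletes everything in between.

```stata
* Hypothetical one-observation string, for illustration only
clear
set obs 1
gen s = "[A stands] keep this [Clapping] and this"

* Greedy .+ : one match from the first "[" to the LAST "]"
gen greedy = ustrregexra(s, "\[.+\]", "")

* Lazy .+? : each match ends at the first "]" after its "["
gen lazy = ustrregexra(s, "\[.+?\]", "")

list greedy lazy, noobs
```

The greedy version leaves only " and this", while the lazy version keeps " keep this  and this" (with a doubled blank where "[Clapping]" was).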
I haven't figured out the proper regular-expression syntax. The following code is meant to remove all instances of "[Clapping]", but it also removes all the text before it:
Code:
gen clean = ustrregexra(statement,"^\[.+?\Clapping]","")
Can anyone help with the proper way to set this up using regular expressions so I can specify words or word fragments within brackets?
Thanks!
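One possible approach, sketched under the assumption that the phrases to drop are exactly "[Clapping]" plus any bracketed phrase ending in the word "stands": list those two cases as alternatives inside a single bracket pattern, and use the negated character class `[^\]]*` instead of `.+?` so that one match cannot run from one pair of brackets into the next. The attempt above likely goes wrong because the leading `^` ties the match to the start of the string, so the lazy `.+?` has to stretch from the opening `[` of "[Speaker 1 stands]" all the way to "Clapping]", taking the text in between with it. The functions used need Stata 14 or later.

```stata
* Example data from the post (dataex)
clear
input str20 speaker str152 statement
"Speaker 1" "[Speaker 1 stands] The Secretary of State knows that the cost of food has gotten much higher under the current government. [Clapping] What will you do? "
"Speaker 2" "As our Secretary has said, the next meeting will focus specifically on the issue of food security. [Speaker of the House ends the meeting]"
"Speaker of the House" "We will reconvene tomorrow morning. [Speaker of the House stands]"
end

* Drop only "[Clapping]" and bracketed phrases that end in "stands".
* [^\]]* matches any run of characters other than "]", so a match
* never spans two separate bracketed phrases.
gen clean = ustrregexra(statement, "\[([^\]]*stands|Clapping)\]", "")

* Tidy up: collapse doubled blanks left behind, then trim both ends
replace clean = strtrim(stritrim(clean))

list clean, noobs
```

To drop further phrases, add them as more alternatives inside the parentheses, separated by `|`. Without the `strtrim(stritrim())` step, the result keeps the extra blanks left where phrases were removed.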