Find and Replace all text within brackets?

Al Adams

Join Date: Apr 2020

Posts: 54
#1

19 Jul 2020, 22:05

Hi, I am working with a panel dataset whose variable names differ in every wave of the survey. To give the variables meaningful names, I must rename each one individually. To do this, I typically copy and paste the list of variables from the dataset's website into the do-file, where I then use a loop to rename them. However, when I paste them, they appear like this:

Code:

[[68]V181 [69]V801 [70]V1490 [71]V2202 [72]V2828 [73]V3300 [74]V3720 [75]V4204 [76]V5096 [77]V5662 [78]V6209 [79]V6802 [80]V7447 [81]V8099 [82]V8723 [83]V9408 [84]V11055 [85]V11938 [86]V13565 [87]V14612 [88]V16086 [89]V17483 [90]V18814 [91]V20114 [92]V21420 [93]V23276 [94]ER3944 [95]ER6814 [96]ER9060 [97]ER11848 [99]ER15928 [01]ER19989 [03]ER23426 [05]ER27393 [07]ER40565 [09]ER46543 [11]ER51904 [13]ER57659 [15]ER64810 [17]ER70882

The variables above all measure a person's income. The number in the brackets is the year of the measure. For example, V181 is a person's income in 1968, V801 is a person's income in 1969, etc.

I'd like to delete the brackets and everything in them, so that the pasted variable names then look like this:

Code:

V181 V801 V1490 V2202 V2828 V3300 V3720 V4204 V5096 V5662 V6209 V6802 V7447 V8099 V8723 V9408 V11055 V11938 V13565 V14612 V16086 V17483 V18814 V20114 V21420 V23276 ER3944 ER6814 ER9060 ER11848 ER15928 ER19989 ER23426 ER27393 ER40565 ER46543 ER51904 ER57659 ER64810 ER70882

Is there an easier way to do this besides deleting the brackets and their contents one by one?

Arthur Morris

Join Date: Apr 2014
Posts: 107
#2

19 Jul 2020, 23:36

Based on your description, I suspect there may be a better way for you to extract the dataset you're working with.
However, it seems like your question is about how to remove strings of the pattern "[\d\d]", rather than what you describe in the title of the thread. You can do this in a number of ways; see help string functions for details. I'll illustrate a simple one below.

Code:

// put the long string in a local
local longstring " [68]V181 [69]V801 [70]V1490 [71]V2202 [72]V2828 [73]V3300 [74]V3720 [75]V4204 [76]V5096 [77]V5662 [78]V6209 [79]V6802 [80]V7447 [81]V8099 [82]V8723 [83]V9408 [84]V11055 [85]V11938 [86]V13565 [87]V14612 [88]V16086 [89]V17483 [90]V18814 [91]V20114 [92]V21420 [93]V23276 [94]ER3944 [95]ER6814 [96]ER9060 [97]ER11848 [99]ER15928 [01]ER19989 [03]ER23426 [05]ER27393 [07]ER40565 [09]ER46543 [11]ER51904 [13]ER57659 [15]ER64810 [17]ER70882 "

// loop over the space-separated tokens in the long string
foreach shortstring of local longstring {
    display "The raw string is `shortstring'"
    // drop the 4-character "[NN]" prefix by keeping everything from character 5 onward
    local cleanshortstring = substr("`shortstring'", 5, .) // you can use another string function here to meet your needs
    display "The clean string is `cleanshortstring'" // now you can do whatever you like with the clean string
}
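
The fixed offset of 5 in the loop above assumes every token begins with exactly four characters of the form "[NN]". As a minimal sketch of one alternative, the closing bracket can be located with strpos() so that the prefix may have any length, including the stray double bracket on the first token in #1. The short sample list and the local names close and cleanlist are illustrative assumptions, not part of the original answer.

```stata
// hypothetical sample: a short piece of the pasted list, including the "[[" from #1
local longstring "[[68]V181 [69]V801 [01]ER19989"

local cleanlist
foreach shortstring of local longstring {
    // position of the closing bracket within this token
    local close = strpos("`shortstring'", "]")
    // keep everything after the closing bracket
    local cleanshortstring = substr("`shortstring'", `close' + 1, .)
    // append the cleaned name to the running list
    local cleanlist `cleanlist' `cleanshortstring'
}
display "`cleanlist'"
```

The accumulated local cleanlist can then be used wherever a variable list is expected, for example in a later foreach loop.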

Andrew Musau

Join Date: Oct 2014
Posts: 10489
#3

19 Jul 2020, 23:58

Code:

local varnames "[[68]V181 [69]V801 [70]V1490 [71]V2202 [72]V2828 [73]V3300 [74]V3720 [75]V4204 [76]V5096 [77]V5662 [78]V6209 [79]V6802 [80]V7447 [81]V8099 [82]V8723 [83]V9408 [84]V11055 [85]V11938 [86]V13565 [87]V14612 [88]V16086 [89]V17483 [90]V18814 [91]V20114 [92]V21420 [93]V23276 [94]ER3944 [95]ER6814 [96]ER9060 [97]ER11848 [99]ER15928 [01]ER19989 [03]ER23426 [05]ER27393 [07]ER40565 [09]ER46543 [11]ER51904 [13]ER57659 [15]ER64810 [17]ER70882"
// remove each "[", followed by one or more "[" or digit characters, followed by "]"
local varnames = ustrregexra("`varnames'", "(\[[\[0-9]+\])", "", .)
di "`varnames'"

Result:

Code:

. 
. di "`varnames'"
V181 V801 V1490 V2202 V2828 V3300 V3720 V4204 V5096 V5662 V6209 V6802 V7447 V8099 V8723 V9408 V11055 V11938 V13565 V14612 V16086 V17483 V18814 V20114 V21420 V23276 ER3944 E
> R6814 ER9060 ER11848 ER15928 ER19989 ER23426 ER27393 ER40565 ER46543 ER51904 ER57659 ER64810 ER70882

Bjarte Aagnes

Join Date: Apr 2014

Posts: 789
#4

20 Jul 2020, 10:58

In #3:

Code:

ustrregexra("`varnames'", "(\[[\[0-9]+\])", "",.)

The last argument (the ".", which turns on case-insensitive matching) is not needed, and will probably make the call less efficient. A shorter and faster alternative is:

Code:

ustrregexra("`varnames'", "\[+\d\d]", "")
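
Since #1 says the bracketed number is the survey year, the same family of regex functions can also capture that number instead of discarding it. This is only a rough sketch under stated assumptions: the stub income, the local names, the shortened sample list, and the rule that 68 to 99 means 19xx and anything lower means 20xx are all illustrative guesses, not something given in the thread. The rename line is left commented out so that the snippet runs without the dataset in memory.

```stata
// hypothetical sample: a few tokens from the list in #1
local raw "[[68]V181 [69]V801 [99]ER15928 [01]ER19989"

foreach tok of local raw {
    // group 1 captures the two-digit year, group 2 the variable name after "]"
    if ustrregexm("`tok'", "\[+(\d\d)\](\w+)") {
        local yy  = ustrregexs(1)
        local old = ustrregexs(2)
        // assumption: years 68-99 belong to the 1900s, 00-67 to the 2000s
        local year = cond(real("`yy'") >= 68, 1900, 2000) + real("`yy'")
        display "rename `old' income`year'"
        // rename `old' income`year'   // uncomment once the data are loaded
    }
}
```

Printing the rename commands first makes it easy to check the old-to-new mapping before changing anything in the data.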
Al Adams

Join Date: Apr 2020

Posts: 54
#5

24 Jul 2020, 22:31

These solutions work well. Thank you so much for your help!