
  • Find and Replace all text within brackets?

    Hi, I am working with a panel dataset whose variable names are different in every wave of the survey. To give the variables meaningful names, I must rename each one individually. To do this, I typically copy and paste the list of variables from the dataset's website into the do-file and then use a loop to rename them. However, when I paste them, they appear like this:

    Code:
    [[68]V181 [69]V801 [70]V1490 [71]V2202 [72]V2828 [73]V3300 [74]V3720 [75]V4204 [76]V5096 [77]V5662 [78]V6209 [79]V6802 [80]V7447 [81]V8099 [82]V8723 [83]V9408 [84]V11055 [85]V11938 [86]V13565 [87]V14612 [88]V16086 [89]V17483 [90]V18814 [91]V20114 [92]V21420 [93]V23276 [94]ER3944 [95]ER6814 [96]ER9060 [97]ER11848 [99]ER15928 [01]ER19989 [03]ER23426 [05]ER27393 [07]ER40565 [09]ER46543 [11]ER51904 [13]ER57659 [15]ER64810 [17]ER70882
    Each variable above measures a person's income in a different wave. The number in brackets is the two-digit year of the measure: for example, V181 is a person's income in 1968, V801 is a person's income in 1969, and so on.

    I'd like to delete the brackets and everything in them so that the pasted variable names look like this:

    Code:
    V181  V801  V1490  V2202  V2828  V3300  V3720  V4204  V5096  V5662  V6209  V6802  V7447  V8099  V8723  V9408  V11055  V11938  V13565  V14612  V16086  V17483  V18814  V20114  V21420  V23276  ER3944  ER6814  ER9060  ER11848  ER15928  ER19989  ER23426  ER27393  ER40565  ER46543  ER51904  ER57659  ER64810  ER70882
    Is there an easier way to do this besides deleting the brackets and their contents one by one?


  • #2
    Based on your description, I suspect there may be a better way for you to extract the dataset you're working with.
    However, it seems like your question is about how to remove substrings of the pattern "[\d\d]", rather than what you describe in the title of the thread. You can do this in a number of ways; see help string functions for details. I'll illustrate a simple one below.

    Code:
    // put the long string in a local
    local longstring " [68]V181 [69]V801 [70]V1490 [71]V2202 [72]V2828 [73]V3300 [74]V3720 [75]V4204 [76]V5096 [77]V5662 [78]V6209 [79]V6802 [80]V7447 [81]V8099 [82]V8723 [83]V9408 [84]V11055 [85]V11938 [86]V13565 [87]V14612 [88]V16086 [89]V17483 [90]V18814 [91]V20114 [92]V21420 [93]V23276 [94]ER3944 [95]ER6814 [96]ER9060 [97]ER11848 [99]ER15928 [01]ER19989 [03]ER23426 [05]ER27393 [07]ER40565 [09]ER46543 [11]ER51904 [13]ER57659 [15]ER64810 [17]ER70882 "
    
    // loop over the space-separated pieces of the long string
    foreach shortstring of local longstring {
        display "The raw string is `shortstring'"
        local cleanshortstring = substr("`shortstring'", 5,.) // you can use another string command here to meet your needs
        display "The clean string is `cleanshortstring'" // now you can do whatever you like with the clean string
    }



    • #3
      Code:
      local varnames "[[68]V181 [69]V801 [70]V1490 [71]V2202 [72]V2828 [73]V3300 [74]V3720 [75]V4204 [76]V5096 [77]V5662 [78]V6209 [79]V6802 [80]V7447 [81]V8099 [82]V8723 [83]V9408 [84]V11055 [85]V11938 [86]V13565 [87]V14612 [88]V16086 [89]V17483 [90]V18814 [91]V20114 [92]V21420 [93]V23276 [94]ER3944 [95]ER6814 [96]ER9060 [97]ER11848 [99]ER15928 [01]ER19989 [03]ER23426 [05]ER27393 [07]ER40565 [09]ER46543 [11]ER51904 [13]ER57659 [15]ER64810 [17]ER70882"
      local varnames= ustrregexra("`varnames'", "(\[[\[0-9]+\])", "",.)
      di "`varnames'"
      Result:

      Code:
      . di "`varnames'"
      V181 V801 V1490 V2202 V2828 V3300 V3720 V4204 V5096 V5662 V6209 V6802 V7447 V8099 V8723 V9408 V11055 V11938 V13565 V14612 V16086 V17483 V18814 V20114 V21420 V23276 ER3944 ER6814 ER9060 ER11848 ER15928 ER19989 ER23426 ER27393 ER40565 ER46543 ER51904 ER57659 ER64810 ER70882



      • #4
        In #3:
        Code:
        ustrregexra("`varnames'", "(\[[\[0-9]+\])", "",.)
        The last argument, which requests case-insensitive matching, is not needed (and is probably less efficient). A shorter and faster alternative is:
        Code:
        ustrregexra("`varnames'", "\[+\d\d]", "")
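        The original post also mentions renaming the variables in a loop, and the bracketed numbers carry the survey year. As a minimal sketch (not taken from any post in this thread), the same regular-expression idea can capture both the year and the variable name, then build old and new name lists for a single group rename. The stub "income" and the rule that two-digit years 68 to 99 mean 19xx while 00 to 67 mean 20xx are assumptions for illustration only; adjust both to your data.

```stata
// Hypothetical sketch: turn the pasted list into old/new name pairs.
// Assumptions: the stub "income" is illustrative, and two-digit years
// 68-99 are read as 19xx while 00-67 are read as 20xx.
local varnames "[[68]V181 [69]V801 [70]V1490 [71]V2202 [72]V2828 [73]V3300 [74]V3720 [75]V4204 [76]V5096 [77]V5662 [78]V6209 [79]V6802 [80]V7447 [81]V8099 [82]V8723 [83]V9408 [84]V11055 [85]V11938 [86]V13565 [87]V14612 [88]V16086 [89]V17483 [90]V18814 [91]V20114 [92]V21420 [93]V23276 [94]ER3944 [95]ER6814 [96]ER9060 [97]ER11848 [99]ER15928 [01]ER19989 [03]ER23426 [05]ER27393 [07]ER40565 [09]ER46543 [11]ER51904 [13]ER57659 [15]ER64810 [17]ER70882"
local oldnames ""
local newnames ""
foreach token of local varnames {
    // group 1 = two-digit year inside the brackets, group 2 = variable name
    if ustrregexm("`token'", "^\[+(\d\d)\](\w+)") {
        local yy = real(ustrregexs(1))
        local name = ustrregexs(2)
        // convert the two-digit year to four digits (assumed cut-off at 68)
        local year = cond(`yy' >= 68, 1900 + `yy', 2000 + `yy')
        local oldnames `oldnames' `name'
        local newnames `newnames' income`year'
    }
}
display "`oldnames'"
display "`newnames'"
// With the variables in memory, rename them all in one step:
// rename (`oldnames') (`newnames')
```

        The group form of rename, with parenthesized lists of old and new names, lets you skip a separate loop once the two lists are built.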



        • #5
          These solutions work well. Thank you so much for your help!
