
  • How to get rid of unwanted characters based on a varying pattern?

    If I have the following string variable with these entries:

    median_age_____
    median_age_group1__
    median_age_group23___
    median_age_group23456
    median_age_group234__
    ...

    What is the best way to get the following, given that the number of trailing "_" characters varies from one string to the next?

    median_age
    median_age_group1
    median_age_group23
    median_age_group23456
    median_age_group234
    ...

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str21 var1
    "median_age_____"      
    "median_age_group1__"  
    "median_age_group23___"
    "median_age_group23456"
    "median_age_group234__"
    end
    
    gen wanted = ustrregexra(var1,"([^[a-zA-Z0-9]]+$)", "")
    Result:

    Code:
    . l
    
         +-----------------------------------------------+
         |                  var1                  wanted |
         |-----------------------------------------------|
      1. |       median_age_____              median_age |
      2. |   median_age_group1__       median_age_group1 |
      3. | median_age_group23___      median_age_group23 |
      4. | median_age_group23456   median_age_group23456 |
      5. | median_age_group234__     median_age_group234 |
         +-----------------------------------------------+


    • #3
      Code:
      gen var2 = ustrregexrf(var1, "_+$", "")
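
      The two patterns are not equivalent in general. The one in #2 strips any trailing run of characters that are not letters or digits (so trailing spaces or punctuation would go too), while "_+$" strips only trailing underscores. Because the pattern is anchored with $, it can match at most once per string, so ustrregexrf() and ustrregexra() give the same result here. A minimal sketch for checking that both approaches agree on your own data, assuming var1, wanted (from #2), and var2 (from this post) already exist in memory:

      ```stata
      * Assumes wanted (from #2) and var2 (from #3) were generated from var1.
      * List any observations where the two results differ.
      list var1 wanted var2 if wanted != var2

      * Stop with an error if any observation differs.
      assert wanted == var2
      ```

      If the assert fails, the listed observations show which strings end in something other than underscores, and you can decide which behavior you want.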
