
  • Drop special characters from a string variable

    Hi

    I have a string variable that contains the names of songs. I would like to
    drop all special characters and keep only letters and numbers. I think
    I could use the regex command, but I'm not sure how to specify it.
    I probably need to include:

    "[^a-zA-Z0-9']",""

    at some point, but could anybody help me write the command?

    Thank you very much
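Since Stata 14, the Unicode regular-expression function `ustrregexra()` replaces every match, so a pattern like the one in the question can be used in a single `generate`. The older `regexr()` replaces only the first match, which is one reason a character-by-character loop like the one in reply #2 is useful on Stata 13. The `'` inside the brackets of the quoted pattern would also keep apostrophes, so it is left out here. Also, `[a-zA-Z]` covers only unaccented ASCII letters, so accented letters such as é would be removed too. A minimal sketch, assuming Stata 14 or later, a variable named `songname` (as in reply #2), and made-up example titles; `newname` is a hypothetical name:

```stata
clear
input str30 songname
"Don't Stop Me Now"
"Hey, Jude!"
"99 Red Balloons"
end

// Replace every character that is NOT a letter or digit with ""
// ustrregexra() replaces all matches (Stata 14 or later)
generate newname = ustrregexra(songname, "[^a-zA-Z0-9]", "")
list songname newname
```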

  • #2
    regex refers to functions, not commands. There is an egen function in egenmore (SSC) that does this, but it is better to think from first principles. Look at each character and copy it if it's what you want. This code omits all punctuation and spaces too.

    Code:
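    gen newname = ""
    gen length = length(songname)
    su length, meanonly

    // loop over character positions 1 to the longest name
    forval i = 1/`r(max)' {
         // text of an expression giving the i-th character
         local char substr(songname, `i', 1)
         // true if that character is a lowercase letter, uppercase letter, or digit
         local OK inrange(`char', "a", "z") | inrange(`char', "A", "Z") | inrange(`char', "0", "9")
         // append the character only where it passes the test
         replace newname = newname + `char' if `OK'
    }
    drop length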
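    Don't rewrite the local definitions with equals signs. Without `=`, each local holds the text of an expression, which -replace- then evaluates separately in every observation; with `=`, the expression would be evaluated once, immediately, using the first observation only.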
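The difference the equals sign makes can be seen on a toy dataset. This sketch uses made-up titles; `char_eq`, `first_ok`, and `first_bad` are hypothetical names chosen for illustration. With `=`, a bare variable name inside the expression refers to the first observation, so the local ends up holding a single character rather than an expression.

```stata
clear
input str20 songname
"Hey, Jude!"
"99 Red Balloons"
end

local i 1

// Without "=": the local stores the expression text itself
local char substr(songname, `i', 1)
display `"`char'"'

// With "=": the expression is evaluated once, right away, and a bare
// variable name refers to the first observation, so only "H" is stored
local char_eq = substr(songname, `i', 1)
display `"`char_eq'"'

// The text version is re-evaluated in every observation: "H", "9"
generate first_ok = `char'
// The evaluated version is the same constant in every observation: "H", "H"
generate first_bad = "`char_eq'"
list
```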





    • #3
      Estrella,

      I would also suggest that you have a look at the cleanchars module written by Lars Angquist and available via SSC. Along similar lines, you may explore the functionality offered by the strip module by P.T. Seed and sproper by Austin Nichols. All of these modules address the problem of special or unwanted characters in the data.
      Kind regards,
      Konrad
      Version: Stata/IC 13.1
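Community-contributed modules like these, and the egenmore package mentioned in reply #2, are installed from SSC with `ssc install`, which needs an internet connection. A hedged sketch of the egenmore route: it assumes that egenmore's `sieve()` function with `keep(alphabetic numeric)` retains only letters and digits, as recalled from its help file, so check `help egenmore` after installing. The example titles and the name `newname` are made up.

```stata
// Install (or update) the egenmore package from SSC; needs internet access
ssc install egenmore, replace

// Show the SSC description of the cleanchars module before installing it
ssc describe cleanchars

clear
input str30 songname
"Don't Stop Me Now"
"Hey, Jude!"
end

// Assumption: keep(alphabetic numeric) retains only letters and digits
egen newname = sieve(songname), keep(alphabetic numeric)
list songname newname
```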


      • #4
        Very useful. Thank you!
