Hi all,
I have 40 string variables in a dataset with 4 million cases. The string variables contain many characters that I do not want, so I wish to replace them with "" (that is, delete them). I am currently doing this with a loop, which works but takes about 30 minutes to run.
I was not able to find a better solution on the forum. I found this and this, but they are similar to my code: they iterate through the characters one at a time.
Does anyone have ideas on how to do a wholesale substitution? E.g., if any one of the following characters is present, replace it with "". (I realize that is pseudo-code, but hopefully you understand what I am hoping to find.)
Here is my code using a sample dataset:
Code:
clear all
input str30 start str10 finish
"! he%L(L&^o:: {[ ])" "hello"
" T_-h))er!`E< - - &*" "there"
" $%m#y f+*INe " "my fine"
"Fu,.R??RY_ _ " "furry"
"fr$%#@ei``n--__DS<> ><" "friends"
end

* this was my first version to be clear what each step is doing
foreach var of varlist start {
    g `var'_1=strtrim(lower(`var'))
    g `var'_2=`var'_1
    foreach char in "#" "$" "%" "&" "'" "(" ")" "*" "+" "," "-" "." "/" ":" ";" "=" "?" "@" "[" "\" "_" "`" "{" "}" "~" "]" "! " "!" "~" "<" ">" "^" {
        replace `var'_2=subinstr(`var'_2,"`char'","",.)
    }
    g `var'_3=stritrim(`var'_2)
    g `var'_4=strtrim(`var'_3)
}

* this is my final version except I have 30 variables following the foreach statement.
foreach var of varlist start {
    g `var'_clean=lower(`var')
    foreach char in "#" "$" "%" "&" "'" "(" ")" "*" "+" "," "-" "." "/" ":" ";" "=" "?" "@" "[" "\" "_" "`" "{" "}" "~" "]" "! " "!" "~" "<" ">" "^" {
        replace `var'_clean=subinstr(`var'_clean,"`char'","",.)
    }
    replace `var'_clean=stritrim(`var'_clean)
    replace `var'_clean=strtrim(`var'_clean)
}
order start* finish
Thanks for whatever advice you can offer.
Ben
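One possible way to do the wholesale substitution asked about above, offered as an untested sketch (not timed on 4 million observations): in Stata 14 or later, ustrregexra() replaces every match of a regular expression in a single call, so the roughly 30 subinstr() passes per variable could collapse into one pass over a character class. Assumptions: Stata 14+ (ustrregexra() and its ICU regex syntax); the local name `unwanted` and the cut-down sample data are illustrative only. The characters $, `, ' and \ are written as ICU hex escapes (\x24, \x60, \x27, \x5C) so that Stata's macro expansion never sees them. The "! ?" alternative mimics the "! " entry in the original list by also deleting a space that follows "!". Because the loop deletes characters in sequence, it can create new "! " pairs (e.g. "!# x") that a single pass will not, so results may differ in rare edge cases.

```stata
* Sketch: one regex pass per variable instead of ~30 subinstr() passes.
* Assumes Stata 14 or later (ustrregexra() uses the ICU regex engine).
clear
input str30 start str10 finish
"! he%L(L&^o:: {[ ])" "hello"
"Fu,.R??RY_ _ " "furry"
"placeholder" "my fine"
end

* Build a value containing $ and ` with char() so this do-file line
* is not subject to macro expansion.
replace start = " " + char(36) + "%m#y f+*INe" + char(96) + " " in 3

* Unwanted characters from the original foreach list, as one ICU pattern.
* \x24 = $, \x27 = ', \x5C = \, \x60 = ` (hex escapes avoid macro expansion).
* "! ?" removes "!" plus one following space, like the "! " entry.
local unwanted "! ?|[\#\x24\%\&\x27\(\)\*\+\,\-\.\/\:\;\=\?\@\[\x5C\]\_\x60\{\}\~\<\>\^]"

foreach var of varlist start {
    * delete every match in a single call, then tidy up the spaces
    generate `var'_clean = ustrregexra(lower(`var'), "`unwanted'", "")
    replace `var'_clean = strtrim(stritrim(`var'_clean))
}

list start start_clean finish
```

For the real data, the 40 variable names would go after `varlist` in the foreach. Whether this is actually faster than the subinstr() loop on your data is worth checking with `timer on 1` / `timer off 1` / `timer list 1` around each version.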