Hello everyone,
I am cleaning a string variable that contains city names, using tools such as soundex(), regexm(), and substr(). At this point, I would like to replace city with "East Elmhurst" if the first word equals "E" and the last four letters of the second word are "URST". My attempt is below. It does not work because my conditions look at the entire string instead of a specific word (first, second, etc.). For example, my code changes ELMURST to EAST ELMHURST, which is not correct. Is there a way to apply regexm() to only a specific word in the string, or are there other functions I could use?
Thank you in advance,
Marvin
Code:
replace city="EAST ELMHURST" if regexm(city, "E") & substr((city),-4,.)=="URST"
replace city="EAST ELMHURST" if regexm(city, "EAS") & regexm(city, "EL")
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input str22 city
"EAST ELMHURST NEW YORK"
"EAST ELMHORST"
"EST ELMURST"
"EAST ELMHURST"
"EAST NORWICK"
"EAST MARION"
"EAST MEADOW"
"EAST EMHURST"
"EAST HEMPSTEAD"
"EAST NORTHPORT"
"EAST NEW YORK"
"EAST HAMPTON"
"EASTON"
"EAST NY"
"ELMURST"
"ELMHURST"
"E ELMHURST"
"E. ELMHURST"
"GLENHURST"
end
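One possible way to test individual words, offered only as a sketch: Stata's word() function returns the nth word of a string, so the conditions can be applied to word(city, 1) and word(city, 2) rather than to the whole of city. The variable names byword and byregex are made up for illustration, and treating "E." the same as "E" is an assumption based on the example data. The anchored regexm() pattern is a second option that matches only two-word strings.

```stata
* Flag candidates first, so the matches can be inspected before any change.
* byword and byregex are hypothetical helper variables.

* Option 1: first word is "E" (or "E.", an assumption) and the
* second word ends in "URST"
gen byte byword = inlist(word(city, 1), "E", "E.") & ///
    substr(word(city, 2), -4, .) == "URST"

* Option 2: anchor the regular expression to the start (^) and end ($)
* of the string; this requires exactly two words
gen byte byregex = regexm(city, "^E[.]? [A-Z]+URST$")

* Inspect what would be changed
list city byword byregex if byword | byregex

* Apply the replacement once the flagged rows look right
replace city = "EAST ELMHURST" if byword
```

With the example data, both flags should pick out "E ELMHURST" and "E. ELMHURST" and leave "ELMURST" alone, because its first word is not "E". Note that Option 1 ignores anything after the second word, while Option 2 does not match strings with a third word.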