I am working with zip codes (variable "ZIP"), but I have many incomplete zip code values that I need to replace with 99999. This is how I will track the unknown zip codes in my data set.
Most of the incomplete zip codes are simply fewer characters (e.g., 1, 222, 4564), so I am trying to find a way to use the replace function while specifying that any values with fewer than 5 characters (the full length of zip codes) are replaced with 99999.
My original data was saved as string, but I eventually am converting to numeric, which is a step I know how to do. It doesn't matter to me if this data cleaning step happens before or after the conversion to numeric, but it has to be based on character length.
I was previously using the replace command as follows:
but obviously this is a slower way to do this. I don't just need the "0"'s changed to 99999, I need the "1"'s, "2"s, etc... changed as well. I have read all about the length() command and thought that might help me, but haven't been able to figure this out yet.
Thank you for the help.
Most of the incomplete zip codes are simply fewer characters (e.g., 1, 222, 4564), so I am trying to find a way to use the replace function while specifying that any values with fewer than 5 characters (the full length of zip codes) are replaced with 99999.
My original data was saved as string, but I eventually am converting to numeric, which is a step I know how to do. It doesn't matter to me if this data cleaning step happens before or after the conversion to numeric, but it has to be based on character length.
I was previously using the replace command as follows:
replace ZIP="99999" if ZIP=="" | ZIP=="00000" | ZIP=="XXXXX" | ZIP=="0" | ZIP=="00" | ZIP=="000" | ZIP=="0000"
Thank you for the help.
Comment