  • Dropping observations based on the contents of a string variable

    Hello,

    I am working with data that contain both US ZIP codes and Canadian postal codes, which start with letters (all truncated to their first five characters). I want to modify my data so that only the US ZIP codes remain and the foreign ones are dropped. Here is an example:
    Observation zip_code
    1 "12345"
    2 "58797"
    3 "L6J 9"
    4 "K9R 6"
    I tried to run the code below, but it dropped all observations. Is there an efficient way to do this without having to drop each observation manually? I should mention that all the zip codes are stored as strings so that I don't lose the ones with leading 0s. Thanks!

    Code:
    drop if zip_code[0] != "0" & zip_code[0] != 1 & [...] & zip_code[0] != 9

  • #2
    Standard US ZIP codes range from "00001" to "99950", so you can look for strings that are of length 5 and lie within this range. This check, however, may not catch every foreign code, since some foreign postal codes also consist of five digits.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte observation str7 zip_code
    1 `""12345""'
    2 `""58797""'
    3 `""L6J 9""'
    4 `""K9R 6""'
    end
    
    replace zip= trim(ustrregexra(zip, "[^0-9a-zA-Z]", ""))
    drop if length(zip)!=5 | !inrange(real(zip), 1, 99950)
    Res.:

    Code:
    . l
    
         +---------------------+
         | observ~n   zip_code |
         |---------------------|
      1. |        1      12345 |
      2. |        2      58797 |
         +---------------------+
    
    .
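
    As a side note on the original attempt: in Stata, zip_code[0] is an observation subscript (it asks for the value in observation 0), not the first character of the string, which is probably why that command did not behave as intended. The sketch below shows one possible alternative using a first-character test and a stricter five-digit test. It assumes zip_code holds the bare codes without embedded quotes, and that ustrregexm() is available (Stata 14 or later).

    ```stata
    * Sketch only: assumes zip_code is a string variable without embedded quotes.
    clear
    input byte observation str7 zip_code
    1 "12345"
    2 "58797"
    3 "L6J 9"
    4 "K9R 6"
    end

    * zip_code[0] would refer to observation 0, not to the first character.
    * To test the first character, anchor a regular expression at the start:
    * drop if !ustrregexm(zip_code, "^[0-9]")

    * Stricter check: keep only values made up of exactly five digits.
    keep if ustrregexm(zip_code, "^[0-9]{5}$")
    list
    ```

    On the four example rows, either condition retains observations 1 and 2. Like the range check, this cannot tell a US ZIP code from a foreign postal code that also happens to be five digits.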



  • #3
    Thank you
