Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How can I extract a part of a string variable which contains zipcode information?

    Hello,
    I'm working with Massachusetts zip code data and there are 3 kinds of formats the data is currently in:
    1) Normal 5 digit zip code data, for example, 02144
    2) 4 digit zip code, where the initial 0 is missing, for example, 2144
    3) Zip code data of the format 02144-1234

    My zip code variable is currently a string variable because of the "-" in the "02144-1234". I am hoping, with some help, to make the following transformations:
    1) Add a 0 to the 4 digit zipcodes so that 2144 becomes 02144
    2) Extract the first 5 digits from the "02144-1234" format zip code so that it becomes 02144.

    I'm making these transformations so that I am able to destring the zip codes once they are all in the 5 digit zip code format (Ex: 02144).

    I would appreciate any help I can get. Here's an extract from my data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str10 zip
    "2140"     
    "2148"     
    "02121"     
    "2151"     
    "02121-2936"
    "02119-3302"
    "02140"     
    "02136-3407"
    "02301"     
    "02150-3151"
    "02122-1023"
    "01905-2844"
    "02124-2710"
    "02108-4704"
    "02121-1105"
    "02121-1897"
    "02122-1130"
    "02301-1228"
    "02148-7911"
    "02368-4422"
    "02151-5227"
    "02143"     
    "02125"     
    "01905"     
    "2149"     
    "02148-2704"
    "02368-1701"
    "02127-3733"
    "02150-3865"
    "02151-2230"
    "02125-1826"
    "01760-5711"
    "02145-1414"
    "02119-1261"
    "01905"     
    "01905-1626"
    "01902-3816"
    "02148-5718"
    "02130-2307"
    "02143-4338"
    "02170-3206"
    "02121-1830"
    "02151-2854"
    "02131-2020"
    "02151-4159"
    "02148-7813"
    "02145-1726"
    "02151"     
    "02119-3364"
    "02125-3293"
    "02126-2458"
    "02150-2908"
    "02135"     
    "02148"     
    "02145-1718"
    "02139-3910"
    "02128-1454"
    "02151-2533"
    "02151-3201"
    "02121-2070"
    "02151-5531"
    "02368-1528"
    "02128-2518"
    "01702-6084"
    "02124"     
    "02149-5213"
    "02148-4617"
    "02145-1621"
    "02145-3016"
    "02130-1533"
    "02139-4173"
    "02149-4923"
    "02126-2316"
    "02130-2634"
    "02368-3853"
    "02472"     
    "02148-2847"
    "01902-2843"
    "02136-2114"
    "01902-1829"
    "02151-2618"
    "02150"     
    "02119"     
    "02155-5473"
    "02120-1706"
    "02150-3418"
    "02131-1831"
    "02145-2946"
    "01902"     
    "02150-3518"
    "02148-4112"
    "02128-1337"
    "02125"     
    "02121"     
    "02126-2006"
    "02128-2553"
    "02301-5913"
    "02150"     
    "02150-2908"
    "02124-1304"
    "02151-2948"
    "02150-1429"
    "01902-3984"
    "02151"     
    "02368"     
    "02140-3239"
    "02136-6724"
    "02072-2452"
    "02150-3104"
    "02124-2836"
    "02124-3411"
    "02129-2737"
    "02125"     
    "02128-1056"
    "02136"     
    "02150-3309"
    "02150-3315"
    "02302-2843"
    "02124-1547"
    "02127-2014"
    "02476-6206"
    "02119-1262"
    "02150-1924"
    "02302-3502"
    "01903-2090"
    "02151-4531"
    "02135-4429"
    "02148-1500"
    "02141"     
    "02119-1190"
    "02169-5734"
    "02151-4021"
    "02129-2308"
    "02118-1502"
    "02124-2428"
    "02118-2459"
    "02121-1536"
    "02472-3057"
    "02128-2563"
    "02126-1417"
    "02130-1413"
    "02149-5213"
    "02122"     
    "02131"     
    "02148-2804"
    "02136"     
    "02169-7217"
    "02121-3930"
    "02122-3301"
    "02130"     
    "02151"     
    "02150-3728"
    "02148-7023"
    "02151"     
    "02170-3428"
    "02150-2862"
    "02150-2959"
    "02122-1369"
    "02149-3824"
    "02124-4805"
    "01887"     
    "02119"     
    "02149"     
    "02126"     
    "02145"     
    "02215"     
    "02155"     
    "02126"     
    "02118"     
    "02136"     
    "02119"     
    "01901"     
    "02127"     
    "02135"     
    "01905"     
    "01902"     
    "02118"     
    "02149"     
    "02150"     "     
    "02121"     
    "02119"     
    "2151"     
    "2125"     
    "2130"     
    "2186"     
    "2152"     
    "2151"     
    "2124"     
    "2149"     
    "1905"     
    end
    Thank you in advance!

  • #2
    Try
    Code:
    gen zipcode = zip 
    replace zipcode = "0" + zipcode if length(zipcode) <= 4
    replace zipcode = substr(zipcode,1,5)
    destring zipcode, replace

    Comment


    • #3
      Here's one way to go:
      Code:
      g zipn=substr(zip,1,5)
      replace zipn=cond(length(zipn)==4,"0"+zipn,zipn)
      Added: crossed with #2

      Comment


      • #4
        I hope this helps
        Code:
        replace zip = "0" + zip if length(zip)==4
        replace zip = substr(zip,1,5) if length (zip)> 5


        Last edited by Joseph Mawa; 01 Feb 2019, 13:27.

        Comment


        • #5
          Thank you!

          Comment

          Working...
          X