  • Removing trailing dashes in the string

    Hello!

    I am extracting zip codes from the string variable called adr by using the following command:
    Code:
    generate zip = substr(adr,-5,.)
    However, some of the strings have " -" at the end:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str369 adr
    "Mountain View Grand Resort & Spa 101 Mountain View Rd, Whitefield, NH 03598"    
    "Weathervane Theatre 389 Lancaster Rd, Whitefield, NH 03598"                     
    "Mountain View Golf Course 101 Mountain View Rd, Whitefield, NH 03598"           
    "Mountain View Grand Resort & Spa 101 Mountain View Rd, Whitefield, NH 03598"    
    "Silver Star Spa 6 W 48th St, New York, NY 10036 -"                              
    "Red Robin Gourmet Burgers and Brews 18410 33rd Ave W, Lynnwood, WA 98036 -"     
    "Guy's American Kitchen & Bar 220 W 44th St, New York, NY 10036 -"               
    "Firestone Complete Auto Care 360 W St Georges Ave, Linden, NJ 07036 -"          
    "Oasis Massage Salon 9889 Bellaire Blvd #331, Houston, TX 77036 -"               
    "Northeast Pediatric Dentistry 11223 Davinci Dr, Davidson, NC 28036 -"           
    "Righteous Movers 10333 Harwin Dr, Houston, TX 77036 -"                          
    "Consulado Dominicano en Nueva York 1501 Broadway Floor 4r, New York, NY 10036 -"
    end
    How could I possibly drop these dashes and spaces?

    Thank you,
    Anton

  • #2
    Maybe something like the following.
    Code:
    generate str zip = strtrim(subinstr(substr(adr, -5, .), "-", "", .))
    ETA: Sorry, I didn't notice that not all observations have the extra characters. You could modify the code above with something like this.
    Code:
    generate str zip = substr(strreverse(strtrim(subinstr(strreverse(adr), "-", "", .))), -5, .)
    Or maybe use Stata's regular expression functions.
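    To see why the first version falls short on the rows that end in " -", it may help to trace both expressions on one of the sample addresses. This is only an illustrative sketch; the local macro s is a name introduced here and is not part of the original data.
    Code:
    * Hypothetical local macro holding one of the sample addresses
    local s "Silver Star Spa 6 W 48th St, New York, NY 10036 -"
    * The last five characters include the trailing " -"
    display substr("`s'", -5, .)
    * shows: 036 -
    * Stripping the dash and trimming leaves only three digits
    display strtrim(subinstr(substr("`s'", -5, .), "-", "", .))
    * shows: 036
    * Removing dashes and trimming before taking the last five characters recovers the zip code
    display substr(strreverse(strtrim(subinstr(strreverse("`s'"), "-", "", .))), -5, .)
    * shows: 10036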
    Last edited by Joseph Coveney; 13 Dec 2019, 01:04.

    • #3
      This is most easily handled with Stata's regular expression string functions, if you're comfortable with regular expressions. The following removes a trailing dash, along with any blanks that precede or follow it.
      Code:
      replace adr = ustrregexrf(adr," *- *$","")
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str369 adr
      "Mountain View Grand Resort & Spa 101 Mountain View Rd, Whitefield, NH 03598"  
      "Weathervane Theatre 389 Lancaster Rd, Whitefield, NH 03598"                  
      "Mountain View Golf Course 101 Mountain View Rd, Whitefield, NH 03598"        
      "Mountain View Grand Resort & Spa 101 Mountain View Rd, Whitefield, NH 03598"  
      "Silver Star Spa 6 W 48th St, New York, NY 10036"                              
      "Red Robin Gourmet Burgers and Brews 18410 33rd Ave W, Lynnwood, WA 98036"    
      "Guy's American Kitchen & Bar 220 W 44th St, New York, NY 10036"              
      "Firestone Complete Auto Care 360 W St Georges Ave, Linden, NJ 07036"          
      "Oasis Massage Salon 9889 Bellaire Blvd #331, Houston, TX 77036"              
      "Northeast Pediatric Dentistry 11223 Davinci Dr, Davidson, NC 28036"          
      "Righteous Movers 10333 Harwin Dr, Houston, TX 77036"                          
      "Consulado Dominicano en Nueva York 1501 Broadway Floor 4r, New York, NY 10036"
      end
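      The same regular expression functions could also pull the zip code out directly, without modifying adr first. The line below is a sketch that assumes the zip code is the last run of exactly five digits and is followed only by blanks and dashes; observations that do not match are left with an empty zip.
      Code:
      * Capture five digits that are followed only by blanks or dashes up to the end of the string
      generate str5 zip = ustrregexs(1) if ustrregexm(adr, "([0-9]{5})[ -]*$")
      This works whether or not the trailing " -" has already been removed, because the pattern tolerates zero or more blanks and dashes after the digits.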
      To the best of my knowledge, the Statalist post linked here is the only place where it is documented that Stata's new regular expression parser is the ICU regular expression engine, which is described at http://userguide.icu-project.org/strings/regexp.

      • #4
        Joseph Coveney, William Lisowski, thank you very much for your solutions. Issue solved.

        • #5
          Thanks for the closure.

          I'm not sure how I ended up with that overmuch line of code above.
          Code:
          generate str zip = substr(strtrim(subinstr(adr, "-", "", .)), -5, .)
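          One caveat with this shorter line: subinstr() removes every dash in adr, not only the trailing one. None of the sample addresses has a dash elsewhere, but a hypothetical ZIP+4 ending such as "03598-1234" would come out as "81234". A quick check along the following lines, offered only as a sketch, would flag any zip that is not exactly five digits.
          Code:
          * Count observations whose extracted zip is not exactly five digits
          count if !ustrregexm(zip, "^[0-9]{5}$")
          * List the offending addresses for inspection, if there are any
          list adr zip if !ustrregexm(zip, "^[0-9]{5}$")
          Note that this check would not catch the ZIP+4 case above, since "81234" is itself five digits, so it is worth also looking for dashes inside adr before relying on the result.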
