Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Extract a portion of a string variable

    Hello. I need to generate a variable that extracts from a text string a number that can be of any extension and can be anywhere in the text. For example, the value "CALLE JUAN PEREZ 1130 SANTIAGO" will result in the value "1130", and the value "GONZALEZ 1130231 RANCAGUA" will result in the value "1130231". I have reviewed similar questions on this forum and so far I have not found one to help me. Thanks

  • #2
    Code:
    destring X, ignore(ABCDEFGHIJKLMNOPQRSTUVWXYZ) force gen(Xnum)

    Comment


    • #3
      An ingenious approach that will help me in most cases. Thank you very much

      Comment


      • #4
        A somewhat more automated solution that doesn't require a list of characters to ignore, and points out problems (in either approach) with any string that contains more than one number.
        Code:
        . generate Xnum = ustrregexra(X,"\D","")
        
        . list, clean noobs
        
                                          X      Xnum  
            CALLE JUAN PEREZ 1130 SANTIAGO       1130  
                 GONZALEZ 1130231 RANCAGUA    1130231  
                  Difficult #42 Unit 56-58     425658  
        
        .

        Comment


        • #5
          Thanks William for the improvement. I figured there was a better way. I don't think there is a perfect solution as your example shows. Going to be a little noodling on this one (i.e., "in most cases."), though now much less (maybe). If those numbers are like zip codes (or some such), might be able to restrict in some way.

          Comment


          • #6
            Here's another version that works a little differently.
            Code:
             generate str40 Xnum = ustrregexra(X,"\D+"," ")
            
            . replace Xnum = trim(Xnum)
            (4 real changes made)
            
            . split Xnum, destring
            variables born as string: 
            Xnum1  Xnum2  Xnum3
            Xnum1: all characters numeric; replaced as long
            (1 missing value generated)
            Xnum2: all characters numeric; replaced as byte
            (3 missing values generated)
            Xnum3: all characters numeric; replaced as byte
            (3 missing values generated)
            
            . list, clean noobs
            
                                             X       Xnum     Xnum1   Xnum2   Xnum3  
                CALLE JUAN PEREZ 1130 SANTIAGO       1130      1130       .       .  
                     GONZALEZ 1130231 RANCAGUA    1130231   1130231       .       .  
                      Difficult #42 Unit 56-58   42 56 58        42      56      58  
                                       Nothing                    .       .       .

            Comment


            • #7
              Nice

              Comment


              • #8
                Thanks to both!

                Comment

                Working...
                X