
  • Remove numbers from string

    Hello,

    I'm trying to remove numbers from the end of string values in a variable. Some but not all of the observations have numbers. The data looks like this:

    BANK OF AMERICA 0547
    CITIGROUP
    R W BAIRD 356
    CREDIT SUISSE 03487
    BARCLAYS

    The numbers are always separated from the end of the string title by a space, but they are different lengths.

    I think I need to use the regexm() function, but I am not clear on how it works. Any help would be much appreciated!

    Thank you so much.

    All the best,

    Anna

  • #2
    Hello Anna,

    Welcome to the Stata Forum / Statalist.

    You may wish to take a look at this text.
    Best regards,

    Marcos



    • #3
      Several ways to do this. Regular expressions are clearly one way. Here is another:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str20 var1
      "BANK OF AMERICA 0547"
      "CITIGROUP"           
      "R W BAIRD 356"       
      "CREDIT SUISSE 03487" 
      "BARCLAYS"            
      end
      
      replace var1 = subinstr(var1, word(var1, -1), "", 1) if real(word(var1, -1))  < .
      
      list 
      
           +------------------+
           |             var1 |
           |------------------|
        1. | BANK OF AMERICA  |
        2. |        CITIGROUP |
        3. |       R W BAIRD  |
        4. |   CREDIT SUISSE  |
        5. |         BARCLAYS |
           +------------------+
      You will probably want to trim the spaces too.
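
      Since the original question asked about regular expressions, here is a minimal sketch of that route. It is not from the posts above. It assumes the same example data and uses regexr(), which replaces the first match of the pattern; the pattern " +[0-9]+$" is meant to match one or more spaces followed by digits at the very end of the string, so the separating space is removed along with the number.

      ```stata
      clear
      input str20 var1
      "BANK OF AMERICA 0547"
      "CITIGROUP"
      "R W BAIRD 356"
      "CREDIT SUISSE 03487"
      "BARCLAYS"
      end

      * Delete a trailing run of digits together with the space(s) before it.
      * Strings without a trailing number do not match and are left unchanged.
      replace var1 = regexr(var1, " +[0-9]+$", "")

      list
      ```

      Because the pattern is anchored at the end of the string, digits elsewhere in a name are untouched, and no separate trimming step should be needed.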



      • #4
        Thank you very much for your help!



        • #5
          I have a similar problem with a slight variation: I want to remove every digit from my CITY string variable. I can remove the digits one at a time with code like the following:
          Code:
          replace CITY2=subinstr(CITY2, "1", "", .)
          Is there a way to remove all digits from the strings at once? Or perhaps there is an opposite command for destring?


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str20 var1
          "80-45 WINCHESTER BLV"
          "8-Feb-91"            
          "92 PALADINO AVE"     
          "97 ST"               
          "9F"                  
          "B2"                  
          "B410\BKLYN"          
          "B5RONX"              
          "BK3"                 
          "BKL6YN"              
          end



          • #6
            Here are two ways of doing that:

            Code:
            *One
            forval n = 0/9{
                replace var1 = subinstr(var1, "`n'", "",.)
            }
            
            *Two
            //if you don't have -egenmore- and -ereplace- installed, run:
            ssc install egenmore
            ssc install ereplace
            
            ereplace var1 = sieve(var1), omit(0123456789)
            The resulting variable is a bit odd; you may want to check out the other options of ereplace, sieve() to remove other characters, etc.



            • #7
              Naturally there is an opposite command to destring. It's documented as tostring. But you already have a string variable, so that's a solution to a different problem.

              But what does remove mean here: does it mean select or does it mean delete? I guess the latter.

              To delete any number from a string, setting aside decimal separators (whether . as in 12.3 or , as in 12,3), you can loop over the possible numeric characters and replace each with an empty string:

              Code:
              quietly forval j = 0/9 {
                   replace mystr = subinstr(mystr, "`j'", "", .)
              }
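
              To make the mechanics concrete, here is a small sketch of the same loop run on a few rows borrowed from the example data in #5. The variable name mystr follows the code above; the final strtrim() step is an addition of this sketch and assumes Stata 14 or later (older versions have trim()).

              ```stata
              clear
              input str20 mystr
              "92 PALADINO AVE"
              "B5RONX"
              "BK3"
              end

              * Each pass deletes every occurrence of one digit:
              * j = 0 removes all "0", j = 1 removes all "1", and so on up to "9".
              quietly forval j = 0/9 {
                  replace mystr = subinstr(mystr, "`j'", "", .)
              }

              * Deleting leading digits can leave a leading blank, so trim afterwards.
              replace mystr = strtrim(mystr)

              list
              ```

              The loop does not search first and delete second: each of the ten passes is a single replace that strips one digit character wherever it occurs.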
              PS: Crossed with helpful answer by Chris Larkin. The egen function he mentions is just a wrapper for a similar loop. I can't see that it has any advantages in 2018. If you're curious, the code explains.

              Code:
              . ssc type _gsieve.ado
              *! 1.0.0 NJC 23 Sept 2002 
              program define _gsieve
              Last edited by Nick Cox; 19 Jul 2018, 10:07.



              • #8
                Marvin, also check whether you really want to delete the number in the name of the city. A number of cities are known by a name that includes a number (though it is unlikely you will get any data for their residents):
                Code:
                Арзамас-16 (Арзамас-75, Горький-130, Москва-300, Шатки-11)      [Саров]
                Загорск-6
                Загорск-7
                Златоуст-36     [Трехгорный]
                Красноярск-26
                Пенза-19        [Заречный]
                Сальск-7
                Свердловск-44   [Новоуральск]
                Свердловск-45   [Лесной]
                Томск-7         [Северск]
                Челябинск-40
                Челябинск-65    [Озерск]
                Челябинск-70    [Снежинск]
                ...



                • #9
                  Thank you all for your helpful insights.

                  The loop worked perfectly! So the first line of code just finds any number and the second just removes it! Great!

                  Have a great weekend!



                  • #10
                    Hello again,

                    Is there a more efficient way to remove special characters than the code below? I want to remove "=", "|", and ":". Does forval work here?

                    replace CITY2=subinstr(CITY2, "=", "", .)
                    replace CITY2=subinstr(CITY2, "|", "", .)
                    replace CITY2=subinstr(CITY2, ":", "", .)

                    Thanks in advance,
                    Marvin



                    • #11
                      You need -foreach-:

                      Code:
                      foreach i in = | : {
                          replace CITY2 = subinstr(CITY2, "`i'", "", .)
                      }
                      Stata/MP 14.1 (64-bit x86-64)
                      Revision 19 May 2016
                      Win 8.1
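
                      As an alternative to looping, a single regular-expression replacement may also work. This is a sketch rather than part of the answer above: it assumes Stata 14 or later (for ustrregexra()), and the sample values are invented purely for illustration.

                      ```stata
                      clear
                      input str20 CITY2
                      "NEW=YORK"
                      "BRONX|NY"
                      "BKLYN:"
                      end

                      * One pass: delete every "=", "|", or ":".
                      * Inside the [...] character class all three are literal characters.
                      replace CITY2 = ustrregexra(CITY2, "[=|:]", "")

                      list
                      ```

                      The foreach loop is easier to extend with multi-character strings; the character class is more compact when every target is a single character.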



                      • #12
                        Thank you Carole J. Wilson



                        • #13
                          Hi! ustrregexra can help you to delete all numbers as follows:

                          replace var = ustrregexra(var, "([0-9])", "")



                          • #14
                            Hello,

                            If we want to delete both the numbers and decimal separators simultaneously, how can we do that?
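
                            One possible approach, sketched here under the assumption that the separators are "." and "," and that Stata 14 or later is available, is to widen the character class passed to ustrregexra(). The variable name var1 and the sample values are illustrative only.

                            ```stata
                            clear
                            input str20 var1
                            "PRICE 12.3"
                            "PRICE 12,3"
                            "97 ST"
                            end

                            * Delete digits, periods, and commas in one pass.
                            replace var1 = ustrregexra(var1, "[0-9.,]", "")

                            * Remove any blanks left at either end.
                            replace var1 = strtrim(var1)

                            list
                            ```

                            Note that this deletes every period and comma, including those that are not part of a number. If only separators between digits should go, a narrower pattern such as "[0-9]+([.,][0-9]+)*" might be a better fit.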
