  • Removing non-numeric characters from a variable

    Hi Stata Users,

    I have a string variable that contains currency text, and I would like to keep only the numeric part, as shown below.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str26 original float clean
    "USD 1000"                   1000
    "2500 United States Dollars" 2500
    "INR 50"                       50
    "1876"                       1876
    "470"                         470
    end
    Does anyone have any hints on how I can achieve this?
    Thanks in advance!

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str26 original
    "USD 1000"                  
    "2500 United States Dollars"
    "INR 50"                      
    "1876"                      
    "470"                        
    end
    
    gen clean = real(regexs(1)) if regexm(original, "[ a-zA-Z]*([0-9]*)[ a-zA-Z]*")
    Last edited by Fei Wang; 09 Nov 2021, 09:59.

    • #3
      Thanks so much, Fei Wang, for the brilliant solution. I sincerely appreciate it.

      • #4
        Code:
        gen n1 = real(ustrregexs(0)) if ustrregexm(original, "\d+")
        
        gen n2 = real(ustrregexra(original, "\D", ""))
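
        The two lines above behave differently when a string holds more than one run of digits: n1 keeps only the first run that "\d+" matches, while n2 deletes every non-digit character and joins whatever is left. A small sketch on a made-up value (the string "USD 1000 or 950" is hypothetical and not part of the original data) shows the difference:

        ```stata
        clear
        input str26 original
        "USD 1000 or 950"
        end

        * first run of digits only
        gen n1 = real(ustrregexs(0)) if ustrregexm(original, "\d+")

        * all digits in the string, concatenated
        gen n2 = real(ustrregexra(original, "\D", ""))

        list
        ```

        Here n1 is 1000 and n2 is 1000950, so which line fits depends on what the strings can contain.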

        • #5
          Thanks Bjarte Aagnes for your proposed approach. Much appreciated.

          • #6
            Fei Wang, the example I provided ignored one scenario: valid decimals. Below is an example.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str26 original float(clean n1 n2)
            "USD 1000"                   1000 1000   1000
            "2500 United States Dollars" 2500 2500   2500
            "INR 50"                       50   50     50
            "1876.49"                    1876 1876 187649
            "470.15"                      470  470  47015
            end
            Any ideas on ensuring the decimals are retained?

            Thanks in advance!

            • #7
              Stephen Okiya, this should work.

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str26 original 
              "USD 1000"                   
              "2500 United States Dollars" 
              "INR 50"                      
              "1876.49"                    
              "470.15"                     
              end
              
              gen clean = real(regexs(1)) if regexm(original, "[ a-zA-Z]*([0-9]*\.?[0-9]*)[ a-zA-Z]*")
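
              One detail worth keeping in mind with decimal amounts: by default, gen stores a new variable as float, so a value like 1876.49 is held only approximately. A minimal sketch, assuming double precision is wanted, is the same command with an explicit storage type. The variable name clean_d and the display format are illustrative choices, not part of the suggestion above.

              ```stata
              clear
              input str26 original
              "USD 1000"
              "2500 United States Dollars"
              "INR 50"
              "1876.49"
              "470.15"
              end

              * same pattern as above, stored as double instead of the default float
              gen double clean_d = real(regexs(1)) if regexm(original, "[ a-zA-Z]*([0-9]*\.?[0-9]*)[ a-zA-Z]*")

              * show two decimal places when listing
              format clean_d %12.2f
              list
              ```

              With a double, a check such as assert clean_d == 1876.49 in 4 passes; with a float the same comparison would fail because of the precision difference.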

              • #8
                Not tested; you may adapt:
                Code:
                * round in units of 0.01 so that two decimal places are kept
                gen N1 = round(real(ustrregexs(0)), 0.01) if ustrregexm(original, "\d+\.{0,1}\d{0,2}")
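
                Another option, sketched here under the assumption that the only period in each string is the decimal point, adapts the ustrregexra() line from #4 so that it removes everything except digits and periods. The variable name n3 is made up for illustration.

                ```stata
                clear
                input str26 original
                "USD 1000"
                "2500 United States Dollars"
                "INR 50"
                "1876.49"
                "470.15"
                end

                * delete every character that is not a digit or a period, then convert
                gen double n3 = real(ustrregexra(original, "[^0-9.]", ""))

                list
                ```

                A value such as "Rs. 50" would break that assumption, because the stray period would be kept and the cleaned string would be ".50".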

                • #9
                  Fei Wang, thanks a bunch for the code.
