Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • remove * from string

    Hello
    I have a variable Value, which is the value of some assets. The data is imported from excel and is in string format. For some values, there is a * at the end, I wonder how I can remove this * while preserving 2 decimal places for each value. I tried substr, but the length of the string is different in different cases, so that doesn't work.

    Here are some example data
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str17 Value
    "82,291.39*"       
    "39570.6"          
    "24074.64"         
    "15,845.35*"       
    "15,774.22*"       
    "11,649.53*"       
    "11246.64"         
    "11,028.59*"       
    "11000"            
    "9915.68"          
    "8518.709999999999"
    "7997.74"          
    "7896"             
    "7,734.59*"        
    "7650"             
    "7637.72"          
    "7255.6"           
    "6,681.68*"        
    "6218.47"          
    "6,199.11*"        
    "6064.65"          
    "6060"
    end
    Any idea how to solve this problem? Thanks a lot for any help

  • #2
    try the following (I assume here that dp has been set to comma):
    Code:
    gen leng=length(Value)
    gen str newvar=cond(real(substr(Value,-1,1))==.,substr(Value,1,leng-1),Value)
    drop leng
    if you want to make Value numeric, you can now -destring- it

    Comment


    • #3
      Hi Rich, your code works very well in removing *. However, in making newvar numeric, I tried -destring newvar, replace-, but stata says newvar: contains nonnumeric characters; no replace. Can you spot anything wrong? I think it may be that some values have "," for thousandth while some don't, so this cause some problem?

      Comment


      • #4
        yes, that is the issue so try
        Code:
        replace newvar=subinstr(newvar,",","",.)
        and then -destring- will work; see
        Code:
        help string functions
        note that if making the variable was numeric was your goal, there are one-step solutions, including:
        Code:
        destring Value, gen(newvar) ignore(",*")
        Last edited by Rich Goldstein; 24 Sep 2022, 11:42.

        Comment


        • #5
          If you're using Stata 14 or greater you can also use Unicode Regular Expressions, like so:
          Code:
          replace Value = ustrregexra(Value, "[\,\*]", "", .)
          If you're using version 9 or greater, you can do it in two lines with Regular Expressions:
          Code:
          replace Value = regexr(Value, "[\,]", "")
          replace Value = regexr(Value, "[\*]", "")

          Comment


          • #6
            Hi Rich and Matthew, thank you both for your helpful suggestions!

            Comment


            • #7
              Hi, how can i generate a code that will flag 5 consecutive No 'say its coded 0' in my database and make all the other values 9999 after it detects the 5 consecutive No.

              Comment


              • #8
                Originally posted by Matthew Hall View Post
                If you're using Stata 14 or greater you can also use Unicode Regular Expressions, like so:
                Code:
                replace Value = ustrregexra(Value, "[\,\*]", "", .)
                If you're using version 9 or greater, you can do it in two lines with Regular Expressions:
                Code:
                replace Value = regexr(Value, "[\,]", "")
                replace Value = regexr(Value, "[\*]", "")
                What purpose does that last dot in the code serve, Matthew Hall?

                Comment


                • #9
                  https://www.stata.com/manuals/fnstringfunctions.pdf claims it makes the function case-insensitive.

                  Comment


                  • #10
                    Case-insensitive matching can also be obtained setting a flag:
                    Code:
                     assert ustrregexra("aBbBbA", "(?i:b)", "") == ustrregexra("aBbBbA", "b", "",.)

                    Comment


                    • #11
                      Originally posted by Girish Venkataraman View Post

                      What purpose does that last dot in the code serve, Matthew Hall?
                      In this case it's an unnecessary flag for case-insensitivity. But come to think of it, this is still just one line in non-Unicode Regular Expressions (versions 9 and above):
                      Code:
                      replace Value =  regexr(regexr(Value, "[\,]", ""), "[\*]", "")

                      Comment

                      Working...
                      X