
  • Destring returns "Option / not allowed "

    I'm trying to destring a variable and have used the code below.

    Code:
    destring numberid, replace ignore(A B C D E F G H I J K L M N O P Q R S T U V W X Y Z , / \ - . _ : ; + = `"""' * ( ) $ & ? ! | ' ) force
    Stata returns "Option / not allowed"

    I've tried a different order of the characters and tried with quotes around them, with no success. This code worked fine in Stata 13.1, but it does not work in Stata 15.1, which I'm currently using. Could anyone please help explain where I'm going wrong?

    Thank you

  • #2
    As shown in the help file, you need to enclose what you want ignored in double quotation marks, and you do not need to leave spaces. However, I see that you are including all capital letters and a sizable number of punctuation marks as well. Why? If you need to see which character(s) are causing problems, try:
    Code:
    tab numberid if real(numberid)==.



    • #3
      I've tried with double quotation marks and still no success.

      Code:
       destring numberid, replace ignore("ABCDEFGHIJKLMNOPQRSTUVWXYZ,/\-._:;+=`""'*()$&?!|") force
      Stata returned either too few quotes or unmatched quotes

      What am I doing wrong?



      • #4
        If you have grave accents in the data, you will have trouble including them within -ignore()-. See https://www.statalist.org/forums/for...es-error/page2 for a related thread. An alternative is to use regular expressions.

        Code:
        gen wanted = real(ustrregexra(numberid, "[^\d]+", ""))
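
If you would rather stay with -destring-, one possible workaround is to strip the troublesome characters first, referring to them by character code so that they never have to be typed inside quotes. The following is only a sketch under stated assumptions: the toy observation is made up for illustration, and it assumes the grave accent (ASCII 96) and the double quote (ASCII 34) are the only characters left in the way.

```stata
* Sketch only: one made-up observation standing in for numberid from post #1.
clear
set obs 1
generate str20 numberid = "12" + char(96) + "34" + char(34) + "5"

* Remove grave accents, char(96), and double quotes, char(34),
* without typing either character literally.
replace numberid = subinstr(numberid, char(96), "", .)
replace numberid = subinstr(numberid, char(34), "", .)

* With those characters gone, destring needs no awkward quoting.
destring numberid, replace
```

Any remaining letters or punctuation could then go into -ignore()- as usual, since none of them interferes with Stata's quoting.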

        Comment


        • #5
          For others who read this and don't look closely at the list of characters to ignore in post #1, note that they include "-" and "." which are meaningful if your numeric data could include negative values or decimal fractions. Andrew's solution in post #4 - which implements what was requested in post #1 - can be modified easily to include – along with the digits 0-9 represented by "\d" – the minus sign and decimal point.
          Code:
          . generate t1 = ustrregexra(s, "[^\d]+", "")
          
          . generate t2 = ustrregexra(s, "[^\d-.]+", "")
          
          . list, clean
          
                         s     t1     t2  
            1.   1 or more      1      1  
            2.         1.1     11    1.1  
            3.       1,234   1234   1234  
            4.         -42     42    -42  
          
          .
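
To go from the cleaned string to a numeric variable, as in post #4, the second pattern can be wrapped in real(). This is a minimal sketch that re-creates the four example values listed above; the variable name wanted is borrowed from post #4, and the double storage type is my assumption to avoid losing precision.

```stata
* Sketch only: re-create the example string variable s from the listing above.
clear
input str12 s
"1 or more"
"1.1"
"1,234"
"-42"
end

* Keep digits, minus signs and decimal points, then convert to numeric.
generate double wanted = real(ustrregexra(s, "[^\d-.]+", ""))

* Anything real() could not convert is left missing and is worth inspecting.
list s wanted if missing(wanted), clean
```

A string such as "1-2" would survive the regular expression unchanged and come out as missing, so the final list is a cheap check that nothing unexpected remains.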



          • #6
            Although it's a long time since StataCorp took over destring, examples like this make me wonder about the original variable, as this kind of application was not behind my writing destring in the first place. I'd suggest backing up and showing us a data example.

            The idea that extracting numeric characters with a regular expression is a better idea could be bang on target, or a data example might make a more direct solution evident.
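
One way to produce such a data example is the dataex command. This is a sketch that assumes the dataset from post #1 is in memory; the choice of the first 20 observations is arbitrary. As far as I know, dataex is included with Stata 15.1, and on older versions it can be installed from SSC.

```stata
* Sketch only: assumes the data containing numberid are already loaded.
* On older Stata versions, first run: ssc install dataex
dataex numberid in 1/20

* Optionally, also show the values that are not already purely numeric.
tab numberid if real(numberid) == .
```

The dataex output can be pasted straight into a post, so that others can re-create the variable exactly as stored.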

