  • Count distinct digits of observations

    Dear statalists,

    I would like to count the number of distinct digits in each observation, as follows:
    Number Distinct
    00000000000 1
    000001111444 3
    etc. etc.
    Does anybody know which function I can use?

    Thank you!

    Felix

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 number byte distinct
    "00000000000"  1
    "000001111444" 3
    end
    
    forval i=0/9{
    g c`i' = real(regexs(0)) if(regexm(number, "[`i']"))
    }
    
    egen count= rownonmiss(c*)
    drop c0- c9


    Result:

    Code:
    . l
    
         +---------------------------------+
         |       number   distinct   count |
         |---------------------------------|
      1. |  00000000000          1       1 |
      2. | 000001111444          3       3 |
         +---------------------------------+



    • #3
      Thank you!



      • #4
        Another way to do it:

        Code:
        gen wanted = 0
        
        quietly forval j = 0/9 {
            replace wanted = wanted + (strpos(number, "`j'") > 0)
        }
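
        A minimal sketch of why this loop counts distinct digits, using the second example value from post #1: strpos() returns the position of the first occurrence of a substring, or 0 if it is absent, so the comparison (strpos(...) > 0) evaluates to 1 when the digit is present and 0 otherwise. The literal strings below are only illustrations.

        ```stata
        * position of the first "4" in the string (0 would mean "not found")
        display strpos("000001111444", "4")

        * 1 if the digit occurs at least once, 0 otherwise
        display (strpos("000001111444", "4") > 0)
        display (strpos("000001111444", "7") > 0)
        ```

        Summing that 0/1 indicator over the digits 0 to 9 gives the number of distinct digits, however often each digit repeats.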



        • #5
          Yes! Andrew Musau, would you mind explaining what
          real(regexs(0))
          and
          if(regexm(number, "[`i']"))
          do?



          • #6
            All documented at

            Code:
            help real()
            help regexm
            The real() function converts strings that represent numbers to numeric values and returns missing for anything else, e.g.,

            Code:
            . display real("777")
            777
            
            . display real("123 Close")
                .
            For regexm() and regexs(), the help files say:

            regexm(s,re)
            Description: performs a match of a regular expression and evaluates to 1 if regular expression re is satisfied by the ASCII string s; otherwise, 0

            regexs(n)
            Description: subexpression n from a previous regexm() match, where 0 <= n < 10

            Subexpression 0 is reserved for the entire string that satisfied the regular expression.
            So, I loop over the digits 0 to 9, one at a time (using forval i=0/9), search for each digit in the string variable "number", and ask Stata to create a variable equal to that digit if a match is found. The real() function is needed because I use egen's rownonmiss() at the end to count, and by default it expects numeric variables.
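
            As a minimal sketch of a single pass of that loop, here is the digit 4 on its own, using the example data from post #1. The variable name c4 simply follows the c`i' pattern in the code from #2.

            ```stata
            clear
            input str12 number
            "00000000000"
            "000001111444"
            end

            * regexm() is 1 only where "number" contains a 4;
            * regexs(0) then holds the matched text "4", which real() converts to 4
            gen c4 = real(regexs(0)) if regexm(number, "[4]")

            * c4 is missing in observation 1 and equals 4 in observation 2
            list number c4
            ```

            Note that the digit 0 yields real("0") = 0, which is nonmissing, so rownonmiss() still counts it.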

