  • Validity check for South African identity numbers using the Luhn algorithm

    Two files (one ado; one do) attached, which check a variable containing South African ID numbers for validity using the Luhn algorithm.
    The actual implementation of the algorithm is my own (and very parsimonious!), but the surrounding code is built on the original suggestion provided here (https://www.stata.com/statalist/arch.../msg00300.html) by Nick Cox, as modified in pnrcheck.ado by N. Orsini.

  • #2
    Here are some suggestions on how I would change that program. The main change is that I would let the user choose the name of the valid_id variable. Maybe someone using your command has already done some preprocessing of the IDs, stored the result in valid_id, and then uses your program to check that result. If your program automatically overwrites the variable named valid_id, then that would be bad. I also made sure that the ID variable you specify remains unchanged, the logic being that a program that checks should only check. It is a Stata convention that programs don't change your data unless you specifically tell them to.


    Code:
    **ado file to check validity of South African ID numbers
    **using the Luhn Algorithm. Implementation author's own,
    **but approach derived and modified from the _pnrcheck_ ado
    **originally suggested by Nick Cox and further modified
    **by N. Orsini.
    
    
    **Syntax: checkZAid <idvariable> [if] [in] , valid_id(<varname> [, replace])
    
    **Code adds leading zeroes to numeric ID numbers, and
    **converts numeric ID numbers to string format to test
    **validity. ID numbers with alpha characters are marked
    **as invalid. Extraneous spaces are removed
    
    **Tom Moultrie, University of Cape Town, 2022
    **[email protected]
    
    
    
    capture program drop checkZAid
    program checkZAid
        version 15
     
        syntax varname [if] [in], valid_id(passthru)
        marksample touse, strok novarlist
    
        quietly {
            // prepare the variable that needs to be checked
            //   make it a string variable when necessary
            //   add leading zeros when necessary
            //   make missing when it contains non-numeric characters
            tempvar tocheck
            capture confirm numberic variable `varlist'
            
            if _rc { // string variable
                gen `tocheck' = subinstr(`varlist'," ","",.)
                replace `tocheck' = string(real(`tocheck'),"%013.0f")
            }
            else { // numeric variable
                // string() pads with leading zeros and leaves `varlist' itself untouched
                gen `tocheck' = string(`varlist', "%013.0f")
            }
     
    
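            // parse valid_id: passthru stores the whole option, "valid_id(...)",
            // so strip that wrapper before handing the contents to Parse_valid_id
            local valid_id = substr(`"`valid_id'"', 10, strlen(`"`valid_id'"') - 10)
            Parse_valid_id `valid_id'
            local valid_id = s(valid_id)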
     
            // main routine to implement the Luhn algorithm below
            tempvar cnt id
            gen `cnt' =0
            gen byte `id' = .
            forvalues j=1/13 {
                replace `id' =real(substr(`tocheck',`j',1))
                if mod(`j',2)==0 {
                    replace `id'= mod(`id'*2,10)+int(`id'*2/10)
                }
                replace `cnt'=`cnt'+`id'    
            }
            
            gen `valid_id' = mod(`cnt',10)==0
            la def valid_id 0"Error" 1"Valid"
            la val `valid_id' valid_id
        }
        
        tab `valid_id'
    end
    
    program define Parse_valid_id, sclass
        syntax name(name=valid_id) [, replace]
        
        if "`replace'" != "" {
            capture drop `valid_id'
        }
        confirm new variable `valid_id'
        
        sreturn local valid_id "`valid_id'"
    end
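
    As a hand check of the loop above, the same doubling rule can be applied to a single number held in a local macro. This is only a sketch: the ID 8001015009087 is an illustrative test value that does not come from the attached files, and the macro names are made up for the example.

    Code:
    // Luhn check on one 13-digit ID, digit by digit
    local idnum "8001015009087"
    local cnt 0
    forvalues j = 1/13 {
        local d = real(substr("`idnum'", `j', 1))
        if mod(`j', 2) == 0 {
            // double every second digit and add the digits of the result
            local d = mod(`d'*2, 10) + int(`d'*2/10)
        }
        local cnt = `cnt' + `d'
    }
    // a total that is a multiple of 10 passes the check
    display "sum = `cnt'; valid = " mod(`cnt', 10) == 0

    For this value the odd positions sum to 20 and the doubled even positions also sum to 20, so the total is 40 and the number passes.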
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Thanks SO much, Maarten. Implementing. On a protocol level, is the second 'program' necessary?

      Is there anything 'wrong' with setting
      syntax varname [if] [in],VALidid(string) [REPlace]
      and then at the point of generating the user-specified output variable:
      if "`replace'"!="" {
      capture drop `validid'
      capture label drop `validid'
      }

      - which works.

      Thanks again for your help!
      Last edited by Tom Moultrie; 21 Jul 2022, 05:46.



      • #4
        It is a slightly different syntax. I had in mind that the user would write something like checkZAid id, validid(foo, replace). I thought about it that way as it may make it clearer that the replace refers to the variable foo. In your case the user would write checkZAid id, validid(foo) replace. This would be fine too. Regardless, I would use VALidid(name) instead of VALidid(string) in the syntax command, as that way the syntax command will already check for you whether the argument is a valid variable name.
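
        A small sketch of what name buys you over string. The program demo_validid is a hypothetical throwaway written only for this illustration; it is not part of checkZAid.

        Code:
        capture program drop demo_validid
        program demo_validid
            // name makes syntax itself reject anything that is not a legal Stata name
            syntax , VALidid(name)
            display "would create variable: `validid'"
        end

        demo_validid, validid(foo)

        // 1foo is not a legal name, so syntax stops here with an error
        capture noisily demo_validid, validid(1foo)
        display "return code: " _rc

        With VALidid(string) the second call would get past syntax, and the bad name would only surface later, when generate tries to create the variable.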
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------



        • #5
          Beware of typos when using the capture command. From #2

          Code:
          capture confirm numberic variable `varlist'
                  
          if _rc { // string variable
              ...
          }
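          will treat every variable as a string variable: capture hides the syntax error caused by the misspelled numberic, so _rc is nonzero regardless of the variable's type. Better syntax would be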

          Code:
          capture confirm numeric variable `varlist'
          
          if (_rc == 0) { // numeric variable
              ...
          }
          else if (_rc == 7) { // string variable
              ...
          }
          else { // unexpected error
              display as err "an unexpected error occurred"
              error _rc
          }
          For simple problems, like the example here, it is usually not necessary to capture unexpected errors. Moreover, the problem above would show up in the most basic certification script, which I would recommend writing whenever you write a program, especially a program that you plan to share with others.
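
          A minimal sketch of such a certification script, assuming the syntax from #2 and that checkZAid.ado is on the adopath. The two IDs are illustrative values chosen so that the first passes the Luhn check and the second does not; ok is just a name picked for the example.

          Code:
          clear
          input double idnum
          8001015009087
          8001015009088
          end

          // a numerically stored ID must be handled without error
          checkZAid idnum, valid_id(ok)
          assert ok == 1 in 1
          assert ok == 0 in 2

          // ok now exists, so a second call without replace must fail with r(110)
          rcof "noisily checkZAid idnum, valid_id(ok)" == 110

          With the misspelled numberic, the first call already stops with an error, because the numeric variable is sent down the string branch.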



          • #6
            daniel klein: good catch.

            Additionally, I have been overthinking the syntax. The standard Stata convention, when a command is to create a new variable, is to use an option called generate(). If you use that, then the replace option makes immediate sense, and there is no need to nest it within another option.

            So the syntax line should be syntax varname [if] [in], GENerate(name) [replace]

            Also note that you haven't used the if and in qualifiers. To do so, replace the line gen `generate' = mod(`cnt',10)==0 with gen `generate' = mod(`cnt',10)==0 if `touse'
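
            Putting the suggestions from this thread together, the program might look roughly like the sketch below. It is one possible arrangement under the assumptions discussed here (generate() plus replace, the _rc checks from #5, and if `touse'), not the file that was eventually posted; value labels are left out to keep it short.

            Code:
            capture program drop checkZAid
            program checkZAid
                version 15
                syntax varname [if] [in], GENerate(name) [replace]
                marksample touse, strok novarlist

                // only drop an existing variable when the user asks for it
                if "`replace'" != "" {
                    capture drop `generate'
                }
                confirm new variable `generate'

                quietly {
                    tempvar tocheck cnt digit
                    capture confirm numeric variable `varlist'
                    if (_rc == 0) { // numeric: pad with leading zeros
                        gen `tocheck' = string(`varlist', "%013.0f")
                    }
                    else if (_rc == 7) { // string: strip spaces, non-numeric becomes "."
                        gen `tocheck' = subinstr(`varlist', " ", "", .)
                        replace `tocheck' = string(real(`tocheck'), "%013.0f")
                    }
                    else { // unexpected error
                        error _rc
                    }

                    // Luhn: double every second digit, sum all digits
                    gen `cnt' = 0
                    gen byte `digit' = .
                    forvalues j = 1/13 {
                        replace `digit' = real(substr(`tocheck', `j', 1))
                        if mod(`j', 2) == 0 {
                            replace `digit' = mod(`digit'*2, 10) + int(`digit'*2/10)
                        }
                        replace `cnt' = `cnt' + `digit'
                    }
                    gen byte `generate' = mod(`cnt', 10) == 0 if `touse'
                }

                tabulate `generate'
            end

            A call would then look like checkZAid id if province == 1, generate(ok) replace, where id, province, and ok are placeholder names.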
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------



            • #7
              Thanks all! I think that that's pretty definitively solved! [I will submit some testbed data; and a help file; and a certification script in due course to SSC]. In the meantime, here is the cleaned up ado file with testbed data showing handling of a number of potential cases: erroneous spaces in the id; text in the id; insertion of leading zeroes into the id; and flagging of impossible ID numbers that pass the Luhn algorithm but nonetheless fail the determination of a South African ID number, the first 6 elements of which are YYMMDD of birth.
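
              For readers without the attachment, one way the YYMMDD check could be sketched is with mdy(), which returns missing for impossible dates. This is an assumption about the approach, not the code in the attached file; the two IDs are illustrative values that both pass the Luhn check, and the variable names are made up. Both centuries are tried because the ID carries only a two-digit year.

              Code:
              clear
              input str13 id
              "8001015009087"
              "8013015009082"
              end

              gen byte yy = real(substr(id, 1, 2))
              gen byte mm = real(substr(id, 3, 2))
              gen byte dd = real(substr(id, 5, 2))

              // the date is possible if it exists in either 19YY or 20YY
              gen byte valid_dob = !missing(mdy(mm, dd, 1900 + yy)) | !missing(mdy(mm, dd, 2000 + yy))

              list id valid_dob

              The second ID has month 13, so valid_dob is 0 for it even though its check digit is correct.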
