Validity check for South African identity numbers using the Luhn algorithm

Tom Moultrie

Join Date: Dec 2015

Posts: 33
#1

21 Jul 2022, 00:45

Two files (one ado; one do) are attached, which check a variable containing South African ID numbers for validity using the Luhn algorithm.
The actual implementation of the algorithm is my own (and very parsimonious!), but the surrounding code is built on the original suggestion provided here (https://www.stata.com/statalist/arch.../msg00300.html) by Nick Cox and modified as pnrcheck.ado by N Orsini.
Attached Files

checkZAid.ado (1.8 KB, 1 view)

checkZAid.do (1.4 KB, 1 view)

Maarten Buis

Join Date: Mar 2014
Posts: 3459
#2

21 Jul 2022, 02:38

Here are some suggestions on how I would change that program. The main change is that I would let the user choose the name of the valid_id variable. Maybe someone using your command has already done some preprocessing of the ids, stored the result in valid_id, and then uses your program to check that result. If your program automatically overwrote a variable named valid_id, that would be bad. I also made sure that the id variable you specify remains unchanged, the logic being that a program that checks should only check. It is Stata convention that programs don't change your data unless you specifically tell them to.

Code:

**ado file to check validity of South African ID numbers
**using the Luhn Algorithm. Implementation author's own,
**but approach derived and modified from the _pnrcheck_ ado
**originally suggested by Nick Cox and further modified
**by N. Orsini.


**Syntax: checkZAid <idvariable> [if] [in] , valid_id(<varname> [, replace])

**Code adds leading zeroes to numeric ID numbers, and
**converts numeric ID numbers to string format to test
**validity. ID numbers with alpha characters are marked
**as invalid. Extraneous spaces are removed

**Tom Moultrie, University of Cape Town, 2022



capture program drop checkZAid
program checkZAid
    version 15
 
    syntax varname [if] [in], valid_id(string)
    marksample touse, strok novarlist

    quietly {
        // prepare the variable that needs to be checked
        //   make it a string variable when necessary
        //   add leading zeros when necessary
        //   make missing when it contains non-numeric characters
        tempvar tocheck
        capture confirm numberic variable `varlist'
        
        if _rc { // string variable
            gen `tocheck' = subinstr(`varlist'," ","",.)
            replace `tocheck' = string(real(`tocheck'),"%013.0f")
        }
        else { // numeric variable
            gen `tocheck' = string(`varlist',"%013.0f")
        }
 

        // parse valid_id: a new variable name, optionally followed by ", replace"
        Parse_valid_id `valid_id'
        local valid_id "`s(valid_id)'"
 
        // main routine to implement the Luhn algorithm below
        //   every second digit is doubled, and the digits of the doubled value are summed
        tempvar cnt id
        gen `cnt' =0
        gen byte `id' = .
        forvalues j=1/13 {
            replace `id' =real(substr(`tocheck',`j',1))
            if mod(`j',2)==0 {
                replace `id'= mod(`id'*2,10)+int(`id'*2/10)
            }
            replace `cnt'=`cnt'+`id'    
        }
        
        gen `valid_id' = mod(`cnt',10)==0
        la def valid_id 0"Error" 1"Valid", replace
        la val `valid_id' valid_id
    }
    }
    
    tab `valid_id'
end

program define Parse_valid_id, sclass
    syntax name(name=valid_id) [, replace]
    
    if "`replace'" != "" {
        capture drop `valid_id'
    }
    confirm new variable `valid_id'
    
    sreturn local valid_id "`valid_id'"
end

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------


Tom Moultrie

Join Date: Dec 2015

Posts: 33
#3

21 Jul 2022, 04:55

Thanks SO much, Maarten. Implementing. On a protocol level, is the second 'program' necessary?

Is there anything 'wrong' with setting
syntax varname [if] [in],VALidid(string) [REPlace]
and then at the point of generating the user-specified output variable:
if "`replace'"!="" {
capture drop `validid'
capture label drop `validid'
}
- which works.

Thanks again for your help!

Last edited by Tom Moultrie; 21 Jul 2022, 05:46.
Maarten Buis

Join Date: Mar 2014

Posts: 3459
#4

21 Jul 2022, 06:13

It is a slightly different syntax. I had in mind that the user would write something like checkZAid id, validid(foo, replace). I thought about it that way because it may make it clearer that the replace refers to the variable foo. In your case the user would write checkZAid id, validid(foo) replace. This would be fine too. Regardless, I would use VALidid(name) instead of VALidid(string) in the syntax command, as that way the syntax command will already check for you whether the argument is a valid variable name.

daniel klein

Join Date: Mar 2014

Posts: 3861
#5

21 Jul 2022, 06:30

Beware of typos when using the capture command. From #2

Code:

capture confirm numberic variable `varlist'
if _rc { // string variable
    ...
}

will treat every variable as a string variable. Better syntax would be

Code:

capture confirm numeric variable `varlist'
if (_rc == 0) { // numeric variable
    ...
}
else if (_rc == 7) { // string variable
    ...
}
else { // unexpected error
    display as err "an unexpected error occurred"
    error _rc
}

For simple problems, like the example here, it is usually not necessary to capture unexpected errors. Moreover, the problem above would show up in the most basic certification script, which I would recommend writing whenever you write a program, especially a program that you plan to share with others.
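As a rough illustration of what the first lines of such a certification script might look like, the sketch below checks the return codes discussed above on a tiny made-up dataset. The variable names sid and nid and the ID value are assumptions chosen for the example; they are not taken from the attached files.

```stata
* Minimal sketch of certification-style checks (hypothetical data and names)
clear
input str13 sid double nid
"8001015009087" 8001015009087
end

* a numeric variable passes the check without error
capture confirm numeric variable nid
assert _rc == 0

* a string variable fails with the type-mismatch return code
capture confirm numeric variable sid
assert _rc == 7

* the misspelled keyword fails even for a numeric variable,
* which is why the typo sends every variable down the "string" branch
capture confirm numberic variable nid
assert _rc != 0
```

Because each assert stops the do-file on failure, a script like this would have exposed the misspelling the first time it was run.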
Maarten Buis

Join Date: Mar 2014

Posts: 3459
#6

21 Jul 2022, 09:03

daniel klein: good catch.

Additionally, I have been overthinking the syntax. The standard Stata syntax for a program that creates a new variable is an option called generate(). If you use that, then the replace option makes immediate sense, and there is no need to nest it within another option.

So the syntax line should be syntax varname [if] [in], GENerate(name) [replace]

Also note that you haven't used the if and in qualifiers. To do so replace the line gen `generate' = mod(`cnt',10)==0 with gen `generate' = mod(`cnt',10)==0 if `touse'
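Putting the suggestions in this thread together, a program along the following lines is one way it could look. This is only a sketch under assumptions: the name checkZAid_sketch and the test values are hypothetical, and the cleaned-up checkZAid.ado attached later in the thread may well differ.

```stata
* Sketch only: hypothetical program combining generate(), replace,
* the return-code check, and the if/in qualifiers
capture program drop checkZAid_sketch
program define checkZAid_sketch
    version 15
    syntax varname [if] [in], GENerate(name) [replace]
    marksample touse, strok novarlist

    if "`replace'" != "" {
        capture drop `generate'
    }
    confirm new variable `generate'

    quietly {
        tempvar tocheck cnt id
        capture confirm numeric variable `varlist'
        if (_rc == 0) {          // numeric: pad to 13 digits with leading zeros
            gen `tocheck' = string(`varlist', "%013.0f")
        }
        else if (_rc == 7) {     // string: strip spaces, non-numeric text becomes "."
            gen `tocheck' = subinstr(`varlist', " ", "", .)
            replace `tocheck' = string(real(`tocheck'), "%013.0f")
        }
        else {                   // unexpected error
            error _rc
        }

        // Luhn: double every second digit and sum the digits of the result
        gen `cnt' = 0
        gen byte `id' = .
        forvalues j = 1/13 {
            replace `id' = real(substr(`tocheck', `j', 1))
            if mod(`j', 2) == 0 {
                replace `id' = mod(`id'*2, 10) + int(`id'*2/10)
            }
            replace `cnt' = `cnt' + `id'
        }
        gen byte `generate' = mod(`cnt', 10) == 0 if `touse'
    }
end

* usage on made-up values: valid, invalid, spaces / leading zero, text / missing
clear
input str20 sid double nid
"8001015009087"   8001015009087
"8001015009086"   8001015009086
"80 0101 5009087" 101015009083
"80010150090AB"   .
end
checkZAid_sketch sid, generate(ok_s)
checkZAid_sketch nid, generate(ok_n)
checkZAid_sketch sid in 1/2, generate(ok_first2)
list
```

With the in 1/2 qualifier, ok_first2 is filled only for the first two observations and left missing elsewhere, which is the behaviour the if `touse' qualifier is meant to give.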

Tom Moultrie

Join Date: Dec 2015

Posts: 33
#7

21 Jul 2022, 11:54

Thanks all! I think that that's pretty definitively solved! [I will submit some testbed data; and a help file; and a certification script in due course to SSC]. In the meantime, here is the cleaned up ado file with testbed data showing handling of a number of potential cases: erroneous spaces in the id; text in the id; insertion of leading zeroes into the id; and flagging of impossible ID numbers that pass the Luhn algorithm but nonetheless fail the determination of a South African ID number, the first 6 elements of which are YYMMDD of birth.
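For readers without the attachment, the date-of-birth check could be sketched roughly as follows. This is an assumption about one possible approach, not the code in the attached ado file: it treats the first six digits as YYMMDD and accepts the ID if the date exists in either the 1900s or the 2000s. The variable names and ID values are made up for the example.

```stata
* Sketch only: flag IDs whose first six digits cannot be a YYMMDD date
clear
input str13 sid
"8001015009087"
"9913325009087"
"0002295009087"
"0102295009087"
end

gen int yy = real(substr(sid, 1, 2))
gen int mm = real(substr(sid, 3, 2))
gen int dd = real(substr(sid, 5, 2))

* mdy() returns missing for impossible dates; the century is unknown,
* so accept the date if it exists in either century
gen byte date_ok = !missing(mdy(mm, dd, 1900 + yy)) | !missing(mdy(mm, dd, 2000 + yy))
list sid date_ok
```

The second value has month 13 and day 32 yet still sums to a multiple of 10 under the Luhn check, so it is the kind of case only a date test would catch. The third is accepted because 29 February existed in 2000, while the fourth is rejected because neither 1901 nor 2001 was a leap year.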

Attached Files

testbed.dta (2.5 KB, 1 view)

checkZAid.ado (2.5 KB, 1 view)