Hi, I am validating a dataset that contains demographic information about our clients.
I need to check whether the values of each variable match the pre-defined data dictionary, and to display the incorrect values.
The dataset looks like:

As can be seen, the value 87 in the gender variable is incorrect, yet my check reports every variable as good.
My Stata code is:
Code:
/* data dictionary */
quietly{
    local dobstatus_v "1 2"
    local sexatbirth_v "1 2 95 99"
    local sexualorientation_v "1 2 3 5 96 97 99"
    local gender_v "1 2 99"
}
/// Identify whether each variable is NOT string type, then check whether its values match the data dictionary
quietly{
    noisily display "================================================="
    noisily display " Value Checking "
    noisily display " The following variables may contain data that "
    noisily display " does not match NADA dictionary. Please check!"
    noisily display "================================================="
    foreach var in dobstatus sexatbirth sexualorientation gender{
        local checktype: type `var' /* use macro to store the variable type */
        if substr("`checktype'", 1, 3) == "str" { /* check if the var is string type */
            noisily display "*** `var' is still string type" /* warning if the var is string type */
            continue /* skip to the next variable */
        }
        /* Check variable against its data dictionary */
        local valueList = "`var'_v"
        gen str_`var' = string(`var')
        if strpos(" ``valueList'' ", str_`var') == 0 {
            noisily list clientid `var'
        }
        else {
            noisily display "`var' is good."
        }
        drop str_`var'
    }
}
When I run it, I get the following result:
> quietly{
=================================================
Value Checking
The following variables may contain data that
does not match NADA dictionary. Please check!
=================================================
dobstatus is good.
sexatbirth is good.
sexualorientation is good.
gender is good.
Please help. Thanks!
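A likely explanation, offered as a sketch rather than a definitive answer: the if command (as opposed to the if qualifier) evaluates its expression only once, and a variable name inside it refers to the value in the first observation. So the strpos() test above only ever checks the first client. As long as that client's values are valid, every variable is reported as good. A smaller issue is that strpos() matches substrings, so a value of 9 would pass because "9" occurs inside "99". The version below flags each observation separately with a temporary variable, pads both sides with spaces so only whole codes match, and lists only the offending rows. The toy dataset is hypothetical and only for illustration. The dictionary locals and the clientid variable are taken from the code above.

```stata
* Hypothetical toy data for illustration only (not the real client data)
clear
input clientid dobstatus sexatbirth sexualorientation gender
1 1 1 1 1
2 2 2 3 87
3 1 99 99 99
end

* Data dictionary: allowed codes per variable
local dobstatus_v "1 2"
local sexatbirth_v "1 2 95 99"
local sexualorientation_v "1 2 3 5 96 97 99"
local gender_v "1 2 99"

foreach var in dobstatus sexatbirth sexualorientation gender {
    local checktype : type `var'
    if substr("`checktype'", 1, 3) == "str" {
        display "*** `var' is still string type"
        continue
    }
    * Flag EVERY observation whose value is not in the list.
    * Spaces on both sides force whole-code matches,
    * so 9 is not accepted just because "9" occurs inside "99".
    * Missing values become "." and are therefore flagged too.
    tempvar bad
    generate byte `bad' = strpos(" ``var'_v' ", " " + string(`var') + " ") == 0
    quietly count if `bad'
    if r(N) > 0 {
        display "`var' has `r(N)' value(s) not in the dictionary:"
        list clientid `var' if `bad', noobs
    }
    else {
        display "`var' is good."
    }
    drop `bad'
}
```

With this toy data, the original loop would still print "gender is good." because client 1 has a valid gender, whereas this version should list client 2 with gender 87.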

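If only the distinct invalid codes are needed (for example, reporting 87 once rather than once per client), another option is to compare the levels present in each variable with its dictionary by using levelsof and the macro list difference operator (local C : list A - B). This is a minimal sketch on a hypothetical toy dataset. Note that levelsof skips missing values unless its missing option is specified, so missing codes are not reported here.

```stata
* Hypothetical toy data for illustration only (not the real client data)
clear
input clientid dobstatus sexatbirth sexualorientation gender
1 1 1 1 1
2 2 2 3 87
3 1 99 99 99
4 2 95 5 87
end

* Data dictionary: allowed codes per variable
local dobstatus_v "1 2"
local sexatbirth_v "1 2 95 99"
local sexualorientation_v "1 2 3 5 96 97 99"
local gender_v "1 2 99"

foreach var in dobstatus sexatbirth sexualorientation gender {
    * Distinct non-missing values actually present in the data
    quietly levelsof `var', local(present)
    * Values present in the data but absent from the dictionary
    local invalid : list present - `var'_v
    if "`invalid'" == "" {
        display "`var' is good."
    }
    else {
        display "`var' has codes not in the dictionary: `invalid'"
    }
}
```

This approach reports which codes are wrong. The observation-level check is still needed to find which clients have them.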