Hi, I am validating a dataset that contains some demographic info of our clients.
I need to check whether the values of each variable match the pre-defined data dictionary, and to display the incorrect values.
The dataset looks like:

The Stata code is:
Code:
/* data dictionary */
quietly{
    local dobstatus_v "1 2"
    local sexatbirth_v "1 2 95 99"
    local sexualorientation_v "1 2 3 5 96 97 99"
    local gender_v "1 2 99"
}

/// Identify if variable is NOT string type. check if their values matches data dictionary
quietly{
    noisily display "================================================="
    noisily display " Value Checking "
    noisily display " The following variables may contain data that "
    noisily display " does not match NADA dictionary. Please check!"
    noisily display "================================================="

    foreach var in dobstatus sexatbirth sexualorientation gender{
        local checktype: type `var'    /* use macro to store the variable type */
        if substr("`checktype'", 1, 3) == "str" {    /* check if the var is string type */
            noisily display "*** `var' is still string type"    /* Warning if the var is string type */
            continue    /* skip to next */
        }

        /* Check variable against its data dictionary */
        local valueList = "`var'_v"
        gen str_`var' = string(`var')
        if strpos(" ``valueList'' ", str_`var') == 0 {
            noisily list clientid `var'
        }
        else {
            noisily display "`var' is good."
        }
        drop str_`var'
    }
}

As can be seen, 87 in the gender variable is incorrect (the dictionary allows only 1, 2 and 99). But I am getting the result below:
> quietly{
=================================================
Value Checking
The following variables may contain data that
does not match NADA dictionary. Please check!
=================================================
dobstatus is good.
sexatbirth is good.
sexualorientation is good.
gender is good.
Please help. Thanks!
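A possible explanation, offered as a sketch rather than a definitive fix: Stata's `if` command (unlike the `if` qualifier) evaluates its expression once, and a variable named in it resolves to the value in the first observation. The test in the posted code would therefore only look at the first row of `str_gender`, so an 87 further down is never seen. Separately, `strpos()` matches substrings, so a value such as 9 would be found inside " 99 " unless it is padded with spaces. The version below checks every observation through a temporary flag variable; `bad` and `allowed` are illustrative names, while `clientid` and the `*_v` locals are taken from the post and must be defined in the same do-file run.

```stata
foreach var in dobstatus sexatbirth sexualorientation gender {
    * skip string variables, as in the original check
    capture confirm numeric variable `var'
    if _rc {
        display "*** `var' is still string type"
        continue
    }

    * ``var'_v' expands to the contents of, e.g., local gender_v
    local allowed " ``var'_v' "

    * flag each observation; padding with spaces stops "9" matching inside " 99 "
    tempvar bad
    generate byte `bad' = strpos("`allowed'", " " + string(`var') + " ") == 0

    * count flagged rows, then list them or report success
    quietly count if `bad'
    if r(N) > 0 {
        list clientid `var' if `bad'
    }
    else {
        display "`var' is good."
    }
    drop `bad'
}
```

Note that a missing numeric value is converted to "." by `string()`, so under this sketch missing values are listed as not matching the dictionary unless "." is added to the allowed list or they are excluded with an extra condition.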