How to detect data entry error: locating an observation that does not match any of the observations of a given set?

Titir Bhattacharya

#1

24 Dec 2020, 10:49

Hi Statalist,
Apologies for the vague title; I didn't know how to frame the question better. Hopefully the details below make it clearer. Please consider the following example:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str4 var1 str8 var2 str6 var3
"hhid" "memberid" "headid"
"111"  "1"        "3"
"111"  "2"        "3"
"111"  "3"        "3"
"112"  "1"        "1"
"112"  "2"        "1"
"112"  "3"        "1"
"112"  "4"        "1"
"113"  "1"        "4"
"113"  "2"        "4"
"113"  "3"        "4"
end

As can be seen, for hhid 113 the headid has been wrongly entered as 4, although no member of that household has memberid 4. I suspect something like this has happened in my data, and I want to identify the hhid values for which this anomaly occurs. My understanding is that the code would check, within each hhid, whether headid belongs to the set of memberids, and flag the hhid if it does not.
However, I have been unable to figure out how to write this code and would appreciate any help from the community.

Thanks,
Titir

William Lisowski


24 Dec 2020, 12:04

This example should point you in a useful direction. It counts the number of individuals in each hhid for whom the memberid and headid are the same.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int hhid byte(memberid headid)
111 1 3
111 2 3
111 3 3
112 1 1
112 2 1
112 3 1
112 4 1
113 1 4
113 2 4
113 3 4
end
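sort hhid memberid
* running count, within household, of members whose memberid equals headid
by hhid: generate nhead = sum(memberid==headid)
* copy the household total (the last running value) to every member
by hhid: replace nhead = nhead[_N]
* a household with a valid head has exactly one such member
list if nhead!=1, sepby(hhid)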

Code:

. list if nhead!=1, sepby(hhid)

     +----------------------------------+
     | hhid   memberid   headid   nhead |
     |----------------------------------|
  8. |  113          1        4       0 |
  9. |  113          2        4       0 |
 10. |  113          3        4       0 |
     +----------------------------------+

If this were my data, I would expand this code to look for households where headid is not the same for every member, and for households where the same memberid appears more than once, both of which could lead you to having a household with more than one head.
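Those two additional checks might be sketched roughly as follows. This is only an illustration, assuming numeric hhid, memberid, and headid as in the example above. The flag variables head_varies and dup_member, and the extra households 114 and 115, are hypothetical and were made up for this sketch.

```stata
* Illustrative data: households 114 and 115 are made-up cases for this sketch
clear
input int hhid byte(memberid headid)
111 1 3
111 2 3
111 3 3
113 1 4
113 2 4
113 3 4
114 1 1
114 2 2
115 1 1
115 1 1
115 2 1
end

* Check 1: headid is not the same for every member of the household
* (after sorting by headid within hhid, compare the first and last values)
bysort hhid (headid): generate byte head_varies = headid[1] != headid[_N]

* Check 2: the same memberid appears more than once within a household
* (dup_member is 0 for unique hhid-memberid pairs, positive otherwise)
duplicates tag hhid memberid, generate(dup_member)

* Check 3: number of members per household whose memberid equals headid
bysort hhid: egen nhead = total(memberid == headid)

* List every household that fails at least one check
list if head_varies | dup_member > 0 | nhead != 1, sepby(hhid)
```

Keeping the three flags as separate variables, rather than a single indicator, makes it easier to see which kind of entry error each household has.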


Titir Bhattacharya

#3

26 Dec 2020, 21:30

Thank you so much for your response, William. I'll try what you have suggested.