Hi, this is my first post here and I will try to be precise.
I have a dataset with duplicates in start_date (the day the survey starts, captured by the survey software) and date_yesterday (entered manually by the enumerators), as seen in the screenshot below. I am focusing on the duplicates in date_yesterday. In the image below, hh_num == 21 has date_yesterday == May24 four times. I think some of these entries belong to other households, since a household should ideally have only one entry on a given day. Moreover, cell_hhead, the cell number of the head of the household, is different for each of the May24 entries. The actual cell number of household 21 is 9311807027.
If I look at just the first cell number among these duplicate dates:
tab hh_num if cell_hhead == 7836801626
     hh_num |      Freq.     Percent        Cum.
------------+-----------------------------------
         21 |          1        3.33        3.33
         29 |          1        3.33        6.67
         30 |         28       93.33      100.00
------------+-----------------------------------
      Total |         30      100.00
It appears that this number belongs to household 30. Moreover, household 30 does not have an entry with date_yesterday == May24.
I want to know how I can change hh_num from 21 to 30 for observation 514, since it fulfils two criteria: the phone number of obs 514 matches that of household 30, and household 30 does not have a date_yesterday entry for May24. Since there are more cases like this, I am looking for a general solution that replaces hh_num based on these two criteria.
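To make the rule concrete, here is a rough, untested sketch of the logic I have in mind. The names modal_hh, dup and has_entry are made up for illustration, and it rests on an assumption that may not hold everywhere: that the household a cell number really belongs to is the hh_num it appears with most often.

```stata
* Sketch only: run as one block in a do-file (tempfile is a local macro).

* 1. Record which (household, date) pairs already exist in the data.
preserve
keep hh_num date_yesterday
duplicates drop
rename hh_num modal_hh
tempfile have
save `have'
restore

* 2. ASSUMPTION: the most frequent hh_num for a cell number is its true
*    household (minmode breaks ties by taking the smallest hh_num).
bysort cell_hhead: egen modal_hh = mode(hh_num) if !missing(cell_hhead), minmode

* 3. Tag observations whose (hh_num, date_yesterday) pair is duplicated.
duplicates tag hh_num date_yesterday, generate(dup)

* 4. Check whether the candidate household already has an entry that day.
*    has_entry == 1 means no match in `have', i.e. no entry on that date.
merge m:1 modal_hh date_yesterday using `have', keep(master match) generate(has_entry)

* 5. Reassign only when both criteria hold.
replace hh_num = modal_hh if dup > 0 & hh_num != modal_hh ///
    & !missing(modal_hh) & has_entry == 1
```

If two stray observations point to the same household on the same date, this would create a new duplicate, so I would expect to rerun duplicates report hh_num date_yesterday afterwards and check the remaining cases by hand.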
Thank you!