
  • Replace values using conditions

    Hi, this is my first post here and I will try to be precise.

    I have a dataset with duplicates in start_date (the day the survey starts, captured by the survey software) and date_yesterday (manually entered by the enumerators), as seen in the screenshot below. I am focusing on the duplicates in date_yesterday. In the image below, hh_num == 21 has date_yesterday == May24 four times. I think some of these entries could belong to another household, since one household should ideally have one entry on a particular day. Moreover, cell_hhead, the cell number of the head of the household, differs across all the May24 entries. The actual cell number of household 21 is 9311807027.

    If I just look at the first cell number of these duplicate dates:

    tab hh_num if cell_hhead == 7836801626

         hh_num |      Freq.     Percent        Cum.
    ------------+-----------------------------------
             21 |          1        3.33        3.33
             29 |          1        3.33        6.67
             30 |         28       93.33      100.00
    ------------+-----------------------------------
          Total |         30      100.00

    It appears that this number most likely belongs to household 30. Moreover, household 30 does not have an entry for date_yesterday == May24.

    I want to know how I can change hh_num from 21 to 30 for observation 514, since it fulfils two criteria: the phone number of obs 514 matches that of household 30, and household 30 does not have a date_yesterday entry for May24. Since there are more cases like this, I am looking for a general solution that replaces hh_num based on these two criteria.

    Thank you!

  • #2
    It sounds like this should be an easy fix; however, it is still not entirely clear what you are hoping to do. How are the observations in your data uniquely identified (i.e. what does each row represent)? Could you attach a subset of your data that includes all the relevant identifying variables?

    Am I correct in thinking that every hh_num should correspond to only one cell_hhead? E.g. every row with hh_num == 21 should have the same cell_hhead value?
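
    One quick way to check this is sketched below. It assumes only that hh_num and cell_hhead exist as numeric variables; pair_tag and n_cells are made-up helper names, and the few rows of toy data (built from the numbers quoted in the question) are there only so the snippet runs on its own.

    ```stata
    * Toy data for illustration only; replace with the real dataset
    clear
    input int hh_num double cell_hhead
    21 9311807027
    21 9311807027
    21 7836801626
    30 7836801626
    30 7836801626
    end
    format cell_hhead %12.0f

    * Tag one row per distinct (hh_num, cell_hhead) pair
    egen byte pair_tag = tag(hh_num cell_hhead)

    * Number of distinct cell numbers recorded under each household
    bysort hh_num: egen n_cells = total(pair_tag)

    * Households with more than one cell number are the ones to inspect
    list hh_num cell_hhead if n_cells > 1, sepby(hh_num)
    ```

    Note that 10-digit cell numbers do not fit exactly in a float or a long, so this only behaves as intended if cell_hhead is stored as a double (or a string).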



    • #3
      Originally posted by Nate Tillern:
      It sounds like this should be an easy fix; however, it is still not entirely clear what you are hoping to do. How are the observations in your data uniquely identified (i.e. what does each row represent)? Could you attach a subset of your data that includes all the relevant identifying variables?

      Am I correct in thinking that every hh_num should correspond to only one cell_hhead? E.g. every row with hh_num == 21 should have the same cell_hhead value?

      Each observation records variables such as earnings and health for one household on a particular date.

      Ideally, every household should have one cell_hhead, but there are cases where one household has multiple cell_hhead values. In those cases I just want to check whether there is a repeated date whose cell_hhead matches the cell_hhead of another household that has no entry on that date, as in the example above.

      Attaching screenshot of data:


      • #4
        So hh_num and start_date alone should uniquely identify observations, but currently they do not, because some observations that belong to one hh_num have been recorded under another hh_num, and these can be spotted from the differing phone numbers?

        Is the number of dates the same for each hh_num?

        Assuming what I've said above is true, I can at least tell you how to start fixing this by identifying all the problem observations:

        Code:
        * Count the observations sharing each (hh_num, start_date) pair, which should identify a row uniquely
        bysort hh_num start_date: gen cnt = _N
        * cnt == 1 means the pair is unique; every observation with cnt > 1 is problematic
        sort cnt
        Again, if you attach your data or a subset of it (not a screenshot, but a .dta or .csv file) then I'm happy to actually play around with fixing the issue.
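
        In the meantime, here is a minimal sketch of how the reassignment itself might go once the duplicates are flagged. It is not tested on the real data and rests on several assumptions: it works on date_yesterday (the variable the question focuses on) rather than start_date, it treats the hh_num under which a cell number appears most often as that number's household, and the names n_pair, owner, owner_has_entry, candidate and n_cand are all invented for the example. The toy rows use plain integers as stand-ins for dates.

        ```stata
        * Toy data for illustration only; replace with the real dataset
        clear
        input int hh_num double cell_hhead int date_yesterday
        21 9311807027 1
        21 9311807027 2
        21 7836801626 2
        30 7836801626 1
        30 7836801626 3
        30 7836801626 4
        end

        * 1. For each cell number, find the household it appears under most often
        *    (assumption: the modal hh_num is the true one; ties go to the larger hh_num)
        bysort cell_hhead hh_num: gen long n_pair = _N
        bysort cell_hhead (n_pair hh_num): gen long owner = hh_num[_N] if !missing(cell_hhead)

        * 2. Flag household-date pairs that occur more than once
        bysort hh_num date_yesterday: gen cnt = _N

        * 3. Check whether the presumed owner already has an entry on that date
        preserve
        keep hh_num date_yesterday
        duplicates drop
        rename hh_num owner
        tempfile taken
        save `taken'
        restore
        merge m:1 owner date_yesterday using `taken', keep(master match)
        gen byte owner_has_entry = (_merge == 3)
        drop _merge

        * 4. Reassign only rows meeting both criteria, and only when exactly one
        *    row would move into a given owner-date slot (avoids creating new duplicates)
        gen byte candidate = cnt > 1 & !missing(owner) & owner != hh_num & !owner_has_entry
        bysort owner date_yesterday: egen n_cand = total(candidate)
        replace hh_num = owner if candidate & n_cand == 1

        * Optional check: errors out if hh_num and date_yesterday still do not identify rows
        isid hh_num date_yesterday
        ```

        Because this uses a tempfile and a local macro, it needs to be run from a do-file in one go. On real data the final isid may well still fail, since rows that do not meet both criteria are left untouched; I would list the candidate rows and look them over before running the replace.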
