  • Variables do not uniquely identify the observations in the master data

    Hi,

    I am trying to merge individual and household data. The user manual says that the datasets can be merged using sort variables such as stateid, districtid, and many more. However, whenever I try a 1:1 merge, I get the error 'variables do not uniquely identify the observations in the master data'.

    I first open the household data and then clear it. Then I open the individual data and merge it using one of the specified variables.

    Commands used:
    use "C:\Users\Hammu\Desktop\36151-0001-Data.dta"
    use "C:\Users\Hammu\Desktop\36151-0002-Data.dta", clear
    merge 1:1 IDHH using "C:\Users\Hammu\Desktop\36151-0002-Data.dta"
    Last edited by sladmin; 02 Oct 2020, 05:34. Reason: anonymize original poster

  • #2
    My guess is that a household can contain multiple individuals. Each household is unique and has its own identifier, so the individual file can have several entries with that same identifier, one per household member. At least, that's how a lot of the Census data I've worked with is organized. If that's the case, you'll need a 1:m or m:1 merge, depending on which dataset is the using dataset: m:1 if the individual file is the master, 1:m if the household file is the master. Hope this helps.
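
    A minimal sketch of the m:1 case, assuming (as the description in #1 suggests, but the thread does not confirm) that 36151-0002-Data.dta is the individual file, 36151-0001-Data.dta is the household file, and IDHH uniquely identifies households in the household file. Note that the commands in #1 name the same file as both master and using, so the using file here is changed to the household file.

    ```stata
    * Assumption: individual-level file is the master (many rows per IDHH)
    use "C:\Users\Hammu\Desktop\36151-0002-Data.dta", clear

    * Assumption: household-level file is the using dataset (one row per IDHH)
    merge m:1 IDHH using "C:\Users\Hammu\Desktop\36151-0001-Data.dta"

    * Inspect how many observations matched
    tabulate _merge
    ```

    If the household file is opened first instead, the same merge would be written as 1:m with the individual file as the using dataset.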


    • #3
      It is hard to guess the reason for this error. If you could post an example dataset using "dataex", we could offer better help. As a start, look for duplicates in your master dataset using:
      Code:
       duplicates report IDHH
      Then check how many cases are duplicated. It is possible that only one case is duplicated because of an entry error, or that some cases have a missing identifier. Either way, if the duplicates are few, you know the cause is not the structure of the data (which is possible, as indicated in #2) but errors in the data. You can then try to resolve those cases before merging.
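
      One way to follow up on the report is sketched below. It assumes the master dataset is the file loaded last in #1; the variable name dup is hypothetical and chosen only for illustration.

      ```stata
      * Assumption: this is the master dataset from #1
      use "C:\Users\Hammu\Desktop\36151-0002-Data.dta", clear

      * Summarize how many observations share an IDHH value
      duplicates report IDHH

      * dup (hypothetical name) = number of other observations with the same IDHH
      duplicates tag IDHH, generate(dup)
      tabulate dup

      * Count observations where the identifier is missing
      count if missing(IDHH)

      * Look at duplicated identifiers among the first 50 observations
      list IDHH dup if dup > 0 in 1/50
      ```

      If nearly every observation has dup greater than zero, the repetition is probably structural, as described in #2, rather than a handful of entry errors.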


      • #4
        I think Justin's suggestion worked. When I run the duplicates report, the surplus it shows is zero!
        Thank you for the help, guys!
