  • Household survey Datasets merging problems

    Hello everybody,

    I'm working on the 2009 Ghana Socioeconomic Panel Survey for my final thesis. I'm having a problem merging two datasets.

    I would like to merge them using the "hhno" and "hhmid" codes, but the presence of some duplicates prevents me from doing so.
    Honestly, I would like to avoid dropping the duplicates, as they contain information relevant to my research.

    Whenever I try to merge the datasets, Stata gives me the r(459) error message: variables "hhno" and "hhmid" do not uniquely identify observations in the master data.

    Is there a way to merge the data keeping the data of the duplicates?

    I have attached a picture of the two datasets in question.

    Thanks in advance
    Last edited by Nico Giammarino; 21 Feb 2017, 03:31.

  • #2
    Well, in the data you show us, hhno and hhmid do appear to uniquely identify the observations in data set 1. So if you went

    Code:
    use data_set_1, clear
    merge 1:m hhno hhmid using data_set_2
    you should get results. Assuming this is what you tried (you don't show us your code, so we can only guess at what you actually did), the error message means that somewhere in the data set, not where you have shown us, there are some hhno hhmid combinations that appear more than once. (I have never known Stata to be wrong when it says this.) So you need to find those and then decide what to do with them.

    Code:
    duplicates tag hhno hhmid, gen(flag)
    browse if flag
    will enable you to see these observations. Then you have to figure out what to do. Perhaps the observations are complete duplicates in all respects. In that case, duplicates drop will eliminate them and you can just proceed with your merge. If the observations are not complete duplicates but disagree on some variables, then you have to distinguish certain possibilities:

    1. They are supposed to be there. They represent different valid observations. In this case, something needs to be in the data set that distinguishes them so Stata can know which of these observations goes with which observations in data_set_2. Or it may be that you want each of these observations in data_set_1 to be paired with all of the observations in data_set_2 that match with them on hhno and hhmid. In that case, -merge- is the wrong command: see -help joinby-.

    2. There shouldn't be any such near duplicates, and the conflicts between them need to be resolved in some way. This could involve determining which of the observations is correct and dropping the other(s), or it could mean combining the observations in some way that incorporates some of the information from each.
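
    As a rough sketch of how these cases might look in code: data_set_1 and data_set_2 are the placeholder file names used above, and the -joinby- branch is only appropriate if you really do want every pairing of observations that match on hhno and hhmid. Treat it as an illustration under those assumptions, not as the right answer for your data.

    Code:
    use data_set_1, clear
    * summarize how many hhno hhmid combinations occur more than once
    duplicates report hhno hhmid
    * drop observations that are identical on every variable
    duplicates drop
    * isid returns a nonzero code if hhno hhmid still do not uniquely identify observations
    capture isid hhno hhmid
    if _rc == 0 {
        * the key is now unique in the master data, so the merge can proceed
        merge 1:m hhno hhmid using data_set_2
    }
    else {
        * assumption: every pairing within hhno hhmid is wanted
        joinby hhno hhmid using data_set_2, unmatched(both)
    }

    With unmatched(both), -joinby- also keeps observations that have no match in the other data set and marks them in a _merge variable, much as -merge- does.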

    All of that said, please don't ever post pictures of data sets again. Had it been necessary to use your example data in Stata to try out some code to answer your question, it would have been impossible to do so, short of typing it in by hand. In fact, I suspect the only reason you didn't get any reply to your post before this is that nobody was willing to deal with those images. The helpful way to show example data is with the -dataex- command. You can install that command by running -ssc install dataex-. Then run -help dataex- to read the instructions on how to use it. Always use -dataex- to show example data here. It makes it possible for those who want to help you to create a completely faithful Stata replica of your example with a simple copy/paste operation.
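
    For instance, a minimal example might look like the following. The file name data_set_1 is the placeholder from above, and the range 1/20 is an arbitrary choice; pick whichever variables and observations illustrate the problem.

    Code:
    * install the command once
    ssc install dataex
    use data_set_1, clear
    * produce a copy/paste-ready listing of the key variables for the first 20 observations
    dataex hhno hhmid in 1/20

    The output appears in the Results window and can be pasted directly into a post.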

    Also, when stating that you are getting an error message, it is very difficult to get helpful advice if you don't show the actual code that led to the error message. While there are some common situations that are recognized easily, as in this case, in general this is no more helpful than going to a doctor and saying "I don't feel well." The details are all important.

    More generally, read the entire FAQ for good advice on how to post questions in such a way as to maximize your chances of getting a helpful response.



    • #3
      Thanks for your kind reply.
      It was very helpful.
      You are totally right about the details.
      It was my first post: next time I will follow all the guidelines.
