merging survey data files - duplicate data problem

Patricia Ali

Join Date: Mar 2019

Posts: 30
#1

10 Apr 2019, 08:44

Dear All,

I am working on two survey datasets and have run into a problem with the smaller one. I am trying to merge four data files from the smaller survey and am getting duplicated data. The files contain wheat, corn, barley and demography data. The demography and wheat files that I am trying to merge have member ID variables, while the corn and barley files only have cluster and hh. Here is a step-by-step explanation of what I did:

use demography.dta, clear
gen qid = string(cluster) + string(hh)
gen qid2 = string(cluster) + string(hh) + string(memID)
save demo.dta, replace

use wheat.dta, clear
gen qid = string(cluster) + string(hh)
gen qid2 = string(cluster) + string(hh) + string(memID)
save wheatO.dta, replace

use barley.dta, clear
gen qid = string(cluster) + string(hh)
save barleyO.dta, replace

use corn.dta, clear
gen Eid = string(cluster) + string(hh)
save cornO.dta, replace

When I tabulate the crop variable (tab crop), I get:
wheat = 320
barley= 663
corn= 422

Then I proceed to merge as follows:

use demo.dta, clear // memID
merge m:m qid2 using wheatO.dta
rename _merge MERGE
sort cluster hh memID
drop if MERGE != 3

I then run tab crop and get wheat = 320 (the same as before the merge, which is great).

save whdemo.dta

merge m:m qid using barleyO.dta
sort cluster hh memID
order MERGE, after(_merge)
drop if _merge !=3

When I run tab crop again, I get 320 for wheat (great), but for barley I get 780, which is well above the original 663.

What am I doing wrong?

Many thanks in advance for your help.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

10 Apr 2019, 09:46

Perhaps it is related to using "m:m" for merging, which tends to be an unfortunate idea. You may wish to take a look at - joinby - for that matter.
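
For illustration, a minimal sketch of what that could look like. The file and variable names are taken from post #1; the assumptions that qid2 identifies one member per row in both the demography and wheat files, and that a household may have several barley rows, are not confirmed anywhere in the thread.

```stata
* Sketch only: names come from post #1; uniqueness of qid2 is an assumption.
use demo.dta, clear

* merge 1:1 stops with an error if qid2 is not unique in either file,
* instead of silently pairing rows in sort order the way m:m does.
merge 1:1 qid2 using wheatO.dta, keep(match) nogenerate

* joinby forms every pairwise combination of rows within qid.
* Unmatched rows are dropped unless unmatched() is specified.
joinby qid using barleyO.dta
```

For variables that exist in both files (crop may be one of them), joinby keeps the values from the data in memory unless update or replace is specified, so tabulations after the join can differ from those of the separate files.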

Best regards,

Marcos
Patricia Ali

Join Date: Mar 2019

Posts: 30
#3

10 Apr 2019, 11:34

Dear Marcos,

Thank you for your advice.

I have tried the joinby command to merge file 1 and file 2.

Both joinby using "filename" and joinby using "filename", unmatched(both) result in duplicated observations, just as merge m:m did.

Any other suggestions?
Sarah Edgington

Join Date: Apr 2014

Posts: 284
#4

10 Apr 2019, 12:10

You'll get a lot more help if you show some examples of your data (use dataex) and describe exactly what you want the merged data to look like. It sounds like the issue is probably in how you are conceptualizing the merge, but since you provide neither a clear example of what you have nor a clear description of what you want, it's impossible to help you get there.
It looks like you have multiple observations per identifier in your files, so you need to think through very clearly what a successful merge would look like. If you can describe that, we can help you get there.
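
A minimal sketch of how one might check for repeated identifiers and prepare a data example, assuming the barleyO.dta file and the cluster, hh and qid variables from post #1 (dup is a hypothetical helper variable):

```stata
* Sketch only: file and variable names come from post #1.
use barleyO.dta, clear

* How many observations share the same household identifier?
duplicates report qid

* Flag and list households that appear more than once
duplicates tag qid, generate(dup)
list cluster hh qid if dup > 0, sepby(qid)

* isid reports an error unless cluster and hh jointly identify each row;
* capture noisily shows the message without halting a do-file.
capture noisily isid cluster hh

* Produce a small copy-pasteable extract for the forum
* (if dataex is not available: ssc install dataex)
dataex cluster hh qid in 1/20
```

Checking cluster and hh directly is safer than checking qid alone, because concatenating string(cluster) + string(hh) without a separator can give different households the same key: cluster 1 with hh 23 and cluster 12 with hh 3 both become "123".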
Patricia Ali

Join Date: Mar 2019

Posts: 30
#5

10 Apr 2019, 13:57

@Marcos, I noticed an error in my method, used the joinby command again, and it worked perfectly this time around! THANKS A LOT!
@Sarah, thanks for your suggestion. As you can see, joinby worked!
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#6

10 Apr 2019, 17:44

Thank you for the feedback and for letting us know that your query reached a satisfactory conclusion.

Best regards,

Marcos