  • merging survey data files - duplicate data problem

    Dear All,

    I am working on two survey datasets and have encountered a problem with the smaller one. I am trying to merge four data files from the smaller survey and am having a problem with duplicate data. The files contain data on wheat, corn, barley and demography. The demography and wheat files that I am trying to merge have member ID variables, while the corn and barley files only have cluster and hh. Here is a step-by-step explanation of what I did:

    use demography.dta, clear
    gen qid = string(cluster) + string(hh)
    gen qid2 = string(cluster) + string(hh) + string(memID)
    save demo.dta, replace

    use wheat.dta, clear
    gen qid = string(cluster) + string(hh)
    gen qid2 = string(cluster) + string(hh) + string(memID)
    save wheatO.dta, replace

    use barley.dta, clear
    gen qid = string(cluster) + string(hh)
    save barleyO.dta, replace

    use corn.dta, clear
    gen qid = string(cluster) + string(hh)
    save cornO.dta, replace

    When I tabulate the crop variable, I get:
    wheat = 320
    barley= 663
    corn= 422

    Then I proceed to merge as follows:

    use demo.dta, clear // memID
    merge m:m qid2 using wheatO.dta
    rename _merge MERGE
    sort cluster hh memID
    drop if MERGE != 3

    When I tab crop, I get wheat = 320 (the same as before the merge, which is great).

    save whdemo.dta

    merge m:m qid using barleyO.dta
    sort cluster hh memID
    order MERGE, after(_merge)
    drop if _merge != 3

    tab crop
    I get 320 for wheat (great), but for barley I get 780 (well above the original 663).

    What am I doing wrong?

    Many thanks in advance for your help.



  • #2
    Perhaps it is related to using "m:m" for merging, which tends to be an unfortunate idea. You may wish to take a look at -joinby- instead.
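
    To illustrate the difference, here is a minimal sketch on made-up data (the variable plot and all values are hypothetical, not taken from the survey files). It assumes one household with 2 members and 3 plots, and compares the row counts produced by merge m:m and joinby on the same key.

```stata
* Toy data (hypothetical values): household (1,1) has 2 members
clear
input cluster hh memID
1 1 1
1 1 2
end
tempfile members
save `members'

* The same household has 3 plot-level rows
clear
input cluster hh plot
1 1 1
1 1 2
1 1 3
end
tempfile plots
save `plots'

* merge m:m pairs rows by their order within each key group:
* member 1 with plot 1, member 2 with plot 2, and the last member
* is reused for plot 3. The result has 3 rows.
use `members', clear
merge m:m cluster hh using `plots'
count
local n_mm = r(N)

* joinby forms every member-plot pair within the key group: 2 x 3 = 6 rows
use `members', clear
joinby cluster hh using `plots'
count
local n_joinby = r(N)

display "merge m:m rows: `n_mm'   joinby rows: `n_joinby'"
```

    Neither result is a one-to-one match, so which one is wanted (if either) depends on what a row in the merged file is supposed to represent.
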
    Best regards,

    Marcos



    • #3
      Dear Marcos,

      Thank you for your advice.

      I have tried the joinby command to merge file 1 and file 2.

      Both joinby using "filename" and joinby using "filename", unmatched(both) result in duplicated observations, just as merge m:m did.

      Any other suggestions?



      • #4
        You'll get a lot more help if you can show some examples of your data (use dataex) and describe exactly what you want the merge to look like. It sounds like the issue is probably with the way you are conceptualizing the problem, but since you provide neither a clear example of what you have nor a clear description of what you want, it's impossible to help you get there.
        It looks like you have multiple observations within your identifiers in your files. So you need to think through very clearly what a successful merge would look like. If you can describe that, we can help you get there.
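
        One way to check whether an identifier has multiple observations is sketched below on made-up data (the variable area and all values are hypothetical). It uses isid and duplicates to test whether cluster and hh uniquely identify rows in a file before any merge is attempted.

```stata
* Hypothetical household-level file with a repeated (cluster, hh) pair
clear
input cluster hh area
1 1 2.5
1 1 1.0
1 2 4.0
end

* isid exits with an error if the variables do not uniquely identify rows;
* capture stores the return code so the do-file keeps running
capture isid cluster hh
local rc_isid = _rc
display "isid return code: `rc_isid'"   // nonzero means the key is not unique

* Summarise how many rows share each key, then tag the repeated ones
duplicates report cluster hh
duplicates tag cluster hh, generate(dup)
list cluster hh area if dup > 0
count if dup > 0
local n_dup = r(N)
```

        If the key turns out to be unique in one of the two files, merge 1:m or m:1 states that expectation explicitly and stops with an error when it is violated. Merging on cluster and hh directly also avoids building string IDs, where string(1) + string(11) and string(11) + string(1) both give "111".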



        • #5
          @Marcos, I noticed an error in my method, used the joinby command again, and it worked perfectly this time around! THANKS A LOT!
          @Sarah - thanks for your suggestion. As you can see, joinby worked!



          • #6
            Thank you for the feedback and for letting us know that your query reached a satisfactory conclusion.
            Best regards,

            Marcos

