  • Merging monadic and dyadic dataset

    Hi all,

    Apologies if this seems trivial, but I can assure you that I've spent two days solidly trying to sort this out independently, with no luck.

    I've got a dataset on foreign aid flows, in which each observation contains a recipient/donor pair, a year, and other information such as the type of aid. Each dyad appears only once for each year within my temporal range, but each recipient and each donor individually appears in many observations for each year. Below is a visual example.

    Year  recipient (str)  recipientcode  donor (str)  donorcode  dyadid  value  ...
    2004  Ghana            234            USA          12         1661    0.325  ...
    2004  Benin            234            Belgium      18         1300    0.515  ...
    2005  Ghana            234            USA          12         1661    0.250  ...
    2005  Benin            234            Belgium      18         1300    0.850  ...
    2006  Ghana            234            USA          12         1661    0.015  ...
    2006  Benin            234            Belgium      18         1300    0.210  ...


    So each observation is uniquely defined by the year, donor and recipient together; individually (or as a pair) these variables are duplicated many times. I need to merge this dyadic data with a dataset containing monadic information about the recipient countries. These variables include a dummy for whether the recipient sat on the UN Security Council in a given year, quality of governance, number of natural disasters etc. This dataset contains one observation per recipient country per year, and does not contain the donor countries at all. For example:

    Year  Country  scmember  coruptioncontrol  politicalviolence  ...
    2004  Ghana    0         1.3               1.8                ...
    2004  Benin    1         0.3               1.4                ...
    2005  Ghana    0         1.4               1.7                ...
    2005  Benin    1         0.25              1.1                ...

    Every time I try to merge, I'm told that the variables I've specified don't uniquely identify observations in the master data. What I really want Stata to do is import the monadic information about the recipient for each year, and insert the relevant information into every observation containing the correct recipient/year combination. Is this possible, and what steps do I need to take in order to obtain this result?

    Any help will be much appreciated, as I've been stuck on this for a really long time. Thanks

  • #2
    Code:
    rename recipient country
    merge year country using "..second file..."

    "Every time I try to merge"
    where is your merge syntax?

    Sergiy Radyakin



    • #3
      Originally posted by Sergiy Radyakin
      where is your merge syntax?

      I hadn't realised merge by itself was an option. I had been trying to use merge 1:m and merge m:1, depending on which way I was attempting to merge the datasets. All the literature I'd consulted online detailed 1:m/m:1 merges, as does the Stata 13 manual. I hadn't considered looking up old syntax. Thanks very much, I can't believe how simple that was compared to how long I've been agonising over not being able to make it work. I'm pretty comfortable using Stata for statistical analysis, but I've never used it to build my own dataset before, hence my unfamiliarity with the command.

      thanks again, enormous help.



      • #4
        I believe that, when you were trying the current syntax, you wanted a merge m:1 with the dyadic data in memory and "using" the monadic data from disk (multiple dyadic observations match each monadic observation), of course first doing the rename that Sergiy identified as necessary. I'm not sure why this didn't work for you (you said you'd tried it), since we don't see the code from your attempts.
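
        A minimal sketch of that m:1 merge, assuming toy data in place of the real files. The tempfile name monadic, the cut-down variable list, and the Ghana/Belgium row are made up for illustration; only the shape of the merge matters here.

```stata
* Sketch only: toy stand-ins for the monadic and dyadic files
clear
input int year str10 country byte scmember
2004 "Ghana" 0
2004 "Benin" 1
2005 "Ghana" 0
2005 "Benin" 1
end
tempfile monadic
save `monadic'

* Dyadic data in memory: several rows can share one recipient-year
clear
input int year str10 recipient str10 donor float value
2004 "Ghana" "USA"     0.325
2004 "Ghana" "Belgium" 0.100
2004 "Benin" "Belgium" 0.515
2005 "Ghana" "USA"     0.250
2005 "Benin" "Belgium" 0.850
end

* Key variables must have the same names in both files
rename recipient country

* Many dyadic rows (master) to one monadic row (using)
merge m:1 year country using `monadic'
tabulate _merge
```

        With the dyadic data as master, every row sharing a recipient-year should pick up the same monadic values, and _merge shows which rows found a match.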



        • #5
          Well the old syntax sorted out my problem, but in the interest of understanding what was going wrong with the new syntax, as well as helping others in the future who may encounter similar problems:

          Code:
          use dyadic.dta
          sort year recipient
          merge m:1 year recipient using governance.dta

          (note: variable year was int, now float to accommodate using data's values)
          variables year recipient do not uniquely identify observations in the using data
          r(459);

          Nothing merged, and there's no _merge variable in the master. Both datasets are correctly sorted. Variables year and recipient have the same name in both datasets. The code is red in the Command window. As I say, the old syntax worked well.



          • #6
            The Stata error message

                "variables year recipient do not uniquely identify observations in the using data"

            contradicts your assertion that the monadic data

                "contains one observation per recipient country per year"

            See help duplicates for tools to assist in dealing with duplicated observations. I would be concerned about the results of your merge using the old syntax.
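
            One way to check the key before merging might look like the sketch below. It assumes toy data standing in for the monadic file, with two all-missing rows added on purpose; the variable name dup is made up.

```stata
* Sketch only: toy stand-in for the using (monadic) file
clear
input int year str10 recipient byte scmember
2004 "Ghana" 0
2004 "Benin" 1
2005 "Ghana" 0
.    ""      .
.    ""      .
end

* isid exits with an error if year and recipient do not identify rows;
* capture lets the do-file carry on so the problem can be inspected
capture noisily isid year recipient
display "isid return code: " _rc

* Summarise, then flag, the observations that share a key
duplicates report year recipient
duplicates tag year recipient, generate(dup)
list if dup > 0
```

            Here dup counts how many other observations share the same year and recipient, so any row with dup > 0 is one that would block an m:1 merge on that key.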



            • #7
              Thanks very much. There were five observations tacked onto the end of the dataset with missing values recorded on every variable, which were reported as duplicates. When I dropped those observations and ran m:1 merge it was successful.
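
              A rough sketch of that clean-up, assuming toy data with two empty rows rather than the five in the real file. The counter nmiss and the explicit variable list are made up for illustration.

```stata
* Sketch only: toy stand-in for the monadic file with empty trailing rows
clear
input int year str10 recipient byte scmember
2004 "Ghana" 0
2004 "Benin" 1
2005 "Ghana" 0
.    ""      .
.    ""      .
end

* Count, per observation, how many of the listed variables are missing
generate nmiss = 0
foreach v of varlist year recipient scmember {
    quietly replace nmiss = nmiss + missing(`v')
}

* Drop only the rows that are missing on every listed variable
drop if nmiss == 3
drop nmiss

* Should now pass silently: the key identifies each observation
isid year recipient
```

              Once isid passes, the cleaned file can be saved and used as the using dataset in the m:1 merge.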

