Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging Multiple Obs. Per Year Dataset with Coutnty-Year Dataset

    Hello Statalisters,


    I apologize if it is a question already been asked (and solved, hopefully). I have two datasets that I want to merge, but I have a problem sprouting mainly from the way the data is structured.

    In one data set I have, what most datasets look like:

    Country year score
    X 1998 5
    X 1999 4
    Y 1998 5
    Y 1999 3

    and so forth. The other dataset, however, has multiple observations per year.

    Country Year aid agency
    X 1999 2000 UN
    X 1999 3000 World Bank
    X 1999 3500 IMF


    The question now is how do I merge the two datasets? Since I cannot xtset the latter dataset, the conventional way of merging datasets failed me.

    Thanks for your time,
    BA

  • #2
    Code:
    use data_set1, clear
    merge 1:m Country Year using data_set2
    This will give you data that have multiple observations for the same score, so be careful depending on what you are doing with that.

    If you wanted to reshape the data post-merge that would be possible as well, and then you would be able to xtset again.

    For example say you had a variable in data_set2 for aid amount and that is what you are interested in.
    In that case you could do:
    Code:
    reshape wide aid_amount, i(Country Year) j(aid_agency)
    This would give you Country-Year observations and variables like aid_amount_IMF, aid_amount_World_Bank, aid_amount_UN that would show the amount of aid for that agency in a given Country-Year.
    Last edited by Eric Haavind-Berman; 12 Dec 2016, 14:05.

    Comment


    • #3
      Please give data examples using dataex (SSC) as made explicit in the FAQ Advice, which you were asked to read. Your examples here are helpful, but need some engineering work to turn them into usable code.

      Even if you are very unsure about syntax, you should be able to tell us what you think the resulting file should look like.

      This works but is dependent on consistent variable names (here country year), which is not true of your example. You will need to tell us if you want something different.

      Code:
      clear
      input str1 country year score
      X 1998 5
      X 1999 4
      Y 1998 5
      Y 1999 3
      end
      save whateveryouwant , replace
      
      clear
      input str1 country year aid str10 agency
      X 1999 2000 "UN"
      X 1999 3000 "World Bank"
      X 1999 3500 "IMF"
      end
      
      merge m:1 country year using whateveryouwant
      
      sort country year
      list, sepby(country)
      
      
           +-------------------------------------------------------------+
           | country   year    aid       agency   score           _merge |
           |-------------------------------------------------------------|
        1. |       X   1998      .                    5   using only (2) |
        2. |       X   1999   3500          IMF       4      matched (3) |
        3. |       X   1999   3000   World Bank       4      matched (3) |
        4. |       X   1999   2000           UN       4      matched (3) |
           |-------------------------------------------------------------|
        5. |       Y   1998      .                    5   using only (2) |
        6. |       Y   1999      .                    3   using only (2) |
           +-------------------------------------------------------------+

      Comment

      Working...
      X