Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merge two data sets - error: no unique identifier

    Hello everyone,

    I have the following problem and would kindly appreciate your help:

    I have 2 datasets I'd like to merge.

    The first one contains the stock of migrants coming from one country and staying in another in a particular year, and some individual characteristics on this stock (gender and level of education, which can be low, medium or high). It thus looks like that:

    destination, origin, code of country of destination, code of country of origin, gender, year (I have data from 1980 to 2010, in 5-year intervals), low (number of migrants with low level of education), medium, high, total (adds up migrants).

    The second one contains data on bilateral trade agreements and their provisions. And it looks like: Importer code, exporter code, year, agreement, year of entry in force, type (FTA, PTA, CU,...) and provisions wich are coded with dummies depending on whether the agreement cotanins such a provision or not. I then recode importer and exporter into country of destination and origin.

    I'd like to merge the two datasets based on country of origin, country of destination and year but I find that in both cases there's no unique identifier. I understand why that is (in the first case, using these 3 variables, there's one case for males and one for females). In the second database, I again face a similar situation, using ExporterCode, ImporterCode and year I don't have unique observations because there are different agreements that concern the same pair of countries in the same year.

    Could anyone suggest me a way to overcome this problem?

    Many thanks

    Julia
    Last edited by Julia Brunet; 23 Oct 2019, 01:55.

  • #2
    I forgot: what I have tried so far is:

    //Merge
    use iabbd_8010_v1.dta, clear
    sort ccode_origin ccode_destination year
    save iabbd_8010_v1.dta, replace

    use Horizontal_Depth_DB.dta, clear
    rename ExporterCode ccode_origin
    rename ImporterCode ccode_destination
    sort ccode_origin ccode_destination year

    merge 1:m ccode_origin ccode_destination year using iabbd_8010_v1.dta

    Comment


    • #3
      It is difficult to determine how you want the merged data set to look like. You can start by copying and pasting the output from the following. Your explanations may make sense once we see your variables.

      Code:
      use iabbd_8010_v1.dta, clear
      sort ccode_origin ccode_destination year
      dataex in 1/10
      use Horizontal_Depth_DB.dta, clear
      rename ExporterCode ccode_origin
      rename ImporterCode ccode_destination
      sort ccode_origin ccode_destination year
      dataex in 1/10
      If you are running an older version of Stata which does not have the dataex command, first

      Code:
      ssc install dataex

      Comment


      • #4
        Hello Andrew,

        many thanks.

        For the first dataset:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str14 destination str32 origin str3 ccode_destination str7    ccode_origin    str6    gender    int    year    long(tot    low    med    high)
        "Australia" "Unknown" "AUS" "" "Male"   1980   7745   2049  1713    3983
        "Australia" "Unknown" "AUS" "" "Female" 1980   7087   2471  1454    3162
        "Australia" "Unknown" "AUS" "" "Male"   1985   9291   2346  2035    4910
        "Australia" "Unknown" "AUS" "" "Female" 1985   8422   2775  1626    4021
        "Australia" "Unknown" "AUS" "" "Female" 1990 106963  94633  6340    5990
        "Australia" "Unknown" "AUS" "" "Male"   1990 105539  92874  5924    6741
        "Australia" "Unknown" "AUS" "" "Male"   1995   1528    425   297    806
        "Australia" "Unknown" "AUS" "" "Female" 1995    853    362   207    284
        "Australia" "Unknown" "AUS" "" "Male"   2000 297223 226489 35770    34964
        "Australia" "Unknown" "AUS" "" "Female" 2000 304111 233670 35596    34845
        end

        For the second one:
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str3(ccode_destination ccode_origin) int year str91 agreement int entry_force str9 type byte(wto_x_labourmarketregulation wto_x_visaandasylum_le) double(yaerinto dup)
        "BDI" "AGO" 1994 "COMESA" 1994 "CU" 1 0 -1993 0
        "BDI" "AGO" 1995 "COMESA" 1994 "CU" 1 0 -1993 0
        "BDI" "AGO" 1996 "COMESA" 1994 "CU" 1 0 -1993 0
        "BDI" "AGO" 1997 "COMESA" 1994 "CU" 1 0 -1993 0
        "BDI" "AGO" 1998 "COMESA" 1994 "CU" 1 0 -1993 0
        "BDI" "AGO" 1999 "COMESA" 1994 "CU" 1 0 -1993 0
        "BDI" "AGO" 2000 "COMESA" 1994 "CU" 1 0 -1993 0
        "BDI" "AGO" 2001 "COMESA" 1994 "CU" 1 0 -1993 0
        "BDI" "AGO" 2002 "COMESA" 1994 "CU" 1 0 -1993 0
        "BDI" "AGO" 2003 "COMESA" 1994 "CU" 1 0 -1993 0
        end

        Even though it can't be seen in the second dataset, for a lot of the country-pair combinations, there are often more than one trade agreement for that year. Eg. BDI and AGO in 1994 sign or are concerned by trade agreement 1 but also by trade agreement 2.

        Thanks

        Comment


        • #5
          Sorry, I pasted it in the wrong format. Again, dataset 2:


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str3(ccode_destination ccode_origin) int year str91 agreement    str9    type    byte(wto_x_labourmarketregulation    wto_x_visaandasylum_le)    double    dup
          "BDI" "AGO" 1994 "COMESA" "CU" 1 0 0
          "BDI" "AGO" 1995 "COMESA" "CU" 1 0 0
          "BDI" "AGO" 1996 "COMESA" "CU" 1 0 0
          "BDI" "AGO" 1997 "COMESA" "CU" 1 0 0
          "BDI" "AGO" 1998 "COMESA" "CU" 1 0 0
          "BDI" "AGO" 1999 "COMESA" "CU" 1 0 0
          "BDI" "AGO" 2000 "COMESA" "CU" 1 0 0
          "BDI" "AGO" 2001 "COMESA" "CU" 1 0 0
          "BDI" "AGO" 2002 "COMESA" "CU" 1 0 0
          "BDI" "AGO" 2003 "COMESA" "CU" 1 0 0
          end

          Comment


          • #6
            Thanks for the data example. I think you want

            Code:
            use Horizontal_Depth_DB.dta, clear
            rename ExporterCode ccode_origin
            rename ImporterCode ccode_destination
            sort ccode_origin ccode_destination year
            save Horizontal_Depth_DB.dta, replace
            
            use iabbd_8010_v1.dta, clear
            sort ccode_origin ccode_destination year
            joinby ccode_origin ccode_destination year using Horizontal_Depth_DB.dta

            Comment


            • #7
              Hello Andrew,

              yes, in the end I figured out myself too that I needed joinby instead of merge to generate all possible pairwise combinations.


              Again, thanks for taking the time to help me.

              Best,

              Julia

              Comment

              Working...
              X