Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to include Respective variables from other datasets??

    Hello,

    First of all, let me state that I am quite novice in STATA.

    I am trying to estimate trade potentials by using the Gravity model of International Trade. Model includes trade flows between countries, their GDPs, distance, and more than 0 dummy variables like whether countries were in common colony, share border etc. I have three different datasets, all are from CEPII, for trade flows, for country name in ISO code, and the third one for all other variables.

    The first dataset is as below; (t - time, hs6- HS6 code, i and j - iso numeric codes for importer and exporter, respectively and v is value of export)

    t hs6 i j v
    2013 10600 4 124 1.622
    2013 10600 4 586 36.676
    2013 20110 4 634 2.434
    2013 20130 4 634 30.128
    2013 20319 4 616 3.83633
    2013 20421 4 634 100.953
    2013 30342 4 704 264.611
    2013 30343 4 704 42.3159

    And second looks like as; ( here iso3 and iso2 are ALpha2 and 3 Iso codes, and i is, again, iso numeric code)

    iso3 iso2 name_english i
    AFG AF Afghanistan 4
    ALB AL Albania 8
    DZA DZ Algeria 12
    ASM AS American Samoa 16
    AND AD Andorra 20
    AGO AO Angola 24
    ATG AG Antigua and Barbuda 28
    And the third dataset is as ( you may guess the variables, hopefully)

    iso3_o iso3_d year contig comlang_off comlang_comcol col45 distw pop_o gdp_o gdpcap_o iso2_o
    ABW AFG 1991 0 0 0 0 0 13168.22 872.067 AW
    ABW AFG 1992 0 0 0 0 0 13168.22 958.6592 AW
    ABW AFG 1993 0 0 0 0 0 13168.22 1083.24 AW
    ABW AFG 1994 0 0 0 0 0 13168.22 1245.81 AW
    ABW AFG 1995 0 0 0 0 0 13168.22 1320.67 AW
    ABW AFG 1996 0 0 0 0 0 13168.22 1379.888 AW
    ABW AFG 1997 0 0 0 0 0 13168.22 1531.844 AW


    The above dataset has 38 variables and about 3 mln observation. Some of the variables and majority of the observations here are not in my interest and I will not include them in my model.

    Now my question is like this. How can I include Respective iso3 codes and names from second dataset and distance between that pair of countries, border, common language and other dummies from the third dataset to my first dataset?


    Hopefully I was able to state my problem clearly. I have searched for this a lot but, unfortunately, have not found any answer yet. Any help would be highly appreciated..





  • #2
    Code:
    // attach the iso3_o variable
    use dataset2
    rename iso3 iso3_o
    merge 1:m i using dataset1
    // look carefully at the _merge variable
    drop _merge
    save temp
    
    // attach the iso3_d variable
    use dataset2
    rename iso_3 iso3_d
    rename i j
    merge 1:m j using temp
    // look carefully at the _merge variable
    drop _merge
    save temp
    
    // attach the other variables
    use temp
    rename t year
    merge 1:1 iso_o iso_d year using dataset3, keepusing(the variables you want to keep)
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    Comment


    • #3
      Dear Maarten,

      Thank you for your prompt response. It seems im near to solution.

      Everything goes smooth here until "merge 1:m i using dataset1". After this i see error msg as "variable i does not uniquely identify observations in the master data".

      I asked hyped help merge and tried other options. Only m:m worked among the options, but everything mixed here now (pls see attached file). Is it correct option to choose or?


      Attached Files

      Comment


      • #4
        I think I'm going to start calling merge m:m a shotgun merge. It's for people who really really really want to merge but can't get 1:1, 1:m, or m:1 to work so they get the shotgun out and make it happen. They know it's probably not right but hey, there's no error message so it's better than nothing. Wrong! It just makes no sense to do a m:m merge and therefore the merge results are senseless.

        In Maarten's sample code, the purpose of the first merge is to add country-level data (i.e. 3-letter country codes) to observations in dataset1 when the value of the key variable i (numeric country code) matches. The error message you encountered indicates that there are duplicate numeric country codes in the data in memory (i.e. the master, in this case dataset2). In other words, your master list of country codes has multiple observations for the same code. To check for this
        Code:
        use dataset2
        duplicates tag i, gen(check)
        sort i
        list if check, sepby(i)
        Once you have removed duplicates, your should check that your country codes uniquely identify observations
        Code:
        isid i
        isid iso3
        You should also heed Maarten's advice to check the _merge variable after the merge. If dataset1 does not include data for some countries in dataset2, then the merged data will include extra observations from the master and you will most likely want to drop them using

        Code:
        drop if _merge == 1
        Same goes if you have observations in dataset1 with iso numeric codes not found in dataset2.


        Comment


        • #5
          Dear Robert,

          Thank you for your reply. It did work and i was alble to come until Mr Marteen's

          "merge 1:1 iso_o iso_d year using dataset3, keepusing(the variables you want to keep)"

          Here it again shows the same error msg: "Variable iso3_o does not uniquely.. " I again checked for duplicates and listed them. But now data editor shows that i have, for example, these much of duplicates for iso3_d and iso3_o

          check_desti check_orign
          45 1601
          45 1601
          45 1601
          68 1601
          68 1601
          68 1601
          583 1601

          What would you advice to solve this problem?

          Very interesting thing is this that i am not able to find an answer for this problem on forums/youtube/websites.

          Comment


          • #6
            Is it possible that the same origin destination pair shows up multiple times in the same year? If it is, e.g. you are looking at different products being imported or exported, than you need to look at a 1:m or m:1 merge instead of a 1:1 merge. If this should not happen you need to look in your data and see what the problem is and fix it. It is hard for us to say how to fix it, you just need to do something that is right, or at least defendable, for your data and problem. This is where the "art of data analysis" comes in.
            ---------------------------------
            Maarten L. Buis
            University of Konstanz
            Department of history and sociology
            box 40
            78457 Konstanz
            Germany
            http://www.maartenbuis.nl
            ---------------------------------

            Comment


            • #7
              Dear Maarten,

              Yes, my dataset was including many products for pair of countries, indeed all HS6 digit level products. I did follow your advice and used m:1 option. Thankfully, it worked. But now there is one thing that I am concerning : in the _merge section, except matched and master, there is using (=2) appear as well. I didn't know how to deal with them so dropped them as (following Mr Roberts's advice):
              [INDENT=2]drop if _merge == 2
              [/INDENT] Could you say, is it right thing to do, or I need to apply different solution?




              Comment


              • #8
                That is really a substantive question. Does it make sense that these country-country-year combinations exist in the master file but not in the using file? Sometimes the answer is yes and sometimes the answer is no. So you need to look at each of these country-county-year combinations and decide based on your substantive knowledge of the problem.
                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------

                Comment

                Working...
                X