Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Problem with merging two datasets

    Hello,

    I have a problem with merging two datasets.
    My main dataset consists of trade data, with the variables year, partneriso, reporteriso and further variables with 22,433 observations
    Code:
    1992 "ZAF" "BEN"
    1992 "NLD" "BEN"
    1992 "DEU" "BEN"
    1992 "IDN" "BEN"
    1992 "THA" "BEN"
    1992 "NLD" "BEN"
    1992 "ESP" "BEN"
    ...
    1992 "IDN" "GIN"
    1992 "BRN" "GIN"
    1992 "FIN" "GIN"
    1992 "PRY" "GIN"
    1992 "PRT" "GIN"
    1992 "NLD" "GIN"
    1992 "USA" "GIN"
    The other data set contains of reporteriso, partneriso and distcap with 448 observations
    Code:
    "BEN" "ABW"  8006.697
    "GIN" "ABW"  5940.813
    "BEN" "AFG"  7450.474
    "GIN" "AFG"  8955.442
    "BEN" "AGO"  2064.176
    "GIN" "AGO"  3804.649
    "BEN" "AIA"  7242.226
    "GIN" "AIA"   5192.66
    "BEN" "ALB" 4228.5376
    "GIN" "ALB"  4965.696
    My problem now is that i want to merge the second dataset into the first one, but i tried different combinations of merge 1:1, 1:m or m:1 but i couldn't get. I mostly got that the distance will only be merged for one pair of partneriso and reporteriso but not for all other observations.
    My code looks like that:

    sort partneriso reporteriso
    merge m:m partneriso reporteriso using "...", keep( master match)

    If I do that I only match if reporteriso and partneriso are "BEN" and "GIN" but for all other combinations i don't get any value.

    How do i get that the distance will be merged for every combination of partneriso and reporteriso?

    Thank you in advance for the help. Martin
    Last edited by Martin Weishaupt; 10 May 2019, 08:24.

  • #2
    Welcome to Statalist.

    First things first. Put merge m:m out of your mind. Even the Stata PDF documentation asserts that merge m:m is almost never appropriate.

    So it appears from your example data that the reporter is always BEN or GIN, and the partner is some other country, using ISO country codes. If indeed the three variables in your main dataset are year, partneriso, and reporteriso in the order that you show the data in your example, and the three variables in your "using" dataset are reporteriso, partneriso, and distcap again in the order that you show the data in your example, then
    Code:
    use main
    merge m:1 partneriso reporteriso using "...", keep( master match)
    should do what you need. But you don't show us what the results are whey you try this, so it is difficult ot know what was wrong.

    I have two guesses. First, your examples aren't as helpful as they could be because none of the partneriso values in your master dataset match values in your using dataset. A more useful set of example data would be

    Code:
    use main
    dataex if partneriso=="USA"
    use using, clear
    dataex if partneriso=="USA"
    and then post the entire output of each dataex example.

    Second guess is that we can't be sure that the names aren't wrong on one or the other of the datasets. Perhaps you have accidentally given the wrong names in one of the datasets. I think you used dataex, but then deleted lines you didn't think were useful.

    Comment


    • #3
      Thanks for the help already.
      My main data set looks like this.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      year tradeflow reportercode reporter reporteriso partnercode partner partneriso commoditycode commodity tradevalueus
      1992 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 27 "Mineral fuels, oils, distillation products, etc"         9610061
      1992 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 62 "Articles of apparel, accessories, not knit or crochet"      2163
      1992 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 44 "Wood and articles of wood, wood charcoal"                   2928
      1992 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 61 "Articles of apparel, accessories, knit or crochet"          3100
      1992 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 99 "Commodities not specified according to kind"              197539
      1992 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 23 "Residues, wastes of food industry, animal fodder"         210000
      1992 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 97 "Works of art, collectors pieces and antiques"              35024
      1992 "Import" 842 "USA" "USA" 204 "Benin"  "BEN"  1 "Live animals"                                             197210
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 71 "Pearls, precious stones, metals, coins, etc"             6194042
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 44 "Wood and articles of wood, wood charcoal"                  70075
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 99 "Commodities not specified according to kind"              378997
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 38 "Miscellaneous chemical products"                           60619
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 26 "Ores, slag and ash"                                    115794914
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN"  1 "Live animals"                                             116824
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN"  3 "Fish, crustaceans, molluscs, aquatic invertebrates ne"      9836
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 18 "Cocoa and cocoa preparations"                              24302
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 69 "Ceramic products"                                        1830266
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 63 "Other made textile articles, sets, worn clothing etc"       1600
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 84 "Nuclear reactors, boilers, machinery, etc"                  6376
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 15 "Animal,vegetable fats and oils, cleavage products, et"      9488
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 62 "Articles of apparel, accessories, not knit or crochet"      9789
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 97 "Works of art, collectors pieces and antiques"              80816
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN"  9 "Coffee, tea, mate and spices"                             497175
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 90 "Optical, photo, technical, medical, etc apparatus"          8225
      1992 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 64 "Footwear, gaiters and the like, parts thereof"             49533
      1993 "Import" 842 "USA" "USA" 204 "Benin"  "BEN"  1 "Live animals"                                             281697
      1993 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 99 "Commodities not specified according to kind"               49048
      1993 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 27 "Mineral fuels, oils, distillation products, etc"        15369360
      1993 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 91 "Clocks and watches and parts thereof"                       1240
      1993 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 15 "Animal,vegetable fats and oils, cleavage products, et"      5080
      1993 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 52 "Cotton"                                                  1200435
      1993 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 62 "Articles of apparel, accessories, not knit or crochet"      2241
      1993 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 97 "Works of art, collectors pieces and antiques"              22770
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 29 "Organic chemicals"                                        156098
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 39 "Plastics and articles thereof"                             27640
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 71 "Pearls, precious stones, metals, coins, etc"             5862944
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 13 "Lac, gums, resins, vegetable saps and extracts nes"       371387
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 97 "Works of art, collectors pieces and antiques"             130057
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 25 "Salt, sulphur, earth, stone, plaster, lime and cement"      5795
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN"  1 "Live animals"                                              57397
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 84 "Nuclear reactors, boilers, machinery, etc"                  7636
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 49 "Printed books, newspapers, pictures etc"                    2158
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 95 "Toys, games, sports requisites"                            40869
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 76 "Aluminium and articles thereof"                             1966
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 58 "Special woven or tufted fabric, lace, tapestry etc"          625
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 50 "Silk"                                                       2844
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 44 "Wood and articles of wood, wood charcoal"                  69262
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 27 "Mineral fuels, oils, distillation products, etc"         4836695
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 42 "Articles of leather, animal gut, harness, travel good"       661
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 83 "Miscellaneous articles of base metal"                      20254
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 73 "Articles of iron or steel"                                 42450
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 26 "Ores, slag and ash"                                    130314288
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 85 "Electrical, electronic equipment"                          20349
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 92 "Musical instruments, parts and accessories"                 4582
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 99 "Commodities not specified according to kind"              413985
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 32 "Tanning, dyeing extracts, tannins, derivs,pigments et"      9000
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 15 "Animal,vegetable fats and oils, cleavage products, et"      5608
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 52 "Cotton"                                                     1043
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 62 "Articles of apparel, accessories, not knit or crochet"      7419
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN"  9 "Coffee, tea, mate and spices"                             255431
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 96 "Miscellaneous manufactured articles"                       25888
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN"  3 "Fish, crustaceans, molluscs, aquatic invertebrates ne"      1748
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 28 "Inorganic chemicals, precious metal compound, isotope"   5229268
      1993 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 33 "Essential oils, perfumes, cosmetics, toileteries"           6450
      1994 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 84 "Nuclear reactors, boilers, machinery, etc"                100423
      1994 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 97 "Works of art, collectors pieces and antiques"              10010
      1994 "Import" 842 "USA" "USA" 204 "Benin"  "BEN"  1 "Live animals"                                             668999
      1994 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 52 "Cotton"                                                  1851110
      1994 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 71 "Pearls, precious stones, metals, coins, etc"                3750
      1994 "Import" 842 "USA" "USA" 204 "Benin"  "BEN"  3 "Fish, crustaceans, molluscs, aquatic invertebrates ne"     15324
      1994 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 23 "Residues, wastes of food industry, animal fodder"         124800
      1994 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 27 "Mineral fuels, oils, distillation products, etc"         8490257
      1994 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 60 "Knitted or crocheted fabric"                                1718
      1994 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 62 "Articles of apparel, accessories, not knit or crochet"      9229
      1994 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 99 "Commodities not specified according to kind"                2417
      1994 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 44 "Wood and articles of wood, wood charcoal"                  61854
      1994 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 42 "Articles of leather, animal gut, harness, travel good"      2000
      1994 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 33 "Essential oils, perfumes, cosmetics, toileteries"           1550
      1994 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 94 "Furniture, lighting, signs, prefabricated buildings"        3033
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 26 "Ores, slag and ash"                                    114197608
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 84 "Nuclear reactors, boilers, machinery, etc"                 39285
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN"  3 "Fish, crustaceans, molluscs, aquatic invertebrates ne"     70289
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 63 "Other made textile articles, sets, worn clothing etc"      16919
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 44 "Wood and articles of wood, wood charcoal"                  21323
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN"  1 "Live animals"                                               5212
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 71 "Pearls, precious stones, metals, coins, etc"             2054854
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 62 "Articles of apparel, accessories, not knit or crochet"     17787
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 97 "Works of art, collectors pieces and antiques"              38157
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 11 "Milling products, malt, starches, inulin, wheat glute"      3992
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 92 "Musical instruments, parts and accessories"                 2274
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN"  9 "Coffee, tea, mate and spices"                             229282
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 94 "Furniture, lighting, signs, prefabricated buildings"        6708
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 52 "Cotton"                                                    72260
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 39 "Plastics and articles thereof"                               630
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 15 "Animal,vegetable fats and oils, cleavage products, et"     33581
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 69 "Ceramic products"                                           1827
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 90 "Optical, photo, technical, medical, etc apparatus"        119241
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 99 "Commodities not specified according to kind"             2264384
      1994 "Import" 842 "USA" "USA" 324 "Guinea" "GIN" 88 "Aircraft, spacecraft, and parts thereof"                    2625
      1995 "Import" 842 "USA" "USA" 204 "Benin"  "BEN" 52 "Cotton"                                                  1199206
      end
      And this is the data set for the distance:

      Code:
      input str3(reporteriso partneriso) float distcap
      "BEN" "USA" 8679.125
      "GIN" "USA"  6877.18

      I used this code:

      Code:
      use main.dta
      merge m:1 partneriso reporteriso using "...", keep(master match)
      But this only matches if "GIN" and "BIN" are both reporteriso and partneriso. So i get 148 matches and 22,285 are unmatched.

      Hope the data part got a little bit more clear and thank you for your help already in advance.


      Comment


      • #4
        In the main dataset USA is the reporter, BEN and GIN are the partner.

        In the distance dataset USA is the partner, BEN and GIN are the reporter.

        So my second guess in post #2 was the correct one.

        The merge command matches observations where the partneriso is the same in both observations, and the reporteriso is the same in both observations. It does not match observations where the reporteriso in one observation matches the partneriso in the other, and vice versa.

        Perhaps in your dataset for distance you should
        Code:
        use distancedata
        rename ( reporteriso partneriso ) ( partneriso reporteriso )
        so that BEN and GIN will be the partner in both datasets.

        Or, have two observations for each distance, so it doesn't matter which is the partner and which is the reporter in the master dataset.
        Code:
        cls
        // set up sample data
        clear
        input str3(reporteriso partneriso) float distcap
        "BEN" "USA" 8679.125
        "GIN" "USA"  6877.18
        end
        tempfile distances
        save `distances'
        
        // demonstrate the code on sample data
        use `distances', clear
        rename ( reporteriso partneriso ) ( partneriso reporteriso )
        append using `distances'
        list, clean noobs abbreviate(12)
        save alldistances, replace
        Code:
        . list, clean noobs abbreviate(12)
        
            partneriso   reporteriso    distcap  
                   BEN           USA   8679.125  
                   GIN           USA    6877.18  
                   USA           BEN   8679.125  
                   USA           GIN    6877.18

        Comment


        • #5
          Ah thank you very much. After I renamed it, it workes perfectly. .

          Comment

          Working...
          X