Matching datasets for clean share and dirty share of bilateral exports

Tony Ekere

Join Date: Aug 2022

Posts: 12
#1

Matching datasets for clean share and dirty share of bilateral exports

13 Aug 2022, 06:16

Hi all,

I'm new to stata and I have been "learning by doing". This forum has been of utmost help. I'm currently working on a panel data, trying to calculate the clean and dirty share of bilateral exports as dependent variables. I am working with 197 countries for 37 years. To achieve this, I first generated a master dataset of total bilateral export containing 595,726 observations. Afterwards, I generated a dataset of exports in "dirty goods" with 2,137,764 observations and another dataset of exports in "clean goods" containing 8,576,162 observations. (The reason for the disparity in the number of observations is because the data from sectoral export is more detailed) N/B: Data from UN Comtrade using HS (For total and clean sectors) and SITC codes (For dirty sectors)

To calculate the clean and dirty share I should sum up sectoral exports and relate them to total exports. I have been on this for a while. The closest I came to getting it right was from a previous reply by Clyde Schechter. I proceeded as follows:

For the "clean" and "dirty" export dataset
1. by reportercode partnercode year, sort: egen cleanvalues = total (values)
2. by reportercode partnercode year, sort: egen dirtyvalues = total (values)

For the total export dataset
1. by reportercode year, sort: egen totalvalues = total(values)

Thereafter, I could only merge each (dirty and clean to the total) of them using a 1:1 by observation, such that all observations in the master dataset were matched and i dropped all those that were not matched. I then took the log of total values, cleanvalues and dirtyvalues before calculating the share by

1. gen cleanshare = ln_cleanvalues/ln_totalvalues
2. gen dirtyshare = ln_dirtyvalues/ln_totalvalues

For the cleanshare, I get a summary of (0 min, 0.6546 mean, 0.125sd and 1.21 max)
For the dirtyshare, I get a summary of (0 min, 0.6541 mean, 0.157sd and 1.45 max)

Concerns
1. The share should bound between 0 and 1. The Maximum values are above 1 in both cases
2. The mean values seem overstated. They should at most equate 1. In this instance, it should be a lot less than 1 as a lot of other sectors were not covered. Calculating manually, the sum of clean over sum of total = 0.038 and that of dirty = 0.089

QUESTIONS
1. Please did I do something wrongly?
2. Please how can I match these correctly?

N/b: Please I would appreciate urgent feedback as I am pressed for time and need to move on to other variables

Many Thanks.

Last edited by Tony Ekere; 13 Aug 2022, 06:34.
Tags: None
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

13 Aug 2022, 10:05

From what little information you have provided, I can tell you that your two concerns

1. The share should bound between 0 and 1. The Maximum values are above 1 in both cases
2. The mean values seem overstated. They should at most equate 1. In this instance, it should be a lot less than 1 as a lot of other sectors were not covered. Calculating manually, the sum of clean over sum of total = 0.038 and that of dirty = 0.089

are both a result of

I then took the log of total values, cleanvalues and dirtyvalues before calculating the share by

1. gen cleanshare = ln_cleanvalues/ln_totalvalues
2. gen dirtyshare = ln_dirtyvalues/ln_totalvalues

You should not take the log of the values before calculating the shares.

Code:

gen cleanshare = cleanvalues/totalvalues gen dirtyshare = dirtyvalues/totalvalues

And note that the calculation you did manually did not involve taking the logs, which is certainly why the code you used did not produce the same results as the calculation done by hand.

The summarization and merging that preceded your calculations of shares seems likely to be incorrect, although with no example data and no code provided for that part of your work, it is difficult to give concrete advice. I am particularly concerned about

Thereafter, I could only merge each (dirty and clean to the total) of them using a 1:1 by observation, such that all observations in the master dataset were matched and i dropped all those that were not matched.

Perhaps the following example code will point you in a useful direction.

Code:

use clean, clear collapse (sum) cleanvalues=values, by(reportercode year) save sumclean, replace use dirty, clear collapse (sum) dirtyvalues=values, by(reportercode year) save sumdirty, replace use total, clear collapse (sum) totalvalues=values, by(reportercode year) merge 1:1 reportercode year using sumclean, generate(_merge_c) merge 1:1 reportercode year using sumdirty, generate(_merge_d) generate cleanshare = cleanvalues/totalvalues generate dirtyshare = dirtyvalues/totalvalues

Finally,

N/b: Please I would appreciate urgent feedback as I am pressed for time and need to move on to other variables

suggests that your problem should receive preferential attention over those submitted by others. Did you really mean to present yourself with that sense of entitlement? It certainly reduces the incentive to respond, especially when presented at the end of a post that does not acknowledge the advice presented in the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, with particular attention to sections 9-12 on how to best pose your question.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#3

13 Aug 2022, 10:13

I don't understand why you are taking the logarithms of the totals. And even if there is a good reason for that, this does not seem to be one of the situations where the ratio of two logarithms would be meaningful. It certainly would not represent a share or proportion of a total.

I could only merge each (dirty and clean to the total) of them using a 1:1 by observation

I do not know what you mean when you say 1:1 by observation. Stata does have -merge 1:1 _n-, but from your description of the data sets it would greatly surprise me if this kind of merge is appropriate for them. From your description, I would expect that -merge 1:1 reportercode partnercode year- would be appropriate.

If you want more specific help you need to the exact commands you used to merge the data sets and also show example data from all three data sets. Please choose examples for which there will be some matchable and some unmatchable observations. And use -dataex- to show them. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Added: Crossed with #2 which raises the same points and suggestions, with more specific advice.
1 like
Comment

Tony Ekere

Join Date: Aug 2022
Posts: 12

13 Aug 2022, 19:49

Thanks so much William Lisowski and Clyde Schechter for your very helpful feedback. I am honestly sorry for being insensitive to others. I did not use the right words and I apologize unreservedly.

I tried to take the log of exports to reduce the figures to manageable proportions (For the total exports especially). Thanks for informing me that I do not need to do that to calculate the shares. Yes, I merged using merge 1:1 _n using... I initially tried using -merge 1:1 reportercode partnercode year- but I kept getting this message: variables year reportercode partnercode do not uniquely identify observations in the master data

I am currently using Stata 14.0 on windows. I have just successfully gotten _dataex_. Thanks for leading me through it. Here are examples of the datasets

For Total Exports

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 classification int year str6 tradeflow int reportercode str32 reporter str3 reporteriso int partnercode str44 partner str3 partneriso str5 commoditycode double tradevalueus byte pick
"H0" 1993 "Export" 410 "Rep. of Korea"       "KOR" 422 "Lebanon"               "LBN" "TOTAL"  72211472 1
"H1" 1997 "Export" 642 "Romania"             "ROU" 434 "Libya"                 "LBY" "TOTAL"   9410000 1
"H0" 1998 "Export" 818 "Egypt"               "EGY"  51 "Armenia"               "ARM" "TOTAL"    250483 1
"H0" 1998 "Export" 608 "Philippines"         "PHL" 598 "Papua New Guinea"      "PNG" "TOTAL"   1856610 1
"H0" 2000 "Export"  50 "Bangladesh"          "BGD" 586 "Pakistan"              "PAK" "TOTAL"  50939355 1
"H1" 2000 "Export" 440 "Lithuania"           "LTU" 124 "Canada"                "CAN" "TOTAL"  11292295 1
"H1" 2001 "Export" 620 "Portugal"            "PRT" 600 "Paraguay"              "PRY" "TOTAL"   1808772 1
"H2" 2005 "Export" 100 "Bulgaria"            "BGR" 548 "Vanuatu"               "VUT" "TOTAL"      1235 1
"H2" 2005 "Export" 203 "Czechia"             "CZE"  20 "Andorra"               "AND" "TOTAL"   1484852 1
"H2" 2006 "Export" 203 "Czechia"             "CZE" 233 "Estonia"               "EST" "TOTAL" 146323000 1
"H2" 2006 "Export" 780 "Trinidad and Tobago" "TTO" 704 "Viet Nam"              "VNM" "TOTAL"      3993 1
"H2" 2007 "Export" 398 "Kazakhstan"          "KAZ" 764 "Thailand"              "THA" "TOTAL"  16296484 1
"H3" 2011 "Export" 208 "Denmark"             "DNK" 480 "Mauritius"             "MUS" "TOTAL"  18131092 1
"H3" 2011 "Export" 528 "Netherlands"         "NLD"  44 "Bahamas"               "BHS" "TOTAL"  72062330 1
"H4" 2014 "Export" 218 "Ecuador"             "ECU" 348 "Hungary"               "HUN" "TOTAL"    924509 1
"H4" 2015 "Export"  76 "Brazil"              "BRA" 112 "Belarus"               "BLR" "TOTAL"  11970035 1
"H4" 2015 "Export" 124 "Canada"              "CAN" 796 "Turks and Caicos Isds" "TCA" "TOTAL"   3174642 1
"H4" 2017 "Export" 466 "Mali"                "MLI" 579 "Norway"                "NOR" "TOTAL"    111421 1
"H5" 2018 "Export" 400 "Jordan"              "JOR" 724 "Spain"                 "ESP" "TOTAL"  21359403 1
"H5" 2020 "Export" 422 "Lebanon"             "LBN"  48 "Bahrain"               "BHR" "TOTAL"  14848164 1
end

For clean exports

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 classification int year str9 tradeflow int reportercode str32 reporter str3 reporteriso int partnercode str44 partner str3 partneriso long commoditycode double tradevalueus byte pick
"H1" 2000 "Export" 842 "USA"                  "USA"   0 "World"                "WLD" 560314  90360763 1
"H1" 1999 "Export" 372 "Ireland"              "IRL"   0 "World"                "WLD" 841320     61585 1
"H2" 2006 "Export" 156 "China"                "CHN"  40 "Austria"              "AUT" 841360   1653852 1
"H2" 2006 "Export" 686 "Senegal"              "SEN"   0 "World"                "WLD" 841350     24893 1
"H4" 2016 "Export" 381 "Italy"                "ITA"   0 "World"                "WLD" 841459 623887450 1
"H4" 2017 "Export" 662 "Saint Lucia"          "LCA" 392 "Japan"                "JPN" 841440      2146 1
"H5" 2018 "Export" 699 "India"                "IND" 690 "Seychelles"           "SYC" 840991      6668 1
"H0" 1995 "Export" 381 "Italy"                "ITA" 246 "Finland"              "FIN" 842833     70619 1
"H3" 2009 "Export" 752 "Sweden"               "SWE" 470 "Malta"                "MLT" 847410      1314 1
"H3" 2010 "Export" 604 "Peru"                 "PER" 124 "Canada"               "CAN" 842139       132 1
"H3" 2010 "Export" 710 "South Africa"         "ZAF" 837 "Bunkers"              ""    846291       297 1
"H4" 2013 "Export" 440 "Lithuania"            "LTU"  40 "Austria"              "AUT" 842139     37307 1
"H5" 2017 "Export" 826 "United Kingdom"       "GBR" 634 "Qatar"                "QAT" 842139   2286580 1
"H0" 1992 "Export" 300 "Greece"               "GRC" 899 "Areas, nes"           ""    901580       212 1
"H2" 2002 "Export" 764 "Thailand"             "THA" 508 "Mozambique"           "MOZ" 847989     11414 1
"H4" 2016 "Export" 276 "Germany"              "DEU" 288 "Ghana"                "GHA" 848110     53832 1
"H0" 2012 "Export" 757 "Switzerland"          "CHE" 764 "Thailand"             "THA" 902290    116863 1
"H0" 2013 "Export" 124 "Canada"               "CAN" 704 "Viet Nam"             "VNM" 902890     60988 1
"H0" 2013 "Export" 784 "United Arab Emirates" "ARE"  48 "Bahrain"              "BHR" 902690     23433 1
"H1" 2012 "Export" 251 "France"               "FRA" 140 "Central African Rep." "CAF" 903289      3666 1
end

And for the Dirty exports

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 classification int year str6 tradeflow int reportercode str32 reporter str3 reporteriso int partnercode str44 partner str3 partneriso long commoditycode double tradevalueus byte pick
"S1" 1989 "Export"  76 "Brazil"             "BRA" 528 "Netherlands"                      "NLD" 684  355550336 1
"S1" 1989 "Export" 826 "United Kingdom"     "GBR" 388 "Jamaica"                          "JAM" 514     146319 1
"S1" 1992 "Export" 381 "Italy"              "ITA" 508 "Mozambique"                       "MOZ" 672      14844 1
"S1" 1995 "Export" 124 "Canada"             "CAN"   0 "World"                            "WLD" 682 1719619455 1
"S1" 1999 "Export"  56 "Belgium"            "BEL"   0 "World"                            "WLD" 685   83844289 1
"S1" 2000 "Export" 724 "Spain"              "ESP" 414 "Kuwait"                           "KWT" 661     803244 1
"S1" 2003 "Export" 442 "Luxembourg"         "LUX" 540 "New Caledonia"                    "NCL" 581      31460 1
"S1" 2004 "Export" 381 "Italy"              "ITA" 826 "United Kingdom"                   "GBR" 672   35209339 1
"S1" 2006 "Export" 428 "Latvia"             "LVA" 292 "Gibraltar"                        "GIB" 581         73 1
"S1" 2006 "Export" 702 "Singapore"          "SGP" 344 "China, Hong Kong SAR"             "HKG" 674   19352021 1
"S1" 2007 "Export" 757 "Switzerland"        "CHE" 703 "Slovakia"                         "SVK" 685      34821 1
"S1" 2008 "Export" 498 "Rep. of Moldova"    "MDA" 191 "Croatia"                          "HRV" 673     641575 1
"S1" 2010 "Export" 710 "South Africa"       "ZAF" 300 "Greece"                           "GRC" 673       2214 1
"S1" 2011 "Export" 858 "Uruguay"            "URY"  68 "Bolivia (Plurinational State of)" "BOL" 514     796164 1
"S1" 2013 "Export" 579 "Norway"             "NOR" 792 "Turkey"                           "TUR" 678     260160 1
"S1" 2015 "Export" 422 "Lebanon"            "LBN" 288 "Ghana"                            "GHA" 682      32348 1
"S1" 2015 "Export" 608 "Philippines"        "PHL" 616 "Poland"                           "POL" 513          1 1
"S1" 2015 "Export" 702 "Singapore"          "SGP" 699 "India"                            "IND" 661     766993 1
"S1" 2016 "Export" 643 "Russian Federation" "RUS" 764 "Thailand"                         "THA" 677       1041 1
"S1" 2019 "Export" 116 "Cambodia"           "KHM" 724 "Spain"                            "ESP" 581    3247584 1
end

I used the randomtag to generate these. After using the codes I initially posted, I merged using merge 1:1_n and got

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str2 classification int year str9 tradeflow int reportercode str32 reporter str3 reporteriso int partnercode str44 partner str3 partneriso str6 commoditycode double tradevalueus float(totalvalues cleanvalues dirtyvalues)
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 156 "China"                "CHN" "TOTAL"   1942528 1080131200 2321182  3581272
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 748 "Eswatini"             "SWZ" "TOTAL"   1723860 1080131200 2321182  4750800
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 804 "Ukraine"              "UKR" "TOTAL"     34200 1080131200 4086257  6204000
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 246 "Finland"              "FIN" "TOTAL"   6167514 1080131200 4086257  3915048
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 410 "Rep. of Korea"        "KOR" "TOTAL"     40000 1080131200 4086257  3307930
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 398 "Kazakhstan"           "KAZ" "TOTAL"    338836 1080131200 7092384  3307930
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 616 "Poland"               "POL" "TOTAL"     49559 1080131200 7092384   186827
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 818 "Egypt"                "EGY" "TOTAL"     84000 1080131200 7092384   133848
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 760 "Syria"                "SYR" "TOTAL"   2388075 1080131200 7092384   133848
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 826 "United Kingdom"       "GBR" "TOTAL"   2518025 1080131200 7092384   589973
"H2" 2008 "Export" 4 "Afghanistan" "AFG"  36 "Australia"            "AUS" "TOTAL"     10692 1080131200     145   589973
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 699 "India"                "IND" "TOTAL" 132001893 1080131200     841   589973
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 392 "Japan"                "JPN" "TOTAL"    333700 1080131200     508   589973
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 364 "Iran"                 "IRN" "TOTAL"  18203091 1080131200     594   128800
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 795 "Turkmenistan"         "TKM" "TOTAL"  10831000 1080131200   54029   128800
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 762 "Tajikistan"           "TJK" "TOTAL"   6313355 1080131200   54029   128800
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 251 "France"               "FRA" "TOTAL"    126076 1080131200  142389   128800
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 417 "Kyrgyzstan"           "KGZ" "TOTAL"     30000 1080131200  787093      106
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 842 "USA"                  "USA" "TOTAL"   2131740 1080131200 1841945      434
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 422 "Lebanon"              "LBN" "TOTAL"    860000 1080131200  461319      533
"H2" 2008 "Export" 4 "Afghanistan" "AFG"   0 "World"                "WLD" "TOTAL" 540065594 1080131200  461319   637565
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 344 "China, Hong Kong SAR" "HKG" "TOTAL"     21884 1080131200    9167   186827
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 724 "Spain"                "ESP" "TOTAL"   1837318 1080131200   28287   121612
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 792 "Turkey"               "TUR" "TOTAL"  17886832 1080131200   50438   119866
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 368 "Iraq"                 "IRQ" "TOTAL"   5914575 1080131200   67750   119866
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 784 "United Arab Emirates" "ARE" "TOTAL"  18769436 1080131200  164653     8284
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 643 "Russian Federation"   "RUS" "TOTAL"  36721775 1080131200     228     8284
"H2" 2008 "Export" 4 "Afghanistan" "AFG"  56 "Belgium"              "BEL" "TOTAL"   2765806 1080131200    8888  2590000
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 276 "Germany"              "DEU" "TOTAL"    681919 1080131200   10392    10468
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 300 "Greece"               "GRC" "TOTAL"     48000 1080131200  108768     9827
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 586 "Pakistan"             "PAK" "TOTAL" 264316383 1080131200  108768     2877
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 682 "Saudi Arabia"         "SAU" "TOTAL"   2198097 1080131200  481703      920
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 528 "Netherlands"          "NLD" "TOTAL"     51428 1080131200   97377      920
"H2" 2008 "Export" 4 "Afghanistan" "AFG" 860 "Uzbekistan"           "UZB" "TOTAL"   1516890 1080131200  155902       20
"H2" 2008 "Export" 4 "Afghanistan" "AFG"  76 "Brazil"               "BRA" "TOTAL"   1207107 1080131200   10528     5500
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 528 "Netherlands"          "NLD" "TOTAL"    329732  806881984   10528  3577272
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 251 "France"               "FRA" "TOTAL"    153640  806881984   10528  4750800
"H2" 2009 "Export" 4 "Afghanistan" "AFG"  76 "Brazil"               "BRA" "TOTAL"    305216  806881984 1468416  6204000
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 616 "Poland"               "POL" "TOTAL"     81000  806881984 1916135  3915048
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 842 "USA"                  "USA" "TOTAL"  17314844  806881984 5560321    28490
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 364 "Iran"                 "IRN" "TOTAL"  41276586  806881984      34     6401
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 860 "Uzbekistan"           "UZB" "TOTAL"    735326  806881984      79   107864
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 762 "Tajikistan"           "TJK" "TOTAL"   5893775  806881984     125    51875
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 703 "Slovakia"             "SVK" "TOTAL"     27000  806881984   17728   453878
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 748 "Eswatini"             "SWZ" "TOTAL"   2901311  806881984  452172     4000
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 124 "Canada"               "CAN" "TOTAL"    315234  806881984  452172      741
"H2" 2009 "Export" 4 "Afghanistan" "AFG"  56 "Belgium"              "BEL" "TOTAL"   1365903  806881984  452172     3286
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 368 "Iraq"                 "IRQ" "TOTAL"   6931153  806881984  452172 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 804 "Ukraine"              "UKR" "TOTAL"    634835  806881984  452172 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 276 "Germany"              "DEU" "TOTAL"    764633  806881984  452172 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG"   0 "World"                "WLD" "TOTAL" 403441006  806881984  452172 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 586 "Pakistan"             "PAK" "TOTAL" 191131851  806881984  452172 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 784 "United Arab Emirates" "ARE" "TOTAL"   6979532  806881984  452172 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 417 "Kyrgyzstan"           "KGZ" "TOTAL"     36000  806881984  452172 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 682 "Saudi Arabia"         "SAU" "TOTAL"   2409891  806881984  452172 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 344 "China, Hong Kong SAR" "HKG" "TOTAL"    251718  806881984  452172 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG"  36 "Australia"            "AUS" "TOTAL"      8462  806881984  452172 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 699 "India"                "IND" "TOTAL"  76008986  806881984  452172 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 818 "Egypt"                "EGY" "TOTAL"    886367  806881984  452172 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 826 "United Kingdom"       "GBR" "TOTAL"    293669  806881984  452172 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 795 "Turkmenistan"         "TKM" "TOTAL"   5796943  806881984  226969 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 760 "Syria"                "SYR" "TOTAL"   2465663  806881984  226969 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 792 "Turkey"               "TUR" "TOTAL"   6048036  806881984  226969 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 208 "Denmark"              "DNK" "TOTAL"    537997  806881984  226969 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 422 "Lebanon"              "LBN" "TOTAL"     25913  806881984  226969 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 381 "Italy"                "ITA" "TOTAL"    124275  806881984  226969 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 428 "Latvia"               "LVA" "TOTAL"     54000  806881984  226969 34126240
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 156 "China"                "CHN" "TOTAL"   3590181  806881984  226969 12553953
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 643 "Russian Federation"   "RUS" "TOTAL"  25663266  806881984  226969 12553953
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 392 "Japan"                "JPN" "TOTAL"    701155  806881984  226969 12553953
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 410 "Rep. of Korea"        "KOR" "TOTAL"   1290950  806881984  226969 12553953
"H2" 2009 "Export" 4 "Afghanistan" "AFG" 398 "Kazakhstan"           "KAZ" "TOTAL"    105963  806881984  226969 12553953
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 682 "Saudi Arabia"         "SAU" "TOTAL"   1566230  776967296  226969 12553953
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 757 "Switzerland"          "CHE" "TOTAL"    263628  776967296 1590428 12553953
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 762 "Tajikistan"           "TJK" "TOTAL"   7243814  776967296 1590428 12553953
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 251 "France"               "FRA" "TOTAL"    264195  776967296 1590428 12553953
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 398 "Kazakhstan"           "KAZ" "TOTAL"    941775  776967296 1590428 12553953
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 528 "Netherlands"          "NLD" "TOTAL"   9409321  776967296 1590428 12553953
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 784 "United Arab Emirates" "ARE" "TOTAL"   6180693  776967296 1590428 12553953
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 818 "Egypt"                "EGY" "TOTAL"   3832974  776967296 1590428 12553953
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 724 "Spain"                "ESP" "TOTAL"    127800  776967296 1590428 12553953
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 410 "Rep. of Korea"        "KOR" "TOTAL"    641836  776967296 1590428 12553953
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 792 "Turkey"               "TUR" "TOTAL"  34774715  776967296 1590428 12553953
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 826 "United Kingdom"       "GBR" "TOTAL"   2049947  776967296 1590428 12553953
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 381 "Italy"                "ITA" "TOTAL"    549969  776967296 1590428 12553953
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 368 "Iraq"                 "IRQ" "TOTAL"   8721384  776967296 1590428 12553953
"H2" 2010 "Export" 4 "Afghanistan" "AFG"  56 "Belgium"              "BEL" "TOTAL"    839983  776967296 1590428 27515648
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 344 "China, Hong Kong SAR" "HKG" "TOTAL"    182738  776967296 1590428 27515648
"H2" 2010 "Export" 4 "Afghanistan" "AFG"  36 "Australia"            "AUS" "TOTAL"    648457  776967296 1590428 27515648
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 246 "Finland"              "FIN" "TOTAL"   2953092  776967296 1590428 27515648
"H2" 2010 "Export" 4 "Afghanistan" "AFG"   0 "World"                "WLD" "TOTAL" 388483635  776967296 1590428 27515648
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 360 "Indonesia"            "IDN" "TOTAL"    127210  776967296 2698803 27515648
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 643 "Russian Federation"   "RUS" "TOTAL"  29585442  776967296 2698803 27515648
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 760 "Syria"                "SYR" "TOTAL"    941436  776967296 2698803 27515648
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 428 "Latvia"               "LVA" "TOTAL"    529410  776967296 2698803 27515648
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 364 "Iran"                 "IRN" "TOTAL"  31730468  776967296 2698803 27515648
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 616 "Poland"               "POL" "TOTAL"     37800  776967296 2698803 27515648
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 554 "New Zealand"          "NZL" "TOTAL"     37800  776967296 2698803 27515648
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 586 "Pakistan"             "PAK" "TOTAL" 151342943  776967296 2698803 27515648
"H2" 2010 "Export" 4 "Afghanistan" "AFG" 276 "Germany"              "DEU" "TOTAL"   1883784  776967296 2698803 27515648
end

I tried the option suggested by Wiliam Lisowski and got,

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(year reportercode) double(totalvalues cleanvalues dirtyvalues) float(cleanshare dirtyshare) byte pick
2012  20    211899286     3151252      1222351    .01487146  .005768547 1
2016  36 379259949125  4427645960 116841735052   .011674436    .3080782 1
2010  50  38461965117    56952370      5870011   .001480745 .0001526186 1
2007  72  10145046371     7307950   2243913595  .0007203466    .2211832 1
1994 144   6419324740    57334704     24465088   .008931579  .003811162 1
2018 208 217042898722 11961453952    349193596    .05511101  .001608869 1
1991 218  11405596582     1067496     15844126 .00009359406 .0013891537 1
2011 258    299635723     1232813       426689  .0041143727 .0014240257 1
1998 324   1125717388   134952369    818673280    .11988121    .7272458 1
2003 376  63565488000  2729846000   4526174000    .04294541   .07120489 1
2020 381 997607663826 62138442960 121793882510    .06228746   .12208595 1
1998 392 776272312630 46367111468  73248946356    .05973047   .09435986 1
1994 410 192015335478  3461778708  21817052544   .018028658    .1136214 1
2010 454   2132407332    32675542     21718521   .015323312  .010184977 1
2017 554  76100572286   809070199   2928142740    .01063159   .03847728 1
2019 600  15304070948    61953708    416835684   .004048185  .027236916 1
2005 704  64894258334   429913606   1527037546   .006624833  .023531165 1
2006 704  79652445604   498860714   2266700268   .006262968  .028457385 1
2016 740   2470566326     3506681      3989718  .0014193835 .0016149002 1
2014 768   1919791325    36440711    201392723     .0189816   .10490344 1
end

This works well. The mean values are almost what I calculated by hand. However,

1. It shrinks the data to just over 3,932 as against 595,726 I need to work with.
2. It drops other variables such as partnercodes which are important for the country-pair and importer-year fixed effects, which I intend to use for the regression.
3. The minimum values are taking on negative values.

Please is there a way I can keep the trading relationships in total exports, the relevant variables and still get the clean and dirty shares?

Many Thanks

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#5

13 Aug 2022, 21:52

-merge 1:1 _n- is definitely not appropriate here. It does have its uses, but this isn't one of them: here it just pairs up unrelated observations making a tasty data salad.

Since you are learning Stata, one thing to learn about -merge- is that when you get that "...do not uniquely identify observations..." error message, one of the following is true:

1. You don't understand your data and you are not using the correct set of variables for the -merge-. Typically the error is that you have omitted one or more variables that need to be part of the merge key.
2. There are duplicate observations in your data set that shouldn't be there. In this case, the problem lies in the data management that led up to the creation of that data set, which needs to be reviewed and fixed so as to remove the surplus observations and leave in their place a single correct observation.
3. You don't understand what you are trying to do in putting the data sets together and you should be either using -joinby- or, less likely, -append-, instead of -merge.

In your situation, I cannot tell if the problem is 1 or 2. That is because I cannot discern whether you are trying to calculate these share separately for each commodity (I don't think that's the case, but it isn't clear to me from the problem description in #1) or whether you want to do it just once each year for each pair of trading partners for all commodities combined. I'm going to assume you want the latter. So in this case the problem is that you are trying to merge the wrong data sets. These data sets have separate observations for commodities. So you need to do the -collapse- before you -merge- them. If you do that, you will get one observation per reportercode-partnercode-year combination:

Code:

use clean_data_set, clear collapse (sum) clean = tradevalueus, by(reportercode partnercode year) tempfile clean_totals save `clean_totals' use dirty_data_set, clear collapse (sum) dirty = tradevalueus, by(reportercode partnercode year) tempfile dirty_totals save `dirty_totals' use totals_data_set, clear collapse (sum) tradevalueus (first) reporter partneriso, by(reportercode partnercode year) merge 1:1 reportercode partnercode year using `clean_totals', gen(c_merge) merge 1:1 reportercode partnercode year using `dirty_totals', gen(d_merge) tab1 c_merge d_merge foreach v of varlist clean dirty { gen `v'_share = `v'/tradevalueus }

Replace the italicized portions with the actual names of the corresponding data sets.

Now, in the example data, there are no trading partners and years that match up across data sets, so no shares are actually calculated. But presumably in your real data set there will be matches and you will get the numbers you seek.

Last edited by Clyde Schechter; 13 Aug 2022, 21:56.
Comment
William Lisowski

Join Date: Dec 2014

Posts: 10150
#6

14 Aug 2022, 05:21

I am going to back up from Clyde's advice in post #5 and instead focus on your comments in post #4 on the technique I outlined in post #2.

I tried the option suggested by Wiliam Lisowski and got,
...
This works well. The mean values are almost what I calculated by hand. However,

1. It shrinks the data to just over 3,932 as against 595,726 I need to work with.
2. It drops other variables such as partnercodes which are important for the country-pair and importer-year fixed effects, which I intend to use for the regression.
3. The minimum values are taking on negative values.

So since the results are apparently what you want, with some exceptions, let's address the three points you make to show you how these results can give you what you want.

With regard to point 3, if the minimum values are taking on negative values, it is because you have negative values in your data. If for example cleanshare is negative, then either cleanvalues or totalvalues must be negative, and since each is created as a sum of values in one of your datasets, there must be observations where the variable values is negative. You must become more familiar with your data and understand why that happens.

Code:

use dataset, clear browse if values<0

will open Stata's Data Browser window displaying those observations where the variable "values" is less than zero.

With regard to points 1 and 2, you can just merge the results back onto whichever of your your original datasets you need them to be part of. Expanding my example code in post #2

Code:

use clean, clear collapse (sum) cleanvalues=values, by(reportercode year) save sumclean, replace use dirty, clear collapse (sum) dirtyvalues=values, by(reportercode year) save sumdirty, replace use total, clear collapse (sum) totalvalues=values, by(reportercode year) merge 1:1 reportercode year using sumclean, generate(_merge_c) merge 1:1 reportercode year using sumdirty, generate(_merge_d) generate cleanshare = cleanvalues/totalvalues generate dirtyshare = dirtyvalues/totalvalues save shares, replace use total, clear merge m:1 reportercode year using shares save total_with_shares, replace

And I strongly suggest that you read the PDF documentation for the merge command linked to from the output of help merge so that you understand the use of merge 1:m and merge m:1 rather than focus on merge 1:1 which is not suitable for your problem. In the context of your data, you have for each combination of reportercode and year multiple observations with the the same value of reportercode and year but different values of partnercode to which you need to attach 1 observation for that reportercode and year with the shares: hence merge m:1.

Last edited by William Lisowski; 14 Aug 2022, 05:23.
Comment

Tony Ekere

Join Date: Aug 2022
Posts: 12

14 Aug 2022, 07:13

Thanks, over and again Clyde Schechter for everything. This is super helpful. The problem is seemingly Number 2. Yes, your assumption is correct. I seek to calculate the shares just once a year for each trading pair, for all commodity combination. Using the codes you provided, I get.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int(year reportercode partnercode) double tradevalueus str32 reporter str3 partneriso double clean byte c_merge double dirty byte d_merge float(clean_share dirty_share) byte pick
2012 112 562     287100 "Belarus"              "NER"        . 1         . 1            .           . 1
2006 192 579      99344 "Cuba"                 "NOR"        . 1         . 1            .           . 1
2019 246  76  392272566 "Finland"              "BRA" 31578761 3  33086538 3    .08050209   .08434579 1
1997 251 112   64920708 "France"               "BLR"  3216625 3  10359675 3    .04954698   .15957427 1
1999 288 760     523148 "Ghana"                "SYR"        . 1         . 1            .           . 1
2008 300 175     403493 "Greece"               "MYT"        . 1         . 1            .           . 1
2005 360 184     522278 "Indonesia"            "COK"       31 3     27327 3 .00005935536   .05232271 1
2013 372 140     841119 "Ireland"              "CAF"        . 1     16997 3            .  .020207604 1
2012 372 226     658512 "Ireland"              "GNQ"        . 1       491 3            . .0007456204 1
2007 372 528 4847423353 "Ireland"              "NLD" 46841097 3 334406493 3   .009663092  .068986446 1
2018 388 780   18933343 "Jamaica"              "TTO"  1713019 3   1034274 3    .09047631   .05462712 1
2007 528 704  445800218 "Netherlands"          "VNM" 12852389 3  21302126 3    .02882993   .04778402 1
2018 608 694      61351 "Philippines"          "SLE"        . 1         . 1            .           . 1
2017 642  20     232614 "Romania"              "AND"   205753 3         . 1     .8845254           . 1
1988 699 332     128601 "India"                "HTI"    66127 3     14749 3    .51420283   .11468807 1
2015 702  76 1144664600 "Singapore"            "BRA" 46434215 3  87292295 3    .04056578   .07626015 1
2016 764 180    5900167 "Thailand"             "COD"     4097 3         . 1  .0006943871           . 1
2012 784 531    6116166 "United Arab Emirates" "CUW"    76604 3         . 1    .01252484           . 1
1996 862 480   25800000 "Venezuela"            "MUS"        . 1         . 1            .           . 1
2011 887 642      46615 "Yemen"                "ROU"        . 1         . 1            .           . 1
end
label values c_merge _merge
label values d_merge _merge
label def _merge 1 "master only (1)", modify
label def _merge 3 "matched (3)", modify

However, the mean value of dirty share and maximum values of both clean share and dirty share are not within expected bounds. (0-1) Please is there something i should do?

Thanks a lot.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30355
#8

14 Aug 2022, 10:17

However, the mean value of dirty share and maximum values of both clean share and dirty share are not within expected bounds. (0-1) Please is there something i should do?

The problem must arise from your data. You have gathered the values of clean, dirty, and total from separate data sets. Even assuming that you have made no errors in the way you imported and managed the data up to this point, it is possible that these different sources use different methodologies to collect their data and this results in the "dirty" or "clean" amounts being greater than the "total." Similarly, negative values in the shares can arise only as a result of negative values in the original data.

The advice that William Lisowski gives in #6 will help you do that. I emphasize that you should do this data trouble shooting at the commoditycode level, as there may be specific sectors that are problematic.

Last edited by Clyde Schechter; 14 Aug 2022, 10:20.
1 like
Comment
Tony Ekere

Join Date: Aug 2022

Posts: 12
#9

14 Aug 2022, 11:38

I cannot thank you both enough William Lisowski and Clyde Schechter. The last suggestion from William works perfectly. All the concerns have been addressed. The supposed negative values were simply exponentials tending towards 0. I have also gained more insight on merging.
Comment

Announcement