I have a most annoying dataset. For those with Python who'd like to follow along with me, here's how we collate all this together. Note that I use gcollapse here from gtools, but use the normal collapse if you wish. Also, fair warning, this will take a few minutes to run, so if you're short on time, simply refer to the dataset below.
Here is the resultant dataset.
Okay now to the point of the post: I hate this dataset. Whoever put it together should... know better than to have such long strings. Here's my question: for categories with semicolons, how do I keep only the text before the first semicolon, so that "Arms and ammunition; parts and accessories thereof" becomes simply "Arms and ammunition"?
Code:
cls
python:
import json
import numpy as np
import pandas as pd
import requests, time
def Comtrade_Scraper(ps: int,
                     type: str = 'C',
                     freq: str = 'M',
                     px: str = 'HS',
                     r: str = 'all',
                     p: int = 156,
                     rg: int = 2,
                     cc: str = 'AG2'):
    """
    Wrapper for creating URLs to access the Comtrade API
    ARGUMENTS
    *********
    Required
    ps = year
    """
    # The freq and px defaults match the values the URL previously hard-coded (M, HS)
    base = 'https://comtrade.un.org/api/get/plus?max=100000'
    url = f'{base}&type={type}&freq={freq}&px={px}&ps={ps}&r={r}&p={p}&rg={rg}&cc={cc}'
    result = requests.get(url).json()
    # Only write a file when the API actually returned a dataset
    if 'dataset' in result:
        df = pd.DataFrame(result['dataset'])
        df = df.replace({None: np.nan})
        # Stata variable names are limited to 32 characters
        df.columns = [i[:32] for i in df.columns]
        df = df.reset_index(drop=True)
        df.to_stata(f'ChinaDonors_{ps}.dta')
for i in range(2010,2020): Comtrade_Scraper(i)
end
clear *
u "ChinaDonors_2014.dta" , clear
cls
local filelist : dir . files "ChinaDonors*"
qui foreach file of local filelist {
    ap using `file'
}
qbys cmdCode period: egen totmoney = max( TradeValue )
replace cmdDescE = subinstr(cmdDescE, "n.e.s.", "n.e.c.", .)
replace cmdDescE = "Man-made filaments" if cmdDescE== "Man-made filaments; strip and the like of man-made textile materials"
replace cmdDescE = "Soap" if strpos(cmdDescE, "Soap") >0
keep period cmdDescE totmoney cmdCode
destring cmdCode, replace
gcollapse (mean) totmoney, by(period cmdDescE cmdCode)
//labmask cmdCode, values(cmdDescE)
sort cmdCode period
destring cmdCode, replace
cls
order cmdCode period, first
tempvar periodc
g `periodc' = substr(string(period), 1, 4) + "/" + substr(string(period), 5, 2)
g date = monthly(`periodc', "YM")
format date %tmCCYYmon
cls
cap isid cmdCode date
if _rc {
    duplicates tag cmdCode date, generate(dup)
    br if dup ==1
}
xtset cmdCode date, m
br
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str235 cmdDescE float(totmoney date)
"Animals; live" 9476400 696
"Meat and edible meat offal" 198382752 696
"Fish and crustaceans, molluscs and other aquatic invertebrates" 87912744 696
"Dairy produce; birds' eggs; natural honey; edible products of animal origin, not elsewhere specified or included" 210890080 696
"Animal originated products; not elsewhere specified or included" 22846126 696
"Trees and other plants, live; bulbs, roots and the like; cut flowers and ornamental foliage" 6565384 696
"Vegetables and certain roots and tubers; edible" 143490928 696
"Fruit and nuts, edible; peel of citrus fruit or melons" 1969781632 696
"Coffee, tea, mate and spices" 11540333 696
"Cereals" 124759368 696
"Products of the milling industry; malt, starches, inulin, wheat gluten" 79111432 696
"Oil seeds and oleaginous fruits; miscellaneous grains, seeds and fruit, industrial or medicinal plants; straw and fodder" 1281107328 696
"Lac; gums, resins and other vegetable saps and extracts" 4594716 696
"Vegetable plaiting materials; vegetable products not elsewhere specified or included" 2038850 696
"Animal or vegetable fats and oils and their cleavage products; prepared animal fats; animal or vegetable waxes" 227416064 696
"Meat, fish or crustaceans, molluscs or other aquatic invertebrates; preparations thereof" 7755954 696
"Sugars and sugar confectionery" 107115872 696
"Cocoa and cocoa preparations" 10298582 696
"Preparations of cereals, flour, starch or milk; pastrycooks' products" 130069840 696
"Preparations of vegetables, fruit, nuts or other parts of plants" 9538444 696
"Miscellaneous edible preparations" 81329184 696
"Beverages, spirits and vinegar" 81844600 696
"Food industries, residues and wastes thereof; prepared animal fodder" 31610020 696
"Tobacco and manufactured tobacco substitutes" 50521528 696
"Salt; sulphur; earths, stone; plastering materials, lime and cement" 96922680 696
"Ores, slag and ash" 3552674048 696
"Mineral fuels, mineral oils and products of their distillation; bituminous substances; mineral waxes" 3656221952 696
"Inorganic chemicals; organic and inorganic compounds of precious metals; of rare earth metals, of radio-active elements and of isotopes" 217296544 696
"Organic chemicals" 1107920896 696
"Pharmaceutical products" 271897632 696
"Fertilizers" 72103984 696
"Tanning or dyeing extracts; tannins and their derivatives; dyes, pigments and other colouring matter; paints, varnishes; putty, other mastics; inks" 87265616 696
"Essential oils and resinoids; perfumery, cosmetic or toilet preparations" 198004288 696
"Soap" 94405440 696
"Albuminoidal substances; modified starches; glues; enzymes" 35387768 696
"Explosives; pyrotechnic products; matches; pyrophoric alloys; certain combustible preparations" 4671877 696
"Photographic or cinematographic goods" 69649664 696
"Chemical products n.e.c." 204346896 696
"Plastics and articles thereof" 907491584 696
"Rubber and articles thereof" 369909632 696
"Raw hides and skins (other than furskins) and leather" 79122688 696
"Articles of leather; saddlery and harness; travel goods, handbags and similar containers; articles of animal gut (other than silk-worm gut)" 93760448 696
"Furskins and artificial fur; manufactures thereof" 3814598 696
"Wood and articles of wood; wood charcoal" 259517760 696
"Cork and articles of cork" 2206475 696
"Manufactures of straw, esparto or other plaiting materials; basketware and wickerwork" 620832 696
"Pulp of wood or other fibrous cellulosic material; recovered (waste and scrap) paper or paperboard" 536855392 696
"Paper and paperboard; articles of paper pulp, of paper or paperboard" 58181424 696
"Printed books, newspapers, pictures and other products of the printing industry; manuscripts, typescripts and plans" 20417110 696
"Silk" 1949118 696
"Wool, fine or coarse animal hair; horsehair yarn and woven fabric" 109682520 696
"Cotton" 177132656 696
"Vegetable textile fibres; paper yarn and woven fabrics of paper yarn" 15832954 696
"Man-made filaments" 49821440 696
"Man-made staple fibres" 29650326 696
"Wadding, felt and nonwovens, special yarns; twine, cordage, ropes and cables and articles thereof" 24313090 696
"Carpets and other textile floor coverings" 1249502 696
"Fabrics; special woven fabrics, tufted textile fabrics, lace, tapestries, trimmings, embroidery" 18394404 696
"Textile fabrics; impregnated, coated, covered or laminated; textile articles of a kind suitable for industrial use" 20832144 696
"Fabrics; knitted or crocheted" 49430844 696
"Apparel and clothing accessories; knitted or crocheted" 57120456 696
"Apparel and clothing accessories; not knitted or crocheted" 59739144 696
"Textiles, made up articles; sets; worn clothing and worn textile articles; rags" 6573561 696
"Footwear; gaiters and the like; parts of such articles" 136364896 696
"Headgear and parts thereof" 1107134 696
"Umbrellas, sun umbrellas, walking-sticks, seat sticks, whips, riding crops; and parts thereof" 156185 696
"Feathers and down, prepared; and articles made of feather or of down; artificial flowers; articles of human hair" 14313544 696
"Stone, plaster, cement, asbestos, mica or similar materials; articles thereof" 55790776 696
"Ceramic products" 21081408 696
"Glass and glassware" 231207904 696
"Natural, cultured pearls; precious, semi-precious stones; precious metals, metals clad with precious metal, and articles thereof; imitation jewellery; coin" 2948837376 696
"Iron and steel" 403142560 696
"Iron or steel articles" 170073808 696
"Copper and articles thereof" 2123618176 696
"Nickel and articles thereof" 24740248 696
"Aluminium and articles thereof" 108409400 696
"Lead and articles thereof" 3095766 696
"Zinc and articles thereof" 51594688 696
"Tin; articles thereof" 4590268 696
"Metals; n.e.c., cermets and articles thereof" 20873236 696
"Tools, implements, cutlery, spoons and forks, of base metal; parts thereof, of base metal" 62317196 696
"Metal; miscellaneous products of base metal" 59797912 696
"Nuclear reactors, boilers, machinery and mechanical appliances; parts thereof" 3447730432 696
"Electrical machinery and equipment and parts thereof; sound recorders and reproducers; television image and sound recorders and reproducers, parts and accessories of such articles" 16160323584 696
"Railway, tramway locomotives, rolling-stock and parts thereof; railway or tramway track fixtures and fittings and parts thereof; mechanical (including electro-mechanical) traffic signalling equipment of all kinds" 24711610 696
"Vehicles; other than railway or tramway rolling stock, and parts and accessories thereof" 2308939264 696
"Aircraft, spacecraft and parts thereof" 721937600 696
"Ships, boats and floating structures" 44207672 696
"Optical, photographic, cinematographic, measuring, checking, medical or surgical instruments and apparatus; parts and accessories" 1178954752 696
"Clocks and watches and parts thereof" 183911616 696
"Musical instruments; parts and accessories of such articles" 11669655 696
"Arms and ammunition; parts and accessories thereof" 811758 696
"Furniture; bedding, mattresses, mattress supports, cushions and similar stuffed furnishings; lamps and lighting fittings, n.e.c.; illuminated signs, illuminated name-plates and the like; prefabricated buildings" 42320016 696
"Toys, games and sports requisites; parts and accessories thereof" 114710408 696
"Miscellaneous manufactured articles" 134703520 696
"Works of art; collectors' pieces and antiques" 2576926 696
"Commodities not specified according to kind" 832717696 696
end
format %tmCCYYmon date
br
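To make the goal concrete, here is a minimal sketch of the truncation I have in mind, written in plain Python since the scraper already runs in a python: block. The helper name before_semicolon is hypothetical (it is not part of any library), and the labels are copied from the listing above. This only illustrates the rule "keep everything before the first semicolon"; it is not meant as the definitive way to do it.

```python
def before_semicolon(label: str) -> str:
    """Return the text before the first semicolon, with surrounding
    whitespace stripped. Labels without a semicolon come back unchanged
    (apart from the stripping)."""
    head, _sep, _tail = label.partition(";")
    return head.strip()


# A few labels copied from the example dataset
labels = [
    "Arms and ammunition; parts and accessories thereof",
    "Animals; live",
    "Cereals",
    "Apparel and clothing accessories; knitted or crocheted",
    "Apparel and clothing accessories; not knitted or crocheted",
]

short = [before_semicolon(s) for s in labels]
for original, shortened in zip(labels, short):
    print(f"{shortened!r}  <-  {original!r}")
```

Assuming the description column is still called cmdDescE at that point, the same helper could presumably be applied in the scraper with df['cmdDescE'].map(before_semicolon) before the to_stata call. One caveat I can already see: some categories share everything before the semicolon (the two "Apparel and clothing accessories" rows, and "Fabrics; special woven fabrics..." versus "Fabrics; knitted or crocheted"), so the shortened labels are no longer unique and cmdCode would have to stay as the identifier.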
