I have a lot of bibliographic information from Crossref (in JSON format) that I would like to get into Stata. I am struggling at several points and hope that someone has a good trick to move forward.
An example of one record is shown below.
If I inspect the data with -insheetjson- (from SSC) and its showresponse option, the data look fine, but I cannot get insheetjson to write the data into Stata variables. As a workaround, I log the insheetjson output and read the log file back in with -infix- (see the log-file code below). However, this restricts the field size to 255 characters, which leads to quite a few errors.
I tried -jsonio- (from SSC), but I get strange Java errors that I cannot resolve.
So my next attempt was -moss- (from SSC), and it works like a charm for simple fields such as source, year and subject (see the moss calls below).
But I am struggling with the -author- field, shown below, which holds an array of family/given name pairs. My attempt to extract it with moss is also below, but it does not extract anything. Does anyone have an idea how to write a regular expression for this?
I read the FAQ https://www.stata.com/support/faqs/d...ons/index.html but struggle to apply it to this string.
Code (example record):
{"subtitle":[],"subject":["Histology"],"issued":{"date-parts":[[2006]]},"score":1,"prefix":"http://id.crossref.org/prefix/10.1679","author":[{"family":"Katoh","given":"Yoshimitsu Y"},{"family":"Yamazaki","given":"Eriko"},{"family":"Taniguti","given":"Kanako"},{"family":"Yamada","given":"Keiki"},{"family":"Isomura","given":"Genzoh"}],"container-title":["Archives of Histology and Cytology","Arch. Histol. Cytol."],"reference-count":20,"page":"129-134","deposited":{"date-parts":[[2007,2,13]],"timestamp":1171324800000},"issue":"2","title":["Light and electron microscopic observation of intracytoplasmic inclusion bodies in the locus coeruleus of the hamster"],"type":"journal-article","DOI":"10.1679/aohc.69.129","ISSN":["0914-9465","1349-1717"],"URL":"http://dx.doi.org/10.1679/aohc.69.129","source":"CrossRef","publisher":"International Society of Histology & Cytology","indexed":{"date-parts":[[2014,5,12]],"timestamp":1399876237366},"volume":"69","member":"http://id.crossref.org/member/683"}
Code (log-file workaround):
set linesize 255                 // 255 is the maximum log line width
qui log using data\jsonlog.txt, t name(json_parsed) replace
insheetjson using "data\json-parsed.txt", flatten showresponse
qui log close json_parsed
infix str line 1-255 using "data\jsonlog.txt", clear
drop if _n==_N                   // drop the last line of the log
drop in 1/3                      // drop the first three lines of the log
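The 255-character limit comes from the log-file route itself: `set linesize` cannot exceed 255, and -infix- then reads only columns 1-255. If the raw file holds one complete JSON record per line (an assumption; the example record above is a single line), one way around it may be to skip insheetjson and the log entirely and import each line whole with -import delimited-, using a tab as delimiter and telling Stata not to bind or strip quotes. A sketch, assuming Stata 14 or later and that data\json-parsed.txt is the raw Crossref file:

```stata
* Assumption: one complete JSON record per line in the raw file
* Tab as delimiter: compact JSON escapes tabs inside strings as \t,
* so each line should arrive in a single variable
import delimited using "data\json-parsed.txt", delimiters("\t") ///
    varnames(nonames) stringcols(_all) bindquotes(nobind) stripquotes(no) clear
rename v1 line

* Lines longer than 2045 characters should be stored as strL instead of being cut
describe line
generate long len = strlen(line)
summarize len
```

Whether -moss- and the underlying regex functions handle strL variables may depend on your Stata and moss versions, so it is worth checking the results on a few long records first.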
Code (moss calls that work):
moss line, match(`"source":"([A-Za-z]+)""') regex unicode prefix(source)
moss line, match(`"issued":\{"date-parts":\[\[([0-9]+)\]\]\}"') regex unicode prefix(year)
moss line, match(`"subject":\["([a-zA-Z,()"]+)"\]"') regex unicode prefix(subject)
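These calls work for this record, but their character classes are narrow: `[a-zA-Z,()"]+` stops at the first space, digit or hyphen (a hypothetical subject such as "Cell Biology" would not match), and `\[\[([0-9]+)\]\]\}` only matches when date-parts holds the year alone (the deposited date `[[2007,2,13]]` in the same record has that shape and would not match). A sketch of more tolerant patterns, keeping the same compound-quote style and moss options as above; the prefixes subj, source and year are arbitrary, and it assumes moss stores the first captured text in `<prefix>match1`:

```stata
* Fixture for trying the patterns: a shortened copy of the example record
clear
set obs 1
generate line = `"{"subject":["Histology"],"issued":{"date-parts":[[2006]]},"source":"CrossRef"}"'

* Source: [^"]+ accepts anything up to the next double quote
moss line, match(`"source":"([^"]+)""') regex unicode prefix(source)

* Subject: capture the whole array content, then strip the double quotes
moss line, match(`"subject":\[([^\]]+)\]"') regex unicode prefix(subj)
generate subject = subinstr(subjmatch1, char(34), "", .)

* Year: the first four digits after [[, whether or not month and day follow
moss line, match(`"issued":\{"date-parts":\[\[([0-9]{4})"') regex unicode prefix(year)
destring yearmatch1, generate(year)
```

With several subjects the cleaned variable holds them comma-separated, which -split- can break apart if needed.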
Code (the author field):
"author":[{"family":"Katoh","given":"Yoshimitsu Y"},{"family":"Yamazaki","given":"Eriko"},{"family":"Taniguti","given":"Kanako"},{"family":"Yamada","given":"Keiki"},{"family":"Isomura","given":"Genzoh"}]
Code (my attempt for the author field):
moss line, match(`"author":\[\{"([a-zA-Z",:\}]+)\]\}"') regex unicode prefix(author)
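The attempt above cannot match for two reasons: the character class `[a-zA-Z",:\}]` contains neither a space (needed for "Yoshimitsu Y") nor `{` (needed to cross the `},{` between authors), and the pattern expects the array to end in `]}`, whereas it actually ends in `}]`. Rather than matching the whole array in one go, it may be easier to let -moss- find every occurrence of the family and given names separately, since collecting repeated matches is what moss is built for. A minimal sketch, assuming moss names its output `<prefix>count` and `<prefix>match#` as with the prefixes used above; fam, giv and author# are arbitrary names:

```stata
* Fixture: the author fragment from the example record
* (in the real data this text sits inside the longer -line- variable)
clear
set obs 1
generate line = `""author":[{"family":"Katoh","given":"Yoshimitsu Y"},{"family":"Yamazaki","given":"Eriko"},{"family":"Taniguti","given":"Kanako"},{"family":"Yamada","given":"Keiki"},{"family":"Isomura","given":"Genzoh"}]"'

* [^"]+ means one or more characters that are not a double quote,
* so spaces, hyphens, dots and accented letters are all accepted;
* moss keeps every occurrence, one match variable per author
moss line, match(`"family":"([^"]+)""') regex unicode prefix(fam)
moss line, match(`"given":"([^"]+)""') regex unicode prefix(giv)

* Combine into "Family, Given" per author
* (assumes every author has both a family and a given name)
quietly summarize famcount
forvalues i = 1/`r(max)' {
    generate author`i' = fammatch`i' + ", " + givmatch`i' if fammatch`i' != ""
}
```

Two caveats: an escaped quote (`\"`) inside a name would cut that name short, and if some records contain other arrays with family/given keys (Crossref may use the same structure for editors, for example), it may be safer to first capture the author array alone with a pattern such as `author":\[([^\]]+)\]` and run the two calls above on that match instead of on `line`.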