Conditional Split and Combine

Sreenivasa Prabhu

Join Date: Oct 2023
Posts: 8

Conditional Split and Combine

16 Oct 2023, 03:39

I have a dataset consisting of Author Names and Author ID from Scopus. The problem I am facing is that, each column had multiple entry. I am copying a section of my data below for clarification.

Authors	Author(s) ID
Abbott N., Krafft M.P.	57210774564;7006041188;
Abboud F., Damiano J.P., Papiernik A.	7102796864;16050119900;7003385210;
Abboud F., Damiano J.P., Papiernik A.	7102796864;16050119900;7003385210;
Abboud F., Damiano J.P., Papiernik A.	7102796864;16050119900;7003385210;
Abboud F., Damiano J.P., Papiernik A.	7102796864;16050119900;7003385210;

In the current dataset the entries are as follows. (A1, A2, A3, .....) (B1, B2, B3, .....). I want to split this data such that I have Individual Author and their Ids and separated i.e. (A1, B1), A2, B2) ......

P.S: I am relatively new to Stata and have never tried anything similar to this.

Tags: None

Andrew Musau

Join Date: Oct 2014
Posts: 10270

16 Oct 2023, 04:41

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str37 Authors str34 authorsid
"Abbott N., Krafft M.P."                "57210774564;7006041188;"          
"Abboud F., Damiano J.P., Papiernik A." "7102796864;16050119900;7003385210;"
"Abboud F., Damiano J.P., Papiernik A." "7102796864;16050119900;7003385210;"
"Abboud F., Damiano J.P., Papiernik A." "7102796864;16050119900;7003385210;"
"Abboud F., Damiano J.P., Papiernik A." "7102796864;16050119900;7003385210;"
end

contract Authors authorsid, nomiss freq(count)
replace count= length(authorsid)- length(subinstr(authorsid, ";","", .))
expand count
bys Authors: gen name = word(subinstr(subinstr(trim(itrim(Authors)), " ", "*", .), ",", " ", .), _n)
replace name= trim(itrim(subinstr(name, "*", " ", .)))
by Authors: gen id = word(subinstr(trim(itrim(authorsid)), ";", " ", .), _n)
keep name id

Res.:

Code:

. l

     +----------------------------+
     |         name            id |
     |----------------------------|
  1. |    Abbott N.   57210774564 |
  2. |  Krafft M.P.    7006041188 |
  3. |    Abboud F.    7102796864 |
  4. | Damiano J.P.   16050119900 |
  5. | Papiernik A.    7003385210 |
     +----------------------------+

Last edited by Andrew Musau; 16 Oct 2023, 04:44.

Comment

Sreenivasa Prabhu

Join Date: Oct 2023

Posts: 8
#3

16 Oct 2023, 05:15

Thanks Andrew Musau . I tried to run the code. However, I have a large dataset and when I run the code, I still get teh results for only 5 columns as dispalyed in your response. How do I extend my analysis to the complete file?
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10270
#4

16 Oct 2023, 05:29

I do not follow your response. Use dataex to provide a larger example that illustrates the difficulties in extending #2.
Comment
Sreenivasa Prabhu

Join Date: Oct 2023

Posts: 8
#5

16 Oct 2023, 05:31

Andrew Musau I figured the first part, However, it shows error (pasted below) while running the code.

contract Authors authorsid, nomiss freq(count)
'contract' cannot be read as a number
1. replace count= length(authorsid)- length(subinstr(authorsid, ";","", .))
'replace' cannot be read as a number
1. expand count
'expand' cannot be read as a number
1. bys Authors: gen name = word(subinstr(subinstr(trim(itrim(Authors)), " ", "*", .), ",", " ", .)
> , _n)
'bys' cannot be read as a number
1. replace name= trim(itrim(subinstr(name, "*", " ", .)))
'replace' cannot be read as a number
1. by Authors: gen id = word(subinstr(trim(itrim(authorsid)), ";", " ", .), _n)
'by' cannot be read as a number
1. keep name id
'keep' cannot be read as a number
1.
unexpected end of file
r(612);
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10270
#6

16 Oct 2023, 08:24

#2 assumes that the data is loaded into Stata and then you just run

contract Authors authorsid, nomiss freq(count)
replace count= length(authorsid)- length(subinstr(authorsid, ";","", .))
expand count
bys Authors: gen name = word(subinstr(subinstr(trim(itrim(Authors)), " ", "*", .), ",", " ", .), _n)
replace name= trim(itrim(subinstr(name, "*", " ", .)))
by Authors: gen id = word(subinstr(trim(itrim(authorsid)), ";", " ", .), _n)
keep name id

There should be no inputting of data as implied by #5. Get a colleague to help you load the data and run the code in #2 if you have issues. Alternatively, load the dataset and copy and paste the output of

Code:

dataex in 1/20

if you cannot obtain results with #2.
1 like
Comment
Sreenivasa Prabhu

Join Date: Oct 2023

Posts: 8
#7

16 Oct 2023, 12:28

Thank you Andrew
Comment

Announcement

Conditional Split and Combine

Comment

Comment

Comment

Comment

Comment

Comment