Convert concatenated strings into numeric

Chen Samulsion

Join Date: Jan 2018

Posts: 921
#1


14 Oct 2019, 06:21

Dear Stata users,

I have data like those below, where the researchers entered the variables as letters. Now I want to convert those strings into numbers so that "A" becomes "1", "B" becomes "2", "C" becomes "3", and so on. This is easy when a string holds only one letter, but how can I handle strings where the letters are concatenated, as in "A,B,C"? Thank you in advance for any advice.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str20 x1 str22 x2 str18 x3
"A"     "A"       "C"      
"A,B"   "A,C"     "B,D"    
"B,C"   "C,G"     "A"      
"A,B,C" "C"       "B,C,D,E"
"B"     "B"       "B"      
"C"     "C"       "A"      
"A,B"   "A,C"     "B,D"    
"B"     "E"       "E"      
"A"     "B,C,D,F" "A"      
"A,B"   "B"       "A,B,C,E"
"B"     "A,F,G"   "B,C,E"  
end
Jorrit Gosens

Join Date: Jan 2015

Posts: 1019
#2

14 Oct 2019, 07:13

What would the desired end result look like?
E.g., should "A,B" be turned into "1,2"? Or into two separate numeric variables?
Can you give explicit examples of what you'd want to have in the end for a few observations?
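The second reading (separate numeric variables) can be sketched as follows, assuming every comma-separated token is a single uppercase letter A-Z. It uses `split` to break each variable into one string variable per token and then takes each letter's position in the alphabet as its code. The `n` prefix on the new variable names is a hypothetical choice, and only the first four rows of x1 and x2 from #1 are entered.

```stata
* Example data: first four rows of x1 and x2 from #1
clear
input str20 x1 str22 x2
"A"     "A"
"A,B"   "A,C"
"B,C"   "C,G"
"A,B,C" "C"
end

* Assumption: each token is one uppercase letter A-Z
foreach v in x1 x2 {
    * split creates string variables `v'_1, `v'_2, ... (one letter each)
    split `v', parse(",") generate(`v'_)
    * r(varlist) holds the names of the variables split just created
    foreach w in `r(varlist)' {
        * A = 1, B = 2, ...: position of the letter in the alphabet;
        * empty pieces (fewer tokens in this observation) stay missing
        generate byte n`w' = strpos("ABCDEFGHIJKLMNOPQRSTUVWXYZ", `w') if `w' != ""
    }
}

list x1 nx1_* x2 nx2_*, noobs
```

Unlike the subinstr() approach, this yields numeric variables directly, at the cost of one new variable per token position.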

Nick Cox

Join Date: Mar 2014
Posts: 35698
#3

14 Oct 2019, 07:44

This may help. I wonder what you want to do about duplicates, but you say nothing about that, so no suggestions here.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str20 x1 str22 x2 str18 x3
"A"     "A"       "C"      
"A,B"   "A,C"     "B,D"    
"B,C"   "C,G"     "A"      
"A,B,C" "C"       "B,C,D,E"
"B"     "B"       "B"      
"C"     "C"       "A"      
"A,B"   "A,C"     "B,D"    
"B"     "E"       "E"      
"A"     "B,C,D,F" "A"      
"A,B"   "B"       "A,B,C,E"
"B"     "A,F,G"   "B,C,E"  
end

tokenize `c(ALPHA)' 
forval x = 1/26 { 
    foreach v in x1 x2 x3 { 
        replace `v' = subinstr(`v', "``x''", "`x'", .) 
    }
} 

egen X = concat(x?) , p(,) 

list 

     +---------------------------------------------+
     |    x1        x2        x3                 X |
     |---------------------------------------------|
  1. |     1         1         3             1,1,3 |
  2. |   1,2       1,3       2,4       1,2,1,3,2,4 |
  3. |   2,3       3,7         1         2,3,3,7,1 |
  4. | 1,2,3         3   2,3,4,5   1,2,3,3,2,3,4,5 |
  5. |     2         2         2             2,2,2 |
  6. |     3         3         1             3,3,1 |
  7. |   1,2       1,3       2,4       1,2,1,3,2,4 |
  8. |     2         5         5             2,5,5 |
  9. |     1   2,3,4,6         1       1,2,3,4,6,1 |
 10. |   1,2         2   1,2,3,5     1,2,2,1,2,3,5 |
 11. |     2     1,6,7     2,3,5     2,1,6,7,2,3,5 |
     +---------------------------------------------+


Nick Cox

Join Date: Mar 2014
Posts: 35698
#4

14 Oct 2019, 08:58

Here is a cleaned-up concatenation anyway (no duplicates, tidy order):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str20 x1 str22 x2 str18 x3
"A"     "A"       "C"      
"A,B"   "A,C"     "B,D"    
"B,C"   "C,G"     "A"      
"A,B,C" "C"       "B,C,D,E"
"B"     "B"       "B"      
"C"     "C"       "A"      
"A,B"   "A,C"     "B,D"    
"B"     "E"       "E"      
"A"     "B,C,D,F" "A"      
"A,B"   "B"       "A,B,C,E"
"B"     "A,F,G"   "B,C,E"  
end

tokenize `c(ALPHA)' 

gen wanted = "" 

quietly forval x = 1/26 { 
    replace wanted = cond(wanted == "", "`x'", wanted + ",`x'") if strpos(x1, "``x''") | strpos(x2, "``x''") | strpos(x3, "``x''") 
}

list , sep(0) 
    
     +-----------------------------------------+
     |    x1        x2        x3        wanted |
     |-----------------------------------------|
  1. |     A         A         C           1,3 |
  2. |   A,B       A,C       B,D       1,2,3,4 |
  3. |   B,C       C,G         A       1,2,3,7 |
  4. | A,B,C         C   B,C,D,E     1,2,3,4,5 |
  5. |     B         B         B             2 |
  6. |     C         C         A           1,3 |
  7. |   A,B       A,C       B,D       1,2,3,4 |
  8. |     B         E         E           2,5 |
  9. |     A   B,C,D,F         A     1,2,3,4,6 |
 10. |   A,B         B   A,B,C,E       1,2,3,5 |
 11. |     B     A,F,G     B,C,E   1,2,3,5,6,7 |
     +-----------------------------------------+
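The `if` condition above names x1, x2 and x3 explicitly. A minimal sketch of the same idea that scans a varlist instead, which may be handier with many variables, assuming the variables to search are stored contiguously as x1-x3 and using a hypothetical helper variable `found` (only the first five rows of the #1 data are entered here):

```stata
* Example data: first five rows from #1
clear
input str20 x1 str22 x2 str18 x3
"A"     "A"       "C"
"A,B"   "A,C"     "B,D"
"B,C"   "C,G"     "A"
"A,B,C" "C"       "B,C,D,E"
"B"     "B"       "B"
end

* locals 1, 2, ..., 26 now hold A, B, ..., Z
tokenize `c(ALPHA)'
generate wanted = ""
* hypothetical helper: 1 if the current letter occurs in any variable
generate byte found = 0

quietly forvalues x = 1/26 {
    replace found = 0
    foreach v of varlist x1-x3 {
        replace found = 1 if strpos(`v', "``x''")
    }
    * append code `x' once, so duplicates are dropped and order is tidy
    replace wanted = cond(wanted == "", "`x'", wanted + ",`x'") if found
}
drop found

list, sep(0)
```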


Chen Samulsion

Join Date: Jan 2018

Posts: 921
#5

15 Oct 2019, 00:38

Dear Jorrit Gosens and Nick Cox, thank you very much. Nick, I'm sorry for not clarifying my query: I just want to replace the strings with numbers variable by variable. Your answer in #3 is exactly what I need! I'm also glad to see the further step using the concat() function of egen that you provided; I always learn a lot from you.

Code:

tokenize `c(ALPHA)' 
forval x = 1/26 { 
    foreach v in x1 x2 x3 { 
        replace `v' = subinstr(`v', "``x''", "`x'", .) 
    }
}

Olayiwola Adetutu

Join Date: Sep 2019
Posts: 59
#6

15 Oct 2019, 01:12

I am using Stata 16 SE. I want to fit one- and two-parameter logistic models using bayesmh to the sample of the data below:

HTML Code:

  Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9 Q10
   0   0   1   1   1   1   1   1   1   1
   1   1   1   1   1   1   1   0   1   1
   0   0   1   1   1   1   0   1   1   1
   1   1   1   0   1   0   0   0   1   1
   1   0   1   1   1   1   0   0   0   0
   0   1   1   1   1   1   1   0   0   0
   1   1   1   1   1   0   1   0   0   1
   1   0   1   0   1   1   0   0   0   1
   1   1   1   0   1   0   1   0   1   0
   0   1   1   1   1   1   0   1   0   1
   1   0   1   1   1   0   0   1   0   0
   0   0   1   0   1   1   1   0   1   0
   1   0   1   0   1   1   0   0   0   0
   0   0   1   1   1   1   1   1   1   1
   0   1   1   1   1   0   0   0   1   1
   0   1   1   1   1   1   0   1   1   1
   0   1   1   1   1   1   0   0   1   1
   1   0   1   1   1   0   0   0   0   0
   0   0   1   1   1   1   0   0   1   0

The original data consist of 35 questions answered by 403 examinees. I used this code:

Code:

 
. set maxvar 30000

. set emptycells drop

. import excel "C:\Users\MATTHEW ADETUTU\Documents\Result_Coding.xlsx", sheet("Sheet 1") firstrow
(35 vars, 403 obs)

. generate id = _n

. 
. quietly reshape long Q, i(id) j(item)

. 
. rename Q y

. 
. fvset base none id item

. 
. set seed 10
program my1plllogit
    args lnf xb
    tempvar infj
    * log likelihood of each response: ln(p) if y == 1, ln(1 - p) if y == 0
    quietly generate double `infj' = ln(invlogit(`xb')) if $MH_y == 1 & $MH_touse
    quietly replace `infj' = ln(invlogit(-`xb')) if $MH_y == 0 & $MH_touse
    quietly summarize `infj', meanonly
    * return a missing log likelihood if any contribution is missing
    if r(N) < $MH_n {
        scalar `lnf' = .
        exit
    }
    scalar `lnf' = r(sum)
end

* run from a do-file: /// continues the command onto the next line
bayesmh y i.item, noconstant reffects(id) llevaluator(my1plllogit) ///
    prior({y:i.id}, normal(0,{var}))                              ///
    prior({y:i.item}, {y:1bn.item}, normal(0,10))                 ///
    prior({var}, igamma(0.01,0.01))                               ///
    block({var}) block({y:i.item}, reffects)                      ///
    exclude({y:i.id}) dots

The code did not work; the errors encountered include:
.
.
. bayesmh y i.item, noconstant reffects(id) llevaluator(my1plllogit)
note:random effects ibn.id are shared between dependent variables
invalid parameter name ibn.id
r(198);

.
. prior({y:i.id},normal(0,{var}))
command prior is unrecognized
r(199);

.
. prior({y:i.item}, {y:1bn.item}, normal(0,10))
command prior is unrecognized
r(199);

.
. prior({var}, igamma(0.01,0.01))
command prior is unrecognized
r(199);

.
. block({var})block({y:i.item}, reffects)
command block is unrecognized
r(199);

.
. exclude({y:i.id}) dots
command exclude is unrecognized
r(199);

Please, I need help. Thanks.
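The log shows Stata running `bayesmh` on its own and then rejecting `prior`, `block` and `exclude` as unrecognized commands, which is what happens when the continuation lines of a multi-line command are executed one by one. A minimal sketch of the two usual ways to continue a command across lines, assuming the code is run from a do-file (neither works when typed in the Command window); the locals `msg` and `msg2` are purely illustrative:

```stata
* Run from a do-file: /// joins the next physical line to this command
local msg = "one " + ///
    "logical " +    ///
    "command"
display "`msg'"

* Alternative: make ; the end-of-command delimiter, then switch back
#delimit ;
local msg2 = "also "
    + "one command" ;
#delimit cr
display "`msg2'"
```

Applied to the bayesmh call, every line except the last would end in `///`.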


Olayiwola Adetutu

Join Date: Sep 2019
Posts: 59
#7

15 Oct 2019, 01:14

The subset of the data is:

Code:


Nick Cox

Join Date: Mar 2014

Posts: 35698
#8

15 Oct 2019, 03:47

#6 and #7 have no bearing on the thread title. Please start a new thread with a good title.