convert concatenate strings into numeric

Chen Samulsion started a topic convert concatenate strings into numeric

14 Oct 2019, 06:21
Dear Stata users,

I have data like the example below, where the researchers entered the variables as letters. Now I want to convert those strings into numbers, so that "A" becomes "1", "B" becomes "2", "C" becomes "3". This is easy when a string holds only one letter, but how can I handle strings that were concatenated, such as "A,B,C"? Thank you in advance for any advice.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str20 x1 str22 x2 str18 x3
"A"     "A"       "C"      
"A,B"   "A,C"     "B,D"    
"B,C"   "C,G"     "A"      
"A,B,C" "C"       "B,C,D,E"
"B"     "B"       "B"      
"C"     "C"       "A"      
"A,B"   "A,C"     "B,D"    
"B"     "E"       "E"      
"A"     "B,C,D,F" "A"      
"A,B"   "B"       "A,B,C,E"
"B"     "A,F,G"   "B,C,E"  
end
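For the single-letter case, which the question describes as easy, one possible approach is to look up each letter's position in the alphabet with strpos(). This is a minimal sketch on a small hypothetical variable x1, assuming each value is a single uppercase letter; it does not handle the comma-separated case.

```stata
clear
input str5 x1
"A"
"C"
"G"
end

* strpos(s1, s2) returns the position of s2 within s1, so the
* position in the alphabet string is the wanted number: A -> 1, C -> 3, G -> 7
* (restricted to values that are exactly one character long)
generate n1 = strpos("ABCDEFGHIJKLMNOPQRSTUVWXYZ", x1) if strlen(x1) == 1
list, sep(0)
```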
Nick Cox replied

15 Oct 2019, 03:47
#6 and #7 have no bearing on the thread title. Please start a new thread with a good title.

Olayiwola Adetutu replied

15 Oct 2019, 01:14

The subset of the data is:

Code:


Olayiwola Adetutu replied

15 Oct 2019, 01:12

I am using Stata 16 SE. I want to fit one- and two-parameter logistic models with bayesmh to the sample of the data below:

HTML Code:

Q1  Q2  Q3  Q4  Q5  Q6  Q7  Q8  Q9  Q10
0   0   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   0   1   1
0   0   1   1   1   1   0   1   1   1
1   1   1   0   1   0   0   0   1   1
1   0   1   1   1   1   0   0   0   0
0   1   1   1   1   1   1   0   0   0
1   1   1   1   1   0   1   0   0   1
1   0   1   0   1   1   0   0   0   1
1   1   1   0   1   0   1   0   1   0
0   1   1   1   1   1   0   1   0   1
1   0   1   1   1   0   0   1   0   0
0   0   1   0   1   1   1   0   1   0
1   0   1   0   1   1   0   0   0   0
0   0   1   1   1   1   1   1   1   1
0   1   1   1   1   0   0   0   1   1
0   1   1   1   1   1   0   1   1   1
0   1   1   1   1   1   0   0   1   1
1   0   1   1   1   0   0   0   0   0
0   0   1   1   1   1   0   0   1   0

The original data consist of 35 questions answered by 403 examinees. I used this code:

Code:

 
. set maxvar 30000

. set emptycells drop

. import excel "C:\Users\MATTHEW ADETUTU\Documents\Result_Coding.xlsx", sheet("Sheet 1") firstrow
(35 vars, 403 obs)

. generate id = _n

. 
. quietly reshape long Q, i(id) j(item)

. 
. rename Q y

. 
. fvset base none id item

. 
. set seed 10
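program my1plllogit
    args lnf xb
    tempvar infj
    * observation-level log likelihood for y == 1 and y == 0
    quietly generate `infj' = ln(invlogit(`xb')) if $MH_y == 1 & $MH_touse
    quietly replace `infj' = ln(invlogit(-`xb')) if $MH_y == 0 & $MH_touse
    quietly summarize `infj', meanonly
    * return missing if any observation could not be evaluated
    if r(N) < $MH_n {
        scalar `lnf' = .
        exit
    }
    * overall log likelihood is the sum over observations
    scalar `lnf' = r(sum)
end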

bayesmh y i.item, noconstant reffects(id) llevaluator(my1plllogit)
            prior({y:i.id},normal(0,{var}))
            prior({y:i.item}, {y:1bn.item}, normal(0,10))
            prior({var}, igamma(0.01,0.01))
            block({var})block({y:i.item}, reffects)
            exclude({y:i.id})  dots

The code did not work; the errors were:
.
.
. bayesmh y i.item, noconstant reffects(id) llevaluator(my1plllogit)
note:random effects ibn.id are shared between dependent variables
invalid parameter name ibn.id
r(198);

.
. prior({y:i.id},normal(0,{var}))
command prior is unrecognized
r(199);

.
. prior({y:i.item}, {y:1bn.item}, normal(0,10))
command prior is unrecognized
r(199);

.
. prior({var}, igamma(0.01,0.01))
command prior is unrecognized
r(199);

.
. block({var})block({y:i.item}, reffects)
command block is unrecognized
r(199);

.
. exclude({y:i.id}) dots
command exclude is unrecognized
r(199);

Please, I need help. Thanks.


Chen Samulsion replied

15 Oct 2019, 00:38
Dear Jorrit Gosens and Nick Cox, thank you very much. Nick, I'm sorry for not making my query clear: I just want to convert the strings to numbers, variable by variable. Your answer in #3 is exactly what I needed! I'm also glad to see the further step using the concat() function that you provided; I always learn a lot from you.

Code:

tokenize `c(ALPHA)'
forval x = 1/26 {
    foreach v in x1 x2 x3 {
        replace `v' = subinstr(`v', "``x''", "`x'", .)
    }
}

Nick Cox replied

14 Oct 2019, 08:58

Here is a cleaned-up concatenation anyway (no duplicates, tidy order):

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str20 x1 str22 x2 str18 x3
"A"     "A"       "C"      
"A,B"   "A,C"     "B,D"    
"B,C"   "C,G"     "A"      
"A,B,C" "C"       "B,C,D,E"
"B"     "B"       "B"      
"C"     "C"       "A"      
"A,B"   "A,C"     "B,D"    
"B"     "E"       "E"      
"A"     "B,C,D,F" "A"      
"A,B"   "B"       "A,B,C,E"
"B"     "A,F,G"   "B,C,E"  
end

tokenize `c(ALPHA)' 

gen wanted = "" 

quietly forval x = 1/26 { 
    replace wanted = cond(wanted == "", "`x'", wanted + ",`x'") if strpos(x1, "``x''") | strpos(x2, "``x''") | strpos(x3, "``x''") 
}

list , sep(0) 
    
     +-----------------------------------------+
     |    x1        x2        x3        wanted |
     |-----------------------------------------|
  1. |     A         A         C           1,3 |
  2. |   A,B       A,C       B,D       1,2,3,4 |
  3. |   B,C       C,G         A       1,2,3,7 |
  4. | A,B,C         C   B,C,D,E     1,2,3,4,5 |
  5. |     B         B         B             2 |
  6. |     C         C         A           1,3 |
  7. |   A,B       A,C       B,D       1,2,3,4 |
  8. |     B         E         E           2,5 |
  9. |     A   B,C,D,F         A     1,2,3,4,6 |
 10. |   A,B         B   A,B,C,E       1,2,3,5 |
 11. |     B     A,F,G     B,C,E   1,2,3,5,6,7 |
     +-----------------------------------------+


Nick Cox replied

14 Oct 2019, 07:44

This may help. I wonder what you want to do about duplicates, but you say nothing about that, so no suggestions here.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str20 x1 str22 x2 str18 x3
"A"     "A"       "C"      
"A,B"   "A,C"     "B,D"    
"B,C"   "C,G"     "A"      
"A,B,C" "C"       "B,C,D,E"
"B"     "B"       "B"      
"C"     "C"       "A"      
"A,B"   "A,C"     "B,D"    
"B"     "E"       "E"      
"A"     "B,C,D,F" "A"      
"A,B"   "B"       "A,B,C,E"
"B"     "A,F,G"   "B,C,E"  
end

tokenize `c(ALPHA)' 
forval x = 1/26 { 
    foreach v in x1 x2 x3 { 
        replace `v' = subinstr(`v', "``x''", "`x'", .) 
    }
} 

egen X = concat(x?) , p(,) 

list 

     +---------------------------------------------+
     |    x1        x2        x3                 X |
     |---------------------------------------------|
  1. |     1         1         3             1,1,3 |
  2. |   1,2       1,3       2,4       1,2,1,3,2,4 |
  3. |   2,3       3,7         1         2,3,3,7,1 |
  4. | 1,2,3         3   2,3,4,5   1,2,3,3,2,3,4,5 |
  5. |     2         2         2             2,2,2 |
  6. |     3         3         1             3,3,1 |
  7. |   1,2       1,3       2,4       1,2,1,3,2,4 |
  8. |     2         5         5             2,5,5 |
  9. |     1   2,3,4,6         1       1,2,3,4,6,1 |
 10. |   1,2         2   1,2,3,5     1,2,2,1,2,3,5 |
 11. |     2     1,6,7     2,3,5     2,1,6,7,2,3,5 |
     +---------------------------------------------+
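The loop above relies on nested macro references: after tokenize, local macro `1` holds A, `2` holds B, and so on up to `26`, so ``` ``x'' ``` evaluates to the x-th letter of the alphabet. A minimal check, assuming it is run from a do-file so that the local macros persist from line to line:

```stata
* tokenize splits c(ALPHA), i.e. "A B C ... Z", into locals `1', `2', ..., `26'
tokenize `c(ALPHA)'

local x = 3
* ``x'' expands from the inside out: `x' -> 3, then `3' -> C
display "``x''"
```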


Jorrit Gosens replied

14 Oct 2019, 07:13
What would the desired end result look like?
For example, should "A,B" be turned into "1,2", or into 2 separate numeric variables?
Can you give explicit examples of what you'd want to have in the end for a few observations?
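If the goal were separate numeric variables rather than one string such as "1,2", one option (not one proposed in this thread) is to map the letters as in the loops above and then break the result apart with split and its destring option. This is a sketch on a hypothetical one-variable excerpt of the data; the stub x1_ is an illustrative choice.

```stata
clear
input str20 x1
"A"
"A,B"
"A,B,C"
end

* map each letter to its position in the alphabet, as in the thread
tokenize `c(ALPHA)'
quietly forval x = 1/26 {
    replace x1 = subinstr(x1, "``x''", "`x'", .)
}

* one numeric variable per comma-separated element: x1_1, x1_2, x1_3
* (elements that do not exist in an observation become missing)
split x1, parse(,) generate(x1_) destring
list, sep(0)
```

Here x1_2 and x1_3 are missing in the observations that hold fewer elements.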
