
  • convert concatenated strings into numeric

    Dear Stata users,

    I have data like those below, in which the researchers entered the variables as letters. Now I want to convert those strings to numbers, so that "A" becomes "1", "B" becomes "2", "C" becomes "3", and so on. This is easy when a string holds only one letter, but how can I handle strings that were concatenated, such as "A,B,C"? Thank you in advance for any advice.
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str20 x1 str22 x2 str18 x3
    "A"     "A"       "C"      
    "A,B"   "A,C"     "B,D"    
    "B,C"   "C,G"     "A"      
    "A,B,C" "C"       "B,C,D,E"
    "B"     "B"       "B"      
    "C"     "C"       "A"      
    "A,B"   "A,C"     "B,D"    
    "B"     "E"       "E"      
    "A"     "B,C,D,F" "A"      
    "A,B"   "B"       "A,B,C,E"
    "B"     "A,F,G"   "B,C,E"  
    end

  • #2
    What would the desired end result look like?
    For example, should "A,B" be turned into "1,2"? Or into 2 separate numeric variables?
    Can you give explicit examples of what you would want to end up with for a few observations?



    • #3
      This may help. I wonder what you want to do about duplicates, but you say nothing about that, so no suggestions here.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str20 x1 str22 x2 str18 x3
      "A"     "A"       "C"      
      "A,B"   "A,C"     "B,D"    
      "B,C"   "C,G"     "A"      
      "A,B,C" "C"       "B,C,D,E"
      "B"     "B"       "B"      
      "C"     "C"       "A"      
      "A,B"   "A,C"     "B,D"    
      "B"     "E"       "E"      
      "A"     "B,C,D,F" "A"      
      "A,B"   "B"       "A,B,C,E"
      "B"     "A,F,G"   "B,C,E"  
      end
      
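      * tokenize puts the letters A to Z into the local macros 1, 2, ..., 26
      tokenize `c(ALPHA)' 
      * replace every letter with its position in the alphabet, in each variable
      forval x = 1/26 { 
          foreach v in x1 x2 x3 { 
              replace `v' = subinstr(`v', "``x''", "`x'", .) 
          }
      } 
      
      * concatenate the converted variables, separated by commas
      egen X = concat(x?) , p(,) 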
      
      list 
      
           +---------------------------------------------+
           |    x1        x2        x3                 X |
           |---------------------------------------------|
        1. |     1         1         3             1,1,3 |
        2. |   1,2       1,3       2,4       1,2,1,3,2,4 |
        3. |   2,3       3,7         1         2,3,3,7,1 |
        4. | 1,2,3         3   2,3,4,5   1,2,3,3,2,3,4,5 |
        5. |     2         2         2             2,2,2 |
        6. |     3         3         1             3,3,1 |
        7. |   1,2       1,3       2,4       1,2,1,3,2,4 |
        8. |     2         5         5             2,5,5 |
        9. |     1   2,3,4,6         1       1,2,3,4,6,1 |
       10. |   1,2         2   1,2,3,5     1,2,2,1,2,3,5 |
       11. |     2     1,6,7     2,3,5     2,1,6,7,2,3,5 |
           +---------------------------------------------+
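A minimal sketch of the other option mentioned in #2, separate numeric variables, assuming the letters are first converted as in the loop above. It uses -split-, whose parse(,) option splits each string at the commas and whose destring option turns the pieces into numbers. The stub names x1_, x2_, x3_ (giving x1_1, x1_2, ...) are chosen here only for illustration.

```stata
* Example data from #1
clear
input str20 x1 str22 x2 str18 x3
"A"     "A"       "C"
"A,B"   "A,C"     "B,D"
"B,C"   "C,G"     "A"
"A,B,C" "C"       "B,C,D,E"
"B"     "B"       "B"
"C"     "C"       "A"
"A,B"   "A,C"     "B,D"
"B"     "E"       "E"
"A"     "B,C,D,F" "A"
"A,B"   "B"       "A,B,C,E"
"B"     "A,F,G"   "B,C,E"
end

* Step 1 (as in #3): replace each letter A-Z with its alphabet position
tokenize `c(ALPHA)'
forvalues x = 1/26 {
    foreach v in x1 x2 x3 {
        quietly replace `v' = subinstr(`v', "``x''", "`x'", .)
    }
}

* Step 2: split each comma-separated string into separate variables
* (x1_1, x1_2, ...) and convert the pieces to numeric
foreach v in x1 x2 x3 {
    split `v', parse(,) generate(`v'_) destring
}

list x1 x1_*, sep(0)
```

Observations with fewer codes simply get missing values in the higher-numbered pieces, so duplicates across x1, x2 and x3 are kept as they are.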



      • #4
        Here is a cleaned-up concatenation anyway (no duplicates, tidy order):

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str20 x1 str22 x2 str18 x3
        "A"     "A"       "C"      
        "A,B"   "A,C"     "B,D"    
        "B,C"   "C,G"     "A"      
        "A,B,C" "C"       "B,C,D,E"
        "B"     "B"       "B"      
        "C"     "C"       "A"      
        "A,B"   "A,C"     "B,D"    
        "B"     "E"       "E"      
        "A"     "B,C,D,F" "A"      
        "A,B"   "B"       "A,B,C,E"
        "B"     "A,F,G"   "B,C,E"  
        end
        
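        tokenize `c(ALPHA)' 
        
        gen wanted = "" 
        
        * for each letter that occurs in any of x1-x3, append its position to -wanted-;
        * looping over A-Z in order gives sorted codes with no duplicates
        quietly forval x = 1/26 { 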
            replace wanted = cond(wanted == "", "`x'", wanted + ",`x'") if strpos(x1, "``x''") | strpos(x2, "``x''") | strpos(x3, "``x''") 
        }
        
        list , sep(0) 
            
             +-----------------------------------------+
             |    x1        x2        x3        wanted |
             |-----------------------------------------|
          1. |     A         A         C           1,3 |
          2. |   A,B       A,C       B,D       1,2,3,4 |
          3. |   B,C       C,G         A       1,2,3,7 |
          4. | A,B,C         C   B,C,D,E     1,2,3,4,5 |
          5. |     B         B         B             2 |
          6. |     C         C         A           1,3 |
          7. |   A,B       A,C       B,D       1,2,3,4 |
          8. |     B         E         E           2,5 |
          9. |     A   B,C,D,F         A     1,2,3,4,6 |
         10. |   A,B         B   A,B,C,E       1,2,3,5 |
         11. |     B     A,F,G     B,C,E   1,2,3,5,6,7 |
             +-----------------------------------------+
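If truly numeric variables are wanted, another possibility is one 0/1 indicator per letter, built with the same strpos() test as above. This sketch assumes, as in the example data, that every code is a single capital letter; the names has_A, has_B, ... are illustrative only, and indicators for letters that never occur are dropped.

```stata
* Example data from #1
clear
input str20 x1 str22 x2 str18 x3
"A"     "A"       "C"
"A,B"   "A,C"     "B,D"
"B,C"   "C,G"     "A"
"A,B,C" "C"       "B,C,D,E"
"B"     "B"       "B"
"C"     "C"       "A"
"A,B"   "A,C"     "B,D"
"B"     "E"       "E"
"A"     "B,C,D,F" "A"
"A,B"   "B"       "A,B,C,E"
"B"     "A,F,G"   "B,C,E"
end

* One indicator per letter: 1 if the letter occurs in any of x1-x3, else 0
tokenize `c(ALPHA)'
forvalues x = 1/26 {
    generate byte has_``x'' = strpos(x1, "``x''") | strpos(x2, "``x''") | strpos(x3, "``x''")
    * drop indicators for letters that never occur in the data
    quietly count if has_``x''
    if r(count) == 0 drop has_``x''
}

list x1 x2 x3 has_*, sep(0)
```

Unlike the comma-separated strings, these variables need no further parsing before they are tabulated or used in a model.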



        • #5
          Dear Jorrit Gosens and Nick Cox, thank you very much. Nick, I'm sorry for not clarifying my query: I just want to convert the strings to numeric codes, variable by variable. Your answer in #3 is exactly what I needed! I'm also glad to see the further step using the egen concat() function that you provided; I always learn a lot from you.
          Code:
          tokenize `c(ALPHA)' 
          forval x = 1/26 { 
              foreach v in x1 x2 x3 { 
                  replace `v' = subinstr(`v', "``x''", "`x'", .) 
              }
          }



          • #6
            I am using Stata 16 SE. I want to fit one- and two-parameter logistic models using bayesmh to the sample of the data below:
            Code:
            Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10
            0 0 1 1 1 1 1 1 1 1
            1 1 1 1 1 1 1 0 1 1
            0 0 1 1 1 1 0 1 1 1
            1 1 1 0 1 0 0 0 1 1
            1 0 1 1 1 1 0 0 0 0
            0 1 1 1 1 1 1 0 0 0
            1 1 1 1 1 0 1 0 0 1
            1 0 1 0 1 1 0 0 0 1
            1 1 1 0 1 0 1 0 1 0
            0 1 1 1 1 1 0 1 0 1
            1 0 1 1 1 0 0 1 0 0
            0 0 1 0 1 1 1 0 1 0
            1 0 1 0 1 1 0 0 0 0
            0 0 1 1 1 1 1 1 1 1
            0 1 1 1 1 0 0 0 1 1
            0 1 1 1 1 1 0 1 1 1
            0 1 1 1 1 1 0 0 1 1
            1 0 1 1 1 0 0 0 0 0
            0 0 1 1 1 1 0 0 1 0
            The original data consist of 35 questions answered by 403 examinees. This is the code I used:
            Code:
             
            . set maxvar 30000
            
            . set emptycells drop
            
            . import excel "C:\Users\MATTHEW ADETUTU\Documents\Result_Coding.xlsx", sheet("Sheet 1") firstrow
            (35 vars, 403 obs)
            
            . generate id = _n
            
            . 
            . quietly reshape long Q, i(id) j(item)
            
            . 
            . rename Q y
            
            . 
            . fvset base none id item
            
            . 
            . set seed 10
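            capture program drop my1plllogit
            program my1plllogit
                version 16
                args lnf xb
                tempvar lnfj
                * log-likelihood contribution of each response:
                * ln(p) if y == 1 and ln(1 - p) if y == 0, where p = invlogit(xb)
                quietly generate double `lnfj' = ln(invlogit(`xb')) if $MH_y == 1 & $MH_touse
                quietly replace `lnfj' = ln(invlogit(-`xb')) if $MH_y == 0 & $MH_touse
                quietly summarize `lnfj', meanonly
                * a missing contribution makes the whole log likelihood missing
                if r(N) < $MH_n {
                    scalar `lnf' = .
                    exit
                }
                scalar `lnf' = r(sum)
            end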
            
            bayesmh y i.item, noconstant reffects(id) llevaluator(my1plllogit)
                        prior({y:i.id},normal(0,{var}))
                        prior({y:i.item}, {y:1bn.item}, normal(0,10))
                        prior({var}, igamma(0.01,0.01))
                        block({var})block({y:i.item}, reffects)
                        exclude({y:i.id})  dots
            The code did not work. The errors I got include:
            .
            .
            . bayesmh y i.item, noconstant reffects(id) llevaluator(my1plllogit)
            note:random effects ibn.id are shared between dependent variables
            invalid parameter name ibn.id
            r(198);

            .
            . prior({y:i.id},normal(0,{var}))
            command prior is unrecognized
            r(199);

            .
            . prior({y:i.item}, {y:1bn.item}, normal(0,10))
            command prior is unrecognized
            r(199);

            .
            . prior({var}, igamma(0.01,0.01))
            command prior is unrecognized
            r(199);

            .
            . block({var})block({y:i.item}, reffects)
            command block is unrecognized
            r(199);

            .
            . exclude({y:i.id}) dots
            command exclude is unrecognized
            r(199);

            .
            Please, I need help. Thanks.



            • #7
              The subset of the data is:
              Code:
               
              Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10
              0 0 1 1 1 1 1 1 1 1
              1 1 1 1 1 1 1 0 1 1
              0 0 1 1 1 1 0 1 1 1
              1 1 1 0 1 0 0 0 1 1
              1 0 1 1 1 1 0 0 0 0
              0 1 1 1 1 1 1 0 0 0
              1 1 1 1 1 0 1 0 0 1
              1 0 1 0 1 1 0 0 0 1
              1 1 1 0 1 0 1 0 1 0
              0 1 1 1 1 1 0 1 0 1
              1 0 1 1 1 0 0 1 0 0
              0 0 1 0 1 1 1 0 1 0
              1 0 1 0 1 1 0 0 0 0
              0 0 1 1 1 1 1 1 1 1
              0 1 1 1 1 0 0 0 1 1
              0 1 1 1 1 1 0 1 1 1
              0 1 1 1 1 1 0 0 1 1
              1 0 1 1 1 0 0 0 0 0
              0 0 1 1 1 1 0 0 1 0



              • #8
                #6 and #7 have no bearing on the thread title. Please start a new thread with a good title.

