encoding a string variable with a delimiter

Yousef Srouji

Join Date: Jul 2022

Posts: 2
#1

14 Jul 2022, 01:58

Hello Community,

I've been struggling with this one. I have a questionnaire dataset in which some variables hold more than one answer. For example, one question is "What new product would you like to see?"; it is multiple choice, but the interviewee can pick as many options as they like.
This variable is new_products. It is imported as a string variable, with "," as the delimiter for those who chose multiple options. I am trying to encode it as a numeric variable, but when I do that each combination of answers gets its own value, instead of each option getting a number, with the numbers delimited by "," for those with multiple answers.
Instead, I used split to break up the variable, which created 22 new variables. I then encoded the new variables using the same value label. Now I want to tabulate across all of these variables. How can I do that? Is there a better way to encode a string variable with a delimiter, or am I on the right track? I want to be able to analyze the data, i.e. what percentage of people chose option 1, etc.

Here is my code for this variable:

split new_products, parse(",") gen(new_products_split)
encode new_products_split1, gen(new_prod1) label(prods)
encode new_products_split2, gen(new_prod2) label(prods)
encode new_products_split3, gen(new_prod3) label(prods)
encode new_products_split4, gen(new_prod4) label(prods)
encode new_products_split5, gen(new_prod5) label(prods)
encode new_products_split6, gen(new_prod6) label(prods)
encode new_products_split7, gen(new_prod7) label(prods)
encode new_products_split8, gen(new_prod8) label(prods)
encode new_products_split9, gen(new_prod9) label(prods)
encode new_products_split10, gen(new_prod10) label(prods)
encode new_products_split11, gen(new_prod11) label(prods)
encode new_products_split12, gen(new_prod12) label(prods)
encode new_products_split13, gen(new_prod13) label(prods)
encode new_products_split14, gen(new_prod14) label(prods)
encode new_products_split15, gen(new_prod15) label(prods)
encode new_products_split16, gen(new_prod16) label(prods)
encode new_products_split17, gen(new_prod17) label(prods)
encode new_products_split18, gen(new_prod18) label(prods)
encode new_products_split19, gen(new_prod19) label(prods)
encode new_products_split20, gen(new_prod20) label(prods)
encode new_products_split21, gen(new_prod21) label(prods)
encode new_products_split22, gen(new_prod22) label(prods)

Any tips would be much appreciated!

Best,

Yousef
Tags: delimit, Encoding, string

Nick Cox

Join Date: Mar 2014
Posts: 36054

14 Jul 2022, 02:24

You could write a loop over your encodes. An alternative is to use multencode from SSC, which will do them all at once and produce a tidier result. For looking at the results, consider tabm and tabsplit from the tab_chi package on SSC.
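
As a rough sketch of the loop idea, assuming the 22 variables new_products_split1 to new_products_split22 and the value label prods from #1 (names taken from that post, not checked against the real data):

```stata
* Sketch only: loop over the 22 split variables named in #1
forvalues j = 1/22 {
    * each encode reuses and extends the shared value label prods
    encode new_products_split`j', gen(new_prod`j') label(prods)
}
```

This should give the same result as the 22 separate encode lines; the upper limit 22 would need changing if split creates a different number of variables.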

There is no data example here (FAQ Advice #12), so I created a silly one.

Code:

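* multencode and tabm/tabsplit are user-written; install once with
*   ssc install multencode
*   ssc install tab_chi

* toy data: one comma-delimited string per respondent
clear
input str7 new_product
"A,B,C"
"B,D,E,F"
"G,H,A,B"
end

* one string variable per answer position
split new_product, parse(,)

* encode all pieces with one shared value label
multencode `r(varlist)', gen(split1-split`r(k_new)')

list

list, nolabel

label list

* counts of each option by answer position
tabm split*, transpose

* counts of each option straight from the original string
tabsplit new_product, parse(,)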

Code:

. clear 

. input str7 new_product 

     new_pro~t
  1. "A,B,C"
  2. "B,D,E,F"
  3. "G,H,A,B"
  4. end 

. 
. split new_product, parse(,)
variables created as string: 
new_product1  new_product2  new_product3  new_product4

. 
. multencode `r(varlist)', gen(split1-split`r(k_new)')

. 
. list 

     +------------------------------------------------------------------------------------------+
     | new_pr~t   new_pr~1   new_pr~2   new_pr~3   new_pr~4   split1   split2   split3   split4 |
     |------------------------------------------------------------------------------------------|
  1. |    A,B,C          A          B          C                   A        B        C        . |
  2. |  B,D,E,F          B          D          E          F        B        D        E        F |
  3. |  G,H,A,B          G          H          A          B        G        H        A        B |
     +------------------------------------------------------------------------------------------+

. 
. list, nolabel 

     +------------------------------------------------------------------------------------------+
     | new_pr~t   new_pr~1   new_pr~2   new_pr~3   new_pr~4   split1   split2   split3   split4 |
     |------------------------------------------------------------------------------------------|
  1. |    A,B,C          A          B          C                   1        2        3        . |
  2. |  B,D,E,F          B          D          E          F        2        4        5        6 |
  3. |  G,H,A,B          G          H          A          B        7        8        1        2 |
     +------------------------------------------------------------------------------------------+

. 
. label list 
new_product1:
           1 A
           2 B
           3 C
           4 D
           5 E
           6 F
           7 G
           8 H

. 
. tabm split*, transpose 

           |                  variable
    values |    split1     split2     split3     split4 |     Total
-----------+--------------------------------------------+----------
         A |         1          0          1          0 |         2 
         B |         1          1          0          1 |         3 
         C |         0          0          1          0 |         1 
         D |         0          1          0          0 |         1 
         E |         0          0          1          0 |         1 
         F |         0          0          0          1 |         1 
         G |         1          0          0          0 |         1 
         H |         0          1          0          0 |         1 
-----------+--------------------------------------------+----------
     Total |         3          3          3          2 |        11 

. 
. tabsplit new_product, parse(,)

new_product |      Freq.     Percent        Cum.
------------+-----------------------------------
          A |          2       18.18       18.18
          B |          3       27.27       45.45
          C |          1        9.09       54.55
          D |          1        9.09       63.64
          E |          1        9.09       72.73
          F |          1        9.09       81.82
          G |          1        9.09       90.91
          H |          1        9.09      100.00
------------+-----------------------------------
      Total |         11      100.00
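
Note that the tabsplit percentages above are shares of the 11 mentions, not of the 3 respondents. If the share of respondents choosing each option is wanted, one possible sketch is to build an indicator per option. It assumes the toy data above, options separated by bare commas with no surrounding spaces, and the hypothetical variable names chose_A to chose_H:

```stata
* Sketch only: one 0/1 indicator per option, assuming bare-comma delimiters
foreach opt in A B C D E F G H {
    * pad with commas so that each option is matched as a whole token
    generate byte chose_`opt' = strpos("," + new_product + ",", ",`opt',") > 0
}

* the mean of each indicator is the proportion of respondents choosing it
summarize chose_*
```

With real data the option list in the foreach would need to be replaced by the actual answer texts, and any spaces after the commas trimmed first.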

Yousef Srouji

Join Date: Jul 2022

Posts: 2
#3

14 Jul 2022, 02:45

This solved it! Thank you very much :D