  • encoding a string variable with a delimiter

    Hello Community,

    I've been struggling with this one. I have a questionnaire dataset in which some variables hold more than one answer. For example, one question is "What new product would you like to see?" The answers are multiple choice, but the interviewee can select as many options as they like.
    This variable is new_products. It is imported as a string variable, with "," as the delimiter for those who answered with multiple options. I am trying to encode it into a numeric variable, but when I do that each combination of answers gets its own value, instead of each option getting a number, with the numbers delimited by "," for those with multiple answers.
    Instead, I used split to split up the variable, which created 22 new variables. I then encoded the new variables using the same value label. Now I want to be able to tabulate across all of these variables. How can I do that? Is there a better way to encode a string variable with a delimiter, or am I on the right track? I want to be able to analyze the data now, i.e. what percentage of people chose option 1, etc.

    Here is my code for this variable:

    split new_products, parse(,) gen(new_products_split)
    encode new_products_split1, gen(new_prod1) label(prods)
    encode new_products_split2, gen(new_prod2) label(prods)
    encode new_products_split3, gen(new_prod3) label(prods)
    encode new_products_split4, gen(new_prod4) label(prods)
    encode new_products_split5, gen(new_prod5) label(prods)
    encode new_products_split6, gen(new_prod6) label(prods)
    encode new_products_split7, gen(new_prod7) label(prods)
    encode new_products_split8, gen(new_prod8) label(prods)
    encode new_products_split9, gen(new_prod9) label(prods)
    encode new_products_split10, gen(new_prod10) label(prods)
    encode new_products_split11, gen(new_prod11) label(prods)
    encode new_products_split12, gen(new_prod12) label(prods)
    encode new_products_split13, gen(new_prod13) label(prods)
    encode new_products_split14, gen(new_prod14) label(prods)
    encode new_products_split15, gen(new_prod15) label(prods)
    encode new_products_split16, gen(new_prod16) label(prods)
    encode new_products_split17, gen(new_prod17) label(prods)
    encode new_products_split18, gen(new_prod18) label(prods)
    encode new_products_split19, gen(new_prod19) label(prods)
    encode new_products_split20, gen(new_prod20) label(prods)
    encode new_products_split21, gen(new_prod21) label(prods)
    encode new_products_split22, gen(new_prod22) label(prods)

    Any tips would be much appreciated!

    Best,

    Yousef

  • #2
    You could write a loop over your encodes. An alternative is to use multencode from SSC, which will do them all at once and produce a tidier result. For looking at the results, consider tabm and tabsplit from the tab_chi package on SSC.
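
    As a rough sketch of the loop idea, assuming split has already created new_products_split1 to new_products_split22 as described in #1 (the count of 22 and the variable names are taken from that post, not checked against real data):

    ```stata
    * Loop over the 22 split variables instead of writing 22 encode lines.
    * label(prods) makes every call use, and add to, the same value label.
    forvalues i = 1/22 {
        encode new_products_split`i', gen(new_prod`i') label(prods)
    }
    ```

    If the number of split variables is not known in advance, the upper limit could be taken from `r(k_new)`, which split leaves behind, provided nothing else has overwritten the r() results in between.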

    There is no data example here (FAQ Advice #12), so I created a silly one.

    Code:
    clear 
    input str7 new_product 
    "A,B,C"
    "B,D,E,F"
    "G,H,A,B"
    end 
    
    split new_product, parse(,)
    
    multencode `r(varlist)', gen(split1-split`r(k_new)')
    
    list 
    
    list, nolabel 
    
    label list 
    
    tabm split*, transpose 
    
    tabsplit new_product, parse(,)
    Code:
    . clear 
    
    . input str7 new_product 
    
         new_pro~t
      1. "A,B,C"
      2. "B,D,E,F"
      3. "G,H,A,B"
      4. end 
    
    . 
    . split new_product, parse(,)
    variables created as string: 
    new_product1  new_product2  new_product3  new_product4
    
    . 
    . multencode `r(varlist)', gen(split1-split`r(k_new)')
    
    . 
    . list 
    
         +------------------------------------------------------------------------------------------+
         | new_pr~t   new_pr~1   new_pr~2   new_pr~3   new_pr~4   split1   split2   split3   split4 |
         |------------------------------------------------------------------------------------------|
      1. |    A,B,C          A          B          C                   A        B        C        . |
      2. |  B,D,E,F          B          D          E          F        B        D        E        F |
      3. |  G,H,A,B          G          H          A          B        G        H        A        B |
         +------------------------------------------------------------------------------------------+
    
    . 
    . list, nolabel 
    
         +------------------------------------------------------------------------------------------+
         | new_pr~t   new_pr~1   new_pr~2   new_pr~3   new_pr~4   split1   split2   split3   split4 |
         |------------------------------------------------------------------------------------------|
      1. |    A,B,C          A          B          C                   1        2        3        . |
      2. |  B,D,E,F          B          D          E          F        2        4        5        6 |
      3. |  G,H,A,B          G          H          A          B        7        8        1        2 |
         +------------------------------------------------------------------------------------------+
    
    . 
    . label list 
    new_product1:
               1 A
               2 B
               3 C
               4 D
               5 E
               6 F
               7 G
               8 H
    
    . 
    . tabm split*, transpose 
    
               |                  variable
        values |    split1     split2     split3     split4 |     Total
    -----------+--------------------------------------------+----------
             A |         1          0          1          0 |         2 
             B |         1          1          0          1 |         3 
             C |         0          0          1          0 |         1 
             D |         0          1          0          0 |         1 
             E |         0          0          1          0 |         1 
             F |         0          0          0          1 |         1 
             G |         1          0          0          0 |         1 
             H |         0          1          0          0 |         1 
    -----------+--------------------------------------------+----------
         Total |         3          3          3          2 |        11 
    
    . 
    . tabsplit new_product, parse(,)
    
    new_product |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              A |          2       18.18       18.18
              B |          3       27.27       45.45
              C |          1        9.09       54.55
              D |          1        9.09       63.64
              E |          1        9.09       72.73
              F |          1        9.09       81.82
              G |          1        9.09       90.91
              H |          1        9.09      100.00
    ------------+-----------------------------------
          Total |         11      100.00
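
    Note that the percentages from tabsplit above are shares of all 11 mentions, not shares of the 3 respondents. If the percentage of people choosing each option is wanted, one possible sketch is to build a 0/1 indicator per option and look at its mean. The names padded and chose_* are made up for this illustration, and the option list A to H is hard-coded from the toy data; real option names would need to be valid as variable-name suffixes.

    ```stata
    clear
    input str7 new_product
    "A,B,C"
    "B,D,E,F"
    "G,H,A,B"
    end

    * Wrap each answer string in commas so that every option can be
    * matched as ",X," and never as part of a longer option name.
    gen padded = "," + new_product + ","

    * One indicator per option: 1 if the respondent chose it, 0 otherwise.
    foreach opt in A B C D E F G H {
        gen byte chose_`opt' = strpos(padded, ",`opt',") > 0
    }

    * The mean of a 0/1 indicator is the proportion of respondents
    * who chose that option.
    summarize chose_*
    ```

    This assumes there are no spaces around the commas in the real data; if there are, they would need to be removed first.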



    • #3
      This solved it! Thank you very much :D
