
  • How to generate dummy variables in Stata from a string variable with multiple responses

    Dear All,

    I want to generate dummy variables from a string variable cards_hh, which holds multiple responses such as ABCD (with no spaces between them). I have tried the split command, but it gives me an error.


    split cards_hh , generate(resp) destring
    cannot generate new variables using stub resp
    r(110);


    . split cards_hh , generate(resp) destring
    variable born as string:
    resp1
    resp1: contains nonnumeric characters; no replace




    Any help is highly appreciated.


    Thanks

    Ashish

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 cards_hh
    "B"   
    "A"   
    "AC"  
    "C"   
    "AC"  
    "AC"  
    "A"   
    "A"   
    "AB"  
    "AB"  
    "AC"  
    "AC"  
    "AC"  
    "ABCD"
    "ABC" 
    "AC"  
    "AC"  
    "A"   
    "AC"  
    "A"   
    "A"   
    "A"   
    "ACD" 
    "ABC" 
    "AC"  
    "ABC" 
    "D"   
    "ABC" 
    "AC"  
    "AC"  
    "AC"  
    "AC"  
    "AC"  
    "AC"  
    "ABC" 
    "AB"  
    "AC"  
    "AC"  
    "AC"  
    "AC"  
    "AC"  
    "AC"  
    "AC"  
    "AC"  
    "AC"  
    "AC"  
    "ACD" 
    "ABD" 
    "ACD" 
    "ACD" 
    "ABCD"
    "ABD" 
    "ABD" 
    "ACD" 
    "ACD" 
    "ACD" 
    "ACD" 
    "ACD" 
    "ACD" 
    "ACD" 
    "ACD" 
    "ACD" 
    "ACD" 
    "ABC" 
    "ACD" 
    "AD"  
    "ACD" 
    "ACD" 
    "D"   
    "ACD" 
    "ABC" 
    "AC"  
    "AC"  
    "A"   
    "AC"  
    "ABD" 
    "ACD" 
    "AB"  
    "ACD" 
    "AC"  
    "AC"  
    "ABC" 
    "AC"  
    "ACD" 
    "ABC" 
    "AC"  
    "C"   
    "C"   
    "AC"  
    "C"   
    "ABC" 
    "C"   
    "AC"  
    "AC"  
    "AC"  
    "AC"  
    "ABC" 
    "AC"  
    "A"   
    "AC"  
    end
    Last edited by Ashish Bandhu; 17 Oct 2022, 00:16.

  • #2
    Ashish:
    why not consider:
    Code:
    . egen wanted=group(cards_hh)
    . label define wanted 1 "A" 2 "AB" 3 "ABC" 4 "ABCD" 5 "ABD" 6 "AC" 7 "ACD" 8 "AD" 9 "B" 10 "C" 11 "D"
    . label val wanted wanted
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      split requires a separator, such as a space. You don't have any separator.
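
      To illustrate the point about separators, here is a minimal sketch on made-up data. The variable name spaced and the stub resp are only illustrative assumptions, not taken from the real dataset.

      ```stata
      * Hypothetical data in which the responses ARE separated by spaces
      clear
      input str7 spaced
      "A B C D"
      "A C"
      "B"
      end

      * By default split parses on spaces, so it can find the pieces here
      split spaced, generate(resp)

      * resp1-resp4 hold the pieces by position, padded with empty strings
      list, noobs
      ```

      Note that even here the new variables hold pieces by position (resp2 is "B" in the first observation but "C" in the second), so they are still not indicators for each response.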

      As its original author I can testify that I considered extending its application to cases like this, but pulled back on two grounds:

      1. That would complicate the syntax and the help file.

      2. There is generally a simple alternative.

      Point 2 above is true in this case.

      In any case split doesn't generate indicator variables (you say "dummy") at all.

      Code:
      foreach value in A B C D { 
             gen `value' = strpos(cards_hh, "`value'") > 0 
      }
      will produce four indicator variables that are 1 if the named character occurs and 0 otherwise.

      See https://www.stata-journal.com/articl...article=dm0099 Section 2 for a recommendation against the ugly term "dummy variable".
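
      For anyone who wants to try the loop without the full dataset, here is a self-contained sketch on a handful of the values posted in #1. The has_ prefix, the variable labels and the ncards count are additions for illustration only, not part of the code above; the prefix simply avoids creating variables with one-letter names.

      ```stata
      * Sketch only: has_ and ncards are illustrative names
      clear
      input str4 cards_hh
      "B"
      "AC"
      "ABCD"
      "ACD"
      "D"
      end

      foreach value in A B C D {
          * strpos() returns the position of the first match, or 0 if there is none,
          * so the comparison > 0 evaluates to 1 (present) or 0 (absent)
          generate byte has_`value' = strpos(cards_hh, "`value'") > 0
          label variable has_`value' "cards_hh contains `value'"
      }

      * number of responses recorded in each observation
      egen ncards = rowtotal(has_A has_B has_C has_D)

      list, noobs
      ```

      One thing to watch: an observation with an empty cards_hh gets 0 on all four indicators, not missing. If empty should mean missing, adding if cards_hh != "" to the generate line is one way to handle that.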



      • #4
        Originally posted by Carlo Lazzaro
        Ashish:
        why not consider:
        Code:
        . egen wanted=group(cards_hh)
        . label define wanted 1 "A" 2 "AB" 3 "ABC" 4 "ABCD" 5 "ABD" 6 "AC" 7 "ACD" 8 "AD" 9 "B" 10 "C" 11 "D"
        . label val wanted wanted
        Thank you for the reply. I did not want to encode the original variable; rather, I wanted to generate four dummy variables, one for each of the responses A, B, C and D. Nick Cox has solved the problem.



        • #5
          Originally posted by Nick Cox
          split requires a separator, such as a space. You don't have any separator.

          As its original author I can testify that I considered extending its application to cases like this, but pulled back on two grounds:

          1. That would complicate the syntax and the help file.

          2. There is generally a simple alternative.

          Point 2 above is true in this case.

          In any case split doesn't generate indicator variables (you say "dummy") at all.

          Code:
          foreach value in A B C D {
              gen `value' = strpos(cards_hh, "`value'") > 0
          }
          will produce four indicator variables that are 1 if the named character occurs and 0 otherwise.

          See https://www.stata-journal.com/articl...article=dm0099 Section 2 for a recommendation against the ugly term "dummy variable".
          Thank you, Nick. Got it.
