  • How to correctly split different responses in a single variable

    Hello,

    I have a dataset with a variable that corresponds to the answer to the question "When do you normally check your social networks?".
    The problem I have is that the variable comes as follows:

    id Var0

    1 Home, School
    2 Home, School
    3 School, Lunch, Home
    4 Gym
    5 Lunch, School, Gym
    6 Library, School, Home, Gym, Lunch

    I would like to separate these responses into individual variables. This is what I tried:

    *first split the variable into different variables (Var1, Var2, Var3, Var4, Var5)

    split Var0, p(,) generate(Var)

    *Assigning a label to each of the possible answers, for each of the new variables (generated by the split)
    *1

    label define Label_1 1 "Library" 2 "Home" 3 "School" 4 "Gym" 5 "Lunch"
    encode Var1, generate (E_Var1) label(Label_1)
    generate Place1=0
    replace Place1 = 1 if E_Var1==1
    replace Place1 = 2 if E_Var1==2
    replace Place1 = 3 if E_Var1==3
    replace Place1 = 4 if E_Var1==4
    replace Place1 = 5 if E_Var1==5

    *2

    encode Var2, generate (E_Var2) label(Label_1)
    generate Place2=0
    replace Place2 = 1 if E_Var2==1
    replace Place2 = 2 if E_Var2==2
    replace Place2 = 3 if E_Var2==3
    replace Place2 = 4 if E_Var2==4
    replace Place2 = 5 if E_Var2==5

    *And the same for the next 3 Variables.

    *Finally, I tried to generate a new variable for each single answer, that is, one variable for Home, one for Library, one for Lunch, etc.


    gen Library = 0
    replace Library=1 if Place1== 1 | Place2 == 1 | Place3 == 1 | Place4 == 1 | Place5== 1

    gen Home = 0
    replace Home =1 if Place1== 2 | Place2 == 2 | Place3 == 2 | Place4 == 2 | Place5== 2

    gen During_Lectures = 0
    replace During_Lectures = 1 if Place1== 3 | Place2 == 3 | Place3 == 3 | Place4 == 3 | Place5== 3

    *And so on...


    The problem is that this code does not give me the correct result: Stata assigns different codes to the same answer ("Home", for example) in each of the encoded variables.


    I would really appreciate it if you could help me with this.



    Last edited by Jose Ortega; 25 Nov 2017, 10:06.

  • #2
    Welcome to Statalist.

    Here is some code that should start you in the right direction.
    Code:
    clear
    input id str80 response
    1 "Home, School"
    2 "Home, School"
    3 "School, Lunch, Home"
    4 "Gym"
    5 "Lunch, School, Gym"
    6 "Library, School, Home, Gym, Lunch"
    end
    split response, generate(v) parse(, " ") trim
    list, clean
    reshape long v, i(id) j(num)
    drop if missing(v)
    drop num
    list in 1/10, clean
    generate one = 1
    reshape wide one, i(id) j(v) string
    list, clean
    foreach v of varlist one* {
        replace `v' = 0 if `v'==.
        }
    rename (one*) (*)
    list, clean
    And here is the output of the final list command.
    Code:
           id   Gym   Home   Library   Lunch   School                            response  
      1.    1     0      1         0       0        1                        Home, School  
      2.    2     0      1         0       0        1                        Home, School  
      3.    3     0      1         0       1        1                 School, Lunch, Home  
      4.    4     1      0         0       0        0                                 Gym  
      5.    5     1      0         0       1        1                  Lunch, School, Gym  
      6.    6     1      1         1       1        1   Library, School, Home, Gym, Lunch



    • #3
      If you know the levels, you could also use regular expressions:

      Code:
      clear
      input id str100 var
      1 "Home, School"
      2 "Home, School"
      3 "School, Lunch, Home"
      4 "Gym"
      5 "Lunch, School, Gym"
      6 "Library, School, Home, Gym, Lunch"
      end
      compress
      generate home = regexs(0) if regexm(var, "Home")
      generate school = regexs(0) if regexm(var, "School")
      generate lunch = regexs(0) if regexm(var, "Lunch")
      generate gym = regexs(0) if regexm(var, "Gym")
      generate library = regexs(0) if regexm(var, "Library")
      list, clean
      
      
      
             id                                 var   home   school   lunch   gym   library  
        1.    1                        Home, School   Home   School                          
        2.    2                        Home, School   Home   School                          
        3.    3                 School, Lunch, Home   Home   School   Lunch                  
        4.    4                                 Gym                           Gym            
        5.    5                  Lunch, School, Gym          School   Lunch   Gym            
        6.    6   Library, School, Home, Gym, Lunch   Home   School   Lunch   Gym   Library
      https://stats.idre.ucla.edu/stata/fa...r-expressions/
      Last edited by Dave Airey; 25 Nov 2017, 11:18. Reason: added url for regex Stata help



      • #4
        strpos() will also work just as well here.
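
        As a minimal sketch of the strpos() route, assuming the same example data as in #3: strpos(s1, s2) returns the position at which s2 first occurs in s1, or 0 if it does not occur, so comparing the result with 0 gives a 0/1 indicator. The new variable names below simply reuse the response labels and are only illustrative.

        ```stata
        clear
        input id str100 var
        1 "Home, School"
        2 "Home, School"
        3 "School, Lunch, Home"
        4 "Gym"
        5 "Lunch, School, Gym"
        6 "Library, School, Home, Gym, Lunch"
        end

        * One 0/1 indicator per known response level
        foreach place in Home School Lunch Gym Library {
            generate byte `place' = strpos(var, "`place'") > 0
        }
        list, clean
        ```

        Like the regular expression version, this is a plain substring search, so it is only safe when no response label is contained in another.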



        • #5
          Thank you very much for the advice; your code was very helpful.



          • #6
            I have a similar but slightly different problem. I want to split a multiple-response variable in this way. However, one of the options was never chosen: of options 1 to 5, no one chose option 3. When splitting the variable this way, how can I make sure that option still gets its own variable?



            • #7
              Consider the code in #3. You can always add new variables, say


              Code:
              generate Nobel = regexs(0) if regexm(var, "Nobel Prize")
              which will always be empty if (and only if) none of your respondents mentioned a Nobel Prize.
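
              If you prefer the reshape approach in #2, a never-chosen option simply produces no variable, so one possibility is to loop over the full list of options afterwards and create any that are missing. This is only a sketch: the shortened data set and the option list are made up for illustration, with "Library" playing the role of the option nobody chose.

              ```stata
              clear
              input id str80 response
              1 "Home, School"
              2 "Gym"
              3 "School, Gym"
              end

              * Same steps as in #2
              split response, generate(v) parse(, " ") trim
              reshape long v, i(id) j(num)
              drop if missing(v)
              drop num
              generate one = 1
              reshape wide one, i(id) j(v) string
              rename (one*) (*)

              * Full list of options (assumed known in advance); Library was never chosen
              local options Gym Home Library School
              foreach opt of local options {
                  capture confirm variable `opt'
                  if _rc {
                      * Option never chosen: create an all-zero indicator
                      generate byte `opt' = 0
                  }
                  else {
                      * Option chosen by some respondents: fill the gaps with 0
                      replace `opt' = 0 if missing(`opt')
                  }
              }
              list, clean
              ```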



              • #8
                Thanks William for your helpful advice. From your initial idea, I wrote a program, namely mutilresponse, to handle the problem with any parsing characters. Thank you once again!




                • #9
                  I have a question regarding this. Imagine a string variable containing numbers and extended missing values: 1 2 3 4 5 6 7 8 9 10 11 12 4444 .n .a

                  I want to split this string variable as shown in post #3, but matching each word exactly. However, when I run the code, the result is not as required. For example, if a row contains the text "8 10 11", the variable with suffix _1 is also filled in, whereas it shouldn't be. Example code below:

                  Code:
                  clear all
                  
                  input str14 benefits_ad
                  ".n"
                  ".n"
                  "1 2 3 4 6 8 10"
                  "4444"
                  "1 3 6 8 10"
                  "1 3"
                  "1 3"
                  "1 2 3 4 8 10"
                  ".n"
                  "8 10 11"
                  end
                  
                  compress
                  
                  local list "1 2 3 4 5 6 7 8 9 10 11 12 4444 n a"
                  
                  foreach i in `list' {
                      generate benefit_`i' = regexs(0) if regexm(benefits_ad, "`i'")
                  }
                  Result: [screenshot not reproduced: Screenshot 2022-06-05 224219.png]

                  As seen in rows 4 and 10, the result is incorrect. How can I match whole words properly in this instance?



                  • #10
                    I also tried this variation, but I am not sure why it keeps matching in this manner instead of comparing word by word:

                    Code:
                    local list "1 2 3 4 5 6 7 8 9 10 11 12 4444 n a"
                    local n : word count `list'
                    
                    forvalues i = 1/`n' {
                        
                        local a : word `i' of `list'
                        generate benefit_`a' = "`a'" if word(benefits_ad, `i')  == "`a'"
                        
                    }



                    • #11
                      I came up with a not-so-elegant solution, but here it is:


                      Code:
                        
                      clear all
                      
                      input str14 benefits_ad
                      ".n"
                      ".n"
                      "1 2 3 4 6 8 10"
                      "4444"
                      "1 3 6 8 10"
                      "1 3"
                      "1 3"
                      "1 2 3 4 8 10"
                      ".n"
                      "8 10 11"
                      end
                      
                      compress
                      
                      local list "1 2 3 4 5 6 7 8 9 10 11 12 4444 .n .a"
                      
                      foreach i in `list' {
                          local j : subinstr local i "." ""
                          generate benefit_`j' = "`i'"
                      }
                      
                      local N = _N
                      rename benefit_# test#, r dryrun
                      ds `r(oldnames)' benefit_n benefit_a
                      foreach var in `r(varlist)' {
                          forvalues i = 1/`N' {
                              local s1 = benefits_ad[`i']
                              local s2 = `var'[`i']
                              local intersection: list s1 & s2
                              local n : word count `intersection'
                              display `n'
                              replace `var'= "" in `i' if `n' == 0
                          }
                      }
                      If anyone can help me improve this, that would be great!



                      • #12
                        I think you're confusing a loop over the words you are searching for -- which is needed -- with a loop over the words of each string value -- which isn't.

                        If something occurs as a word "foo" then you'll find " foo " -- except possibly at the beginning and end of a string value. So the code below adds a space to each end of the string before searching, which catches those cases too.

                        Code:
                        clear all
                        
                        input str14 benefits_ad
                        ".n"
                        ".n"
                        "1 2 3 4 6 8 10"
                        "4444"
                        "1 3 6 8 10"
                        "1 3"
                        "1 3"
                        "1 2 3 4 8 10"
                        ".n"
                        "8 10 11"
                        end
                        
                        compress
                        
                        local list "1 2 3 4 5 6 7 8 9 10 11 12 4444 .n a"
                        
                        foreach x of local list { 
                            local new = strtoname("is`x'")
                            gen `new' = "`x'" if strpos(" " + benefits_ad + " ", " `x' ") > 0 
                        }
                        
                        list benefits_ad is1-is7 
                        
                        list is8-isa 
                        
                        
                        
                        
                        
                        
                        . list benefits_ad is1-is7 
                        
                             +----------------------------------------------------------+
                             |    benefits_ad   is1   is2   is3   is4   is5   is6   is7 |
                             |----------------------------------------------------------|
                          1. |             .n                                           |
                          2. |             .n                                           |
                          3. | 1 2 3 4 6 8 10     1     2     3     4           6       |
                          4. |           4444                                           |
                          5. |     1 3 6 8 10     1           3                 6       |
                             |----------------------------------------------------------|
                          6. |            1 3     1           3                         |
                          7. |            1 3     1           3                         |
                          8. |   1 2 3 4 8 10     1     2     3     4                   |
                          9. |             .n                                           |
                         10. |        8 10 11                                           |
                             +----------------------------------------------------------+
                        
                        . 
                        . list is8-isa 
                        
                             +------------------------------------------------------+
                             | is8   is9   is10   is11   is12   is4444   is_n   isa |
                             |------------------------------------------------------|
                          1. |                                             .n       |
                          2. |                                             .n       |
                          3. |   8           10                                     |
                          4. |                                    4444              |
                          5. |   8           10                                     |
                             |------------------------------------------------------|
                          6. |                                                      |
                          7. |                                                      |
                          8. |   8           10                                     |
                          9. |                                             .n       |
                         10. |   8           10     11                              |
                             +------------------------------------------------------+
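
                        To see why the padded strpos() test behaves differently from the substring search in #9, the two can be compared directly on the problem value "8 10 11". This is just an illustrative check using literal strings rather than the data set.

                        ```stata
                        * Substring search: "1" occurs inside "10" and "11", so this is a false positive
                        display regexm("8 10 11", "1")                          // shows 1

                        * Padded search: " 1 " does not occur in " 8 10 11 "
                        display strpos(" " + "8 10 11" + " ", " 1 ") > 0        // shows 0

                        * A genuine word still matches, even at the end of the string
                        display strpos(" " + "8 10 11" + " ", " 11 ") > 0       // shows 1
                        ```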



                        • #13
                          Nick Cox, thank you so much! As you pointed out, I was not looking at it properly. I was not aware that I could match in this manner with strpos (mainly due to lack of understanding), but thank you for this wonderful example!



                          • #14
                            That's fine. Note that my code looks for a as a word but you might need to change that to .a
