how to correctly split different responses in a single variable

Jose Ortega

Join Date: Nov 2017

Posts: 3
#1


25 Nov 2017, 09:58

Hello,

I have a dataset with a variable that corresponds to the answer to the question "When do you normally check your social networks?".
The problem I have is that the variable comes as follows:

id Var0

1 Home, School
2 Home, School
3 School, Lunch, Home
4 Gym
5 Lunch, School, Gym
6 Library, School, Home, Gym, Lunch

I would like to separate these responses into their own variables. This is what I tried:

* First, split the variable into separate variables (Var1, Var2, Var3, Var4, Var5)

split Var0, parse(,) generate(Var)

* Assign a label to each of the possible answers, for each of the new variables (generated by the split)
*1

label define Label_1 1 "Library" 2 "Home" 3 "School" 4 "Gym" 5 "Lunch"
encode Var1, generate (E_Var1) label(Label_1)
generate Place1=0
replace Place1 = 1 if E_Var1==1
replace Place1 = 2 if E_Var1==2
replace Place1 = 3 if E_Var1==3
replace Place1 = 4 if E_Var1==4
replace Place1 = 5 if E_Var1==5

*2

encode Var2, generate (E_Var2) label(Label_1)
generate Place2=0
replace Place2 = 1 if E_Var2==1
replace Place2 = 2 if E_Var2==2
replace Place2 = 3 if E_Var2==3
replace Place2 = 4 if E_Var2==4
replace Place2 = 5 if E_Var2==5

*And the same for the next 3 Variables.

* Finally, I tried to generate one new variable per answer: that is, a variable for Home, one for Library, one for Lunch, etc.

gen Library = 0
replace Library=1 if Place1== 1 | Place2 == 1 | Place3 == 1 | Place4 == 1 | Place5== 1

gen Home = 0
replace Home =1 if Place1== 2 | Place2 == 2 | Place3 == 2 | Place4 == 2 | Place5== 2

gen During_Lectures = 0
replace During_Lectures = 1 if Place1== 3 | Place2 == 3 | Place3 == 3 | Place4 == 3 | Place5== 3

*And so on...

The problem is that this code does not give me the correct result: Stata assigns a different code to "Home", for example, in each of the encoded variables.

I would really appreciate it if you could help me with this.

Last edited by Jose Ortega; 25 Nov 2017, 10:06.
Tags: labels, multiple responses, split strings, strings

William Lisowski

Join Date: Dec 2014
Posts: 10150
#2

25 Nov 2017, 10:41

Welcome to Statalist.

Here is some code that should start you in the right direction.

Code:

clear
input id str80 response
1 "Home, School"
2 "Home, School"
3 "School, Lunch, Home"
4 "Gym"
5 "Lunch, School, Gym"
6 "Library, School, Home, Gym, Lunch"
end
split response, generate(v) parse(, " ") trim
list, clean
reshape long v, i(id) j(num)
drop if missing(v)
drop num
list in 1/10, clean
generate one = 1
reshape wide one, i(id) j(v) string
list, clean
foreach v of varlist one* {
    replace `v' = 0 if `v'==.
    }
rename (one*) (*)
list, clean

And here is the output of the final list command.

Code:

       id   Gym   Home   Library   Lunch   School                            response  
  1.    1     0      1         0       0        1                        Home, School  
  2.    2     0      1         0       0        1                        Home, School  
  3.    3     0      1         0       1        1                 School, Lunch, Home  
  4.    4     1      0         0       0        0                                 Gym  
  5.    5     1      0         0       1        1                  Lunch, School, Gym  
  6.    6     1      1         1       1        1   Library, School, Home, Gym, Lunch


Dave Airey

Join Date: Apr 2014
Posts: 416
#3

25 Nov 2017, 11:07

If you know the levels, you could also use regular expressions:

Code:

clear
input id str100 var
1 "Home, School"
2 "Home, School"
3 "School, Lunch, Home"
4 "Gym"
5 "Lunch, School, Gym"
6 "Library, School, Home, Gym, Lunch"
end
compress
generate home = regexs(0) if regexm(var, "Home")
generate school = regexs(0) if regexm(var, "School")
generate lunch = regexs(0) if regexm(var, "Lunch")
generate gym = regexs(0) if regexm(var, "Gym")
generate library = regexs(0) if regexm(var, "Library")
list, clean



       id                                 var   home   school   lunch   gym   library  
  1.    1                        Home, School   Home   School                          
  2.    2                        Home, School   Home   School                          
  3.    3                 School, Lunch, Home   Home   School   Lunch                  
  4.    4                                 Gym                           Gym            
  5.    5                  Lunch, School, Gym          School   Lunch   Gym            
  6.    6   Library, School, Home, Gym, Lunch   Home   School   Lunch   Gym   Library

https://stats.idre.ucla.edu/stata/fa...r-expressions/

Last edited by Dave Airey; 25 Nov 2017, 11:18. Reason: added url for regex Stata help


Nick Cox

Join Date: Mar 2014

Posts: 36058
#4

25 Nov 2017, 12:30

strpos() will also work just as well here.
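
As a minimal sketch of that suggestion (the loop and the is_ variable names are illustrative assumptions, not code from this thread): strpos() returns the position of a substring, or 0 if it is absent, so comparing its result with 0 gives a 0/1 indicator for each known level.

```stata
clear
input id str100 var
1 "Home, School"
2 "Home, School"
3 "School, Lunch, Home"
4 "Gym"
5 "Lunch, School, Gym"
6 "Library, School, Home, Gym, Lunch"
end

* strpos() is 0 when the substring is absent, so "> 0" yields a 0/1 indicator.
* The is_ prefix is a hypothetical naming choice.
foreach place in Home School Lunch Gym Library {
    generate byte is_`place' = strpos(var, "`place'") > 0
}

list, clean
```

This is a plain substring search, so it would also flag a level that happens to contain another level's name; that does not arise with these five levels.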
Jose Ortega

Join Date: Nov 2017

Posts: 3
#5

25 Nov 2017, 15:49

Thank you very much for the advice; your code was very helpful.
Iffat Zahan

Join Date: Oct 2019

Posts: 1
#6

29 Oct 2019, 04:54

I have a similar but slightly different problem. I want to split a multiple-response variable in this way. However, one option has no observations: of options 1 to 5, no one chose option 3. When splitting this way, how can I make sure that option still gets its own variable?
Nick Cox

Join Date: Mar 2014

Posts: 36058
#7

29 Oct 2019, 06:04

Consider the code in #3. You can always add new variables, say

Code:

generate Nobel = regexs(0) if regexm(var, "Nobel Prize")

which will be always empty if (and only if) none of your respondents mentioned a Nobel Prize.
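
The same idea can be sketched as a loop over the complete list of options, so that an option nobody chose still gets a variable. This is only an illustration: "Cafe" is a hypothetical option that no respondent mentioned, and the chose_ prefix is an assumed naming choice. regexm() returns 1 for a match and 0 otherwise, so the result is a 0/1 indicator rather than a string.

```stata
clear
input id str100 var
1 "Home, School"
2 "Home, School"
3 "School, Lunch, Home"
4 "Gym"
5 "Lunch, School, Gym"
6 "Library, School, Home, Gym, Lunch"
end

* Full list of answer options; "Cafe" is hypothetical and never occurs in the data
local options Home School Lunch Gym Library Cafe

foreach opt of local options {
    * regexm() returns 1 if the pattern is found and 0 otherwise
    generate byte chose_`opt' = regexm(var, "`opt'")
}

* chose_Cafe exists and is 0 in every observation
list id chose_*, clean
```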

Tri Hoang

Join Date: Jul 2020
Posts: 1
#8

28 Jul 2020, 02:28

Thanks, William, for your helpful advice. Building on your initial idea, I wrote a program named mutilresponse that handles this problem for any parsing characters. Thank you once again!


Fahad Mirza

Join Date: Sep 2018

Posts: 263
#9

05 Jun 2022, 11:43

I have a question regarding this. Imagine a string variable containing numbers and extended missing values: 1 2 3 4 5 6 7 8 9 10 11 12 4444 .n .a

I want to split this string variable as shown in post #3, but matching each word exactly. However, when I run the code, the results are not what I need. For example, if a row contains the text "8 10 11", the variable with suffix _1 also gets a value, whereas it should not. Example code below:

Code:

clear all

input str14 benefits_ad
".n"
".n"
"1 2 3 4 6 8 10"
"4444"
"1 3 6 8 10"
"1 3"
"1 3"
"1 2 3 4 8 10"
".n"
"8 10 11"
end

compress

local list "1 2 3 4 5 6 7 8 9 10 11 12 4444 n a"

foreach i in `list' {
    generate benefit_`i' = regexs(0) if regexm(benefits_ad, "`i'")
}

Result:

As seen in rows 4 and 10, the result is incorrect. How can I match whole words in this case?

Fahad Mirza

Join Date: Sep 2018
Posts: 263

#10

05 Jun 2022, 14:07

I also tried this variation, but I am not sure why it keeps matching in this way instead of comparing word by word.

Code:

local list "1 2 3 4 5 6 7 8 9 10 11 12 4444 n a"
local n : word count `list'

forvalues i = 1/`n' {
    
    local a : word `i' of `list'
    generate benefit_`a' = "`a'" if word(benefits_ad, `i')  == "`a'"
    
}


Fahad Mirza

Join Date: Sep 2018
Posts: 263

#11

05 Jun 2022, 15:01

I came up with a not-so-elegant solution, but here it is:

Code:

  
clear all

input str14 benefits_ad
".n"
".n"
"1 2 3 4 6 8 10"
"4444"
"1 3 6 8 10"
"1 3"
"1 3"
"1 2 3 4 8 10"
".n"
"8 10 11"
end

compress

local list "1 2 3 4 5 6 7 8 9 10 11 12 4444 .n .a"

foreach i in `list' {
    local j : subinstr local i "." ""
    generate benefit_`j' = "`i'"
}

local N = _N
rename benefit_# test#, r dryrun
ds `r(oldnames)' benefit_n benefit_a
foreach var in `r(varlist)' {
    forvalues i = 1/`N' {
        local s1 = benefits_ad[`i']
        local s2 = `var'[`i']
        local intersection: list s1 & s2
        local n : word count `intersection'
        display `n'
        replace `var'= "" in `i' if `n' == 0
    }
}

If anyone can help me improve this, that would be great!

Nick Cox

Join Date: Mar 2014
Posts: 36058

#12

05 Jun 2022, 15:04

I think you're confusing a loop over the words you are searching for -- which is needed -- with a loop over the words of each string value -- which isn't.

If something occurs as a word "foo" then you'll find " foo " -- except possibly at the beginning and end of a string value. So, here is code that catches those cases too, by adding a space before and after each string value before searching.

Code:

clear all

input str14 benefits_ad
".n"
".n"
"1 2 3 4 6 8 10"
"4444"
"1 3 6 8 10"
"1 3"
"1 3"
"1 2 3 4 8 10"
".n"
"8 10 11"
end

compress

local list "1 2 3 4 5 6 7 8 9 10 11 12 4444 .n a"

foreach x of local list { 
    local new = strtoname("is`x'")
    gen `new' = "`x'" if strpos(" " + benefits_ad + " ", " `x' ") > 0 
}

list benefits_ad is1-is7 

list is8-isa 






. list benefits_ad is1-is7 

     +----------------------------------------------------------+
     |    benefits_ad   is1   is2   is3   is4   is5   is6   is7 |
     |----------------------------------------------------------|
  1. |             .n                                           |
  2. |             .n                                           |
  3. | 1 2 3 4 6 8 10     1     2     3     4           6       |
  4. |           4444                                           |
  5. |     1 3 6 8 10     1           3                 6       |
     |----------------------------------------------------------|
  6. |            1 3     1           3                         |
  7. |            1 3     1           3                         |
  8. |   1 2 3 4 8 10     1     2     3     4                   |
  9. |             .n                                           |
 10. |        8 10 11                                           |
     +----------------------------------------------------------+

. 
. list is8-isa 

     +------------------------------------------------------+
     | is8   is9   is10   is11   is12   is4444   is_n   isa |
     |------------------------------------------------------|
  1. |                                             .n       |
  2. |                                             .n       |
  3. |   8           10                                     |
  4. |                                    4444              |
  5. |   8           10                                     |
     |------------------------------------------------------|
  6. |                                                      |
  7. |                                                      |
  8. |   8           10                                     |
  9. |                                             .n       |
 10. |   8           10     11                              |
     +------------------------------------------------------+
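
To make the difference concrete, here is a small sketch on three of the values above, comparing a plain substring search with the space-padded search. The variable names loose1 and exact1 are hypothetical, chosen only for this illustration.

```stata
clear
input str14 benefits_ad
"4444"
"8 10 11"
"1 3"
end

* Plain substring search: "1" is also found inside "10" and "11"
generate byte loose1 = strpos(benefits_ad, "1") > 0

* Padded search: " 1 " is found only where 1 occurs as a whole word
generate byte exact1 = strpos(" " + benefits_ad + " ", " 1 ") > 0

list, clean
```

For "8 10 11" the plain search flags a match while the padded search does not; for "1 3" both do.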


Fahad Mirza

Join Date: Sep 2018

Posts: 263
#13

05 Jun 2022, 15:18

Nick Cox, thank you so much! As you pointed out, I was not looking at it properly. I was not aware that I could match in this way with strpos() (mainly through lack of understanding), but thank you for this wonderful example!
Nick Cox

Join Date: Mar 2014

Posts: 36058
#14

05 Jun 2022, 15:30

That's fine. Note that my code looks for "a" as a word, but you might need to change that to ".a".
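
A sketch of that change, assuming the data can contain ".a" (the three values below are made up for illustration): with ".a" in the list, strtoname() turns "is.a" into the legal name is_a, just as it produced is_n above.

```stata
clear
input str14 benefits_ad
".n"
".a"
"1 3"
end

* Search for the extended missing values as whole words
local list ".n .a"

foreach x of local list {
    * strtoname() replaces the illegal "." with "_", giving is_n and is_a
    local new = strtoname("is`x'")
    gen `new' = "`x'" if strpos(" " + benefits_ad + " ", " `x' ") > 0
}

list
```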