Separating variable with multiple responses

Germain Lam

Join Date: Sep 2022
Posts: 9

01 Sep 2022, 09:20

Hello all,

Apologies if this is a very simple/basic query. I am very new to Stata.

Having collected my data, I have been trying to figure out how to convert a variable with multiple responses into something that can be used for analysis.

Originally it appeared like this when I first downloaded:

Study ID	lake_activity
1	1 2 3 4
2	3 2 1
3	2 3
4	3 2

Labelled

Study ID	lake_activity
1	Swimming (1) Bathing (2) Fishing (3) Washing clothes (4)
2	Fishing (3) Bathing (2) Swimming (1)
3	Bathing (2) Fishing (3)
4	Fishing (3) Bathing (2)

Using odkmeta I managed to separate these responses into 4 variables:

Study ID	lake_activity1	lake_activity2	lake_activity3	lake_activity4
1	Swimming (1)	Bathing (2)	Fishing (3)	Washing clothes (4)
2	Fishing (3)	Bathing (2)	Swimming (1)
3	Bathing (2)	Fishing (3)
4	Fishing (3)	Bathing (2)

However, I am now at a loss as to how I can analyse this meaningfully. For example, I would like to know how many people bathe in the lake (4 in this example).

I have tried to see if I can sort the responses by order in the original variable e.g.:

1. 1 2 3 4
2. 1 2 3
3. 2 3
4. 2 3

That way I could at least relabel 2 3 as "Bathing + Fishing", so that IDs 3 and 4 have the same value.

I have also tried using logistic regression to generate dummy variables, but this produced 56 new variables. As above, the responses for IDs 3 and 4 are actually the same; they are just ordered differently, so they appear different to Stata.

Does anyone have any ideas?


Ken Chui

Join Date: Aug 2014

Posts: 1063
#2

01 Sep 2022, 09:30

Welcome to Statalist!

This is actually super simple to fix, but the way the question is formatted makes it very hard to help. If you can spare 5-10 minutes, check out the FAQ (link at the top of this forum) and read section 12 on how to show a few sample cases in code form (not as a table) using a command called dataex. For example, to show the first five cases of mpg, weight, and make from the built-in data set auto, run:

Code:

sysuse auto, clear
dataex mpg weight make, count(5)

Then copy the output from the Results window and paste it into your post, like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int(mpg weight) str18 make
22 2930 "AMC Concord"
17 3350 "AMC Pacer"
22 2640 "AMC Spirit"
20 3250 "Buick Century"
15 4080 "Buick Electra"
end

Last edited by Ken Chui; 01 Sep 2022, 09:33.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17851

01 Sep 2022, 09:36

Germain:
welcome to this forum.
Just to start off, you might be interested in:

Code:

. clear
. input byte(Study_ID lake_activity1 lake_activity2 lake_activity3 lake_activity4)

     Study_ID  lake_a~1  lake_a~2  lake_a~3  lake_a~4
  1. 
. 1 1 2 3 4
  2. 
. 2 3 2 1 .
  3. 
. 3 2 3 . .
  4. 
. 4 3 2 . . 
  5. 
. end

. tab lake_activity4

lake_activi |
        ty4 |      Freq.     Percent        Cum.
------------+-----------------------------------
          4 |          1      100.00      100.00
------------+-----------------------------------
      Total |          1      100.00

.

That said, for further advice you should be a tad more detailed about your research goal. Thanks.

Kind regards,
Carlo
(Stata 19.0)


Hemanshu Kumar

Join Date: Mar 2015

Posts: 1548
#4

01 Sep 2022, 09:40

As stated in #2, we really need a data example to be able to help you effectively.

That said, here is one way, assuming that lake_activity is originally a single column, and is thus a string variable with elements like "1 2 3 4", "3 2 1", etc on different rows.

Code:

clear
input byte study_id str10 lake_activity
1 "1 2 3 4"
2 "3 2 1"
3 "2 3"
4 "3 2"
end

local labels `" "Swimming" "Bathing" "Fishing" "Washing clothes" "'
replace lake_activity = " " + lake_activity + " "
forval i = 1/4 {
    gen byte lake_activity_`i' = (strpos(lake_activity, " `i' ") > 0)
    local lab: word `i' of `labels'
    label var lake_activity_`i' "`lab'"
}

This generates four variables, one for each activity, each a binary indicator of whether that ID performed that activity.
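To get the frequencies asked about in #1, one possibility is to sum the indicators. This is only a sketch: it assumes the same toy data as above and repeats the loop (without the labels) so that it runs on its own.

```stata
* Sketch only: rebuild the toy data used above
clear
input byte study_id str10 lake_activity
1 "1 2 3 4"
2 "3 2 1"
3 "2 3"
4 "3 2"
end

* Pad with spaces so that " 2 " can only match a whole code
replace lake_activity = " " + lake_activity + " "
forval i = 1/4 {
    gen byte lake_activity_`i' = (strpos(lake_activity, " `i' ") > 0)
}

* The sum of a 0/1 indicator is the number of IDs reporting that activity;
* the mean is the proportion of IDs
tabstat lake_activity_*, statistics(sum mean) columns(statistics)

* Number of IDs who bathe (code 2): 4 in this toy data
count if lake_activity_2 == 1
```

Because IDs 3 and 4 both end up with the same pattern of 0s and 1s, the order in which the codes were entered no longer matters.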

Nick Cox

Join Date: Mar 2014
Posts: 36053

01 Sep 2022, 09:47

I agree with the excellent advice in #2 and #3.

Here is one technique using tabm from tab_chi on SSC.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte study_id str12 lake1 str11 lake2 str12 lake3 str19 lake4
1 "Swimming (1)" "Bathing (2)" "Fishing (3)"  "Washing clothes (4)"
2 "Fishing (3)"  "Bathing (2)" "Swimming (1)" ""                   
3 "Bathing (2)"  "Fishing (3)" ""             ""                   
4 "Fishing (3)"  "Bathing (2)" ""             ""                   
end

. tabm lake*

           |                   values
  variable | Bathing..  Fishing..  Swimmin..  Washing.. |     Total
-----------+--------------------------------------------+----------
     lake1 |         1          2          1          0 |         4 
     lake2 |         3          1          0          0 |         4 
     lake3 |         0          1          1          0 |         2 
     lake4 |         0          0          0          1 |         1 
-----------+--------------------------------------------+----------
     Total |         4          4          2          1 |        11 

. tabm lake*, transpose

                    |                  variable
             values |     lake1      lake2      lake3      lake4 |     Total
--------------------+--------------------------------------------+----------
        Bathing (2) |         1          3          0          0 |         4 
        Fishing (3) |         2          1          1          0 |         4 
       Swimming (1) |         1          0          1          0 |         2 
Washing clothes (4) |         0          0          0          1 |         1 
--------------------+--------------------------------------------+----------
              Total |         4          4          2          1 |        11 

. tabm lake*, oneway

             values |      Freq.     Percent        Cum.
--------------------+-----------------------------------
        Bathing (2) |          4       36.36       36.36
        Fishing (3) |          4       36.36       72.73
       Swimming (1) |          2       18.18       90.91
Washing clothes (4) |          1        9.09      100.00
--------------------+-----------------------------------
              Total |         11      100.00



.
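Note that tabm is community-contributed, so the commands above will fail with an "unrecognized command" error until the package is installed. A minimal sketch, assuming an internet connection and that Stata is allowed to install from SSC:

```stata
* One-off installation of the tab_chi package, which includes tabm
ssc install tab_chi

* Confirm that the command is now available
help tabm
```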


Hemanshu Kumar

Join Date: Mar 2015

Posts: 1548
#6

01 Sep 2022, 09:58

Germain Lam, this is an early lesson in how the absence of a data example makes people waste precious time and leads to potentially unhelpful answers: a lose-lose situation.

Many people here are glad to help you, but you can see that #3, #4 and #5 have all made different assumptions about how your data is structured, and have thus provided very different solutions. And there's still a chance that none of them fix your issue, because your data could have a different structure from all these assumptions.

So please do give us a data example, using the command -dataex-. Thank you!

Germain Lam

Join Date: Sep 2022
Posts: 9

01 Sep 2022, 10:23

Hi Ken

I just tried dataex. I am not sure whether this output is what you need:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str12 lake_activity byte(lake_contact_type1 lake_contact_type2 lake_contact_type3 lake_contact_type4 lake_contact_type5 lake_contact_type6)
"5 2 . . . ." 5 2 . . . .
"1 2 5 . . ." 1 2 5 . . .
"1 2 4 . . ." 1 2 4 . . .
"1 2 4 5 . ." 1 2 4 5 . .
"2 . . . . ." 2 . . . . .
end
label values lake_contact_type1 lake_contact_type
label values lake_contact_type2 lake_contact_type
label values lake_contact_type3 lake_contact_type
label values lake_contact_type4 lake_contact_type
label values lake_contact_type5 lake_contact_type
label values lake_contact_type6 lake_contact_type
label def lake_contact_type 1 "Bathing", modify
label def lake_contact_type 2 "Collecting water", modify
label def lake_contact_type 5 "Washing clothes", modify
label def lake_contact_type 4 "Swimming/playing", modify


Ken Chui

Join Date: Aug 2014

Posts: 1063
#8

01 Sep 2022, 11:23

Code:

* The other way around, name the variable by activity:
foreach x in 1 2 3 4 5 {
    egen act_`x' = anymatch(lake_contact_type*), value(`x')
}
rename act_1 bathing
rename act_2 collect_water
rename act_3 ???
rename act_4 swim_play
rename act_5 wash_cloth

This form is usually easier to use. For example, it'd be easier as a set of independent variables in a regression model. You'll be able to tell which activity is associated with the outcome.
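A quick way to check the result is to list the new indicators next to the raw codes. The sketch below assumes the five example rows from the dataex output in this thread (numeric columns only) and leaves the act_ names unrenamed:

```stata
* Sketch only: the five example rows posted in this thread
clear
input byte(lake_contact_type1 lake_contact_type2 lake_contact_type3 lake_contact_type4 lake_contact_type5 lake_contact_type6)
5 2 . . . .
1 2 5 . . .
1 2 4 . . .
1 2 4 5 . .
2 . . . . .
end

* One 0/1 indicator per activity code
foreach x in 1 2 3 4 5 {
    egen byte act_`x' = anymatch(lake_contact_type*), values(`x')
}

* Compare the indicators with the raw codes, row by row
list lake_contact_type1-lake_contact_type4 act_*, noobs
```

In these five rows nobody reports code 3, so act_3 is 0 throughout.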
Germain Lam

Join Date: Sep 2022

Posts: 9
#9

01 Sep 2022, 11:40

Hi Ken

Thank you. I have read the FAQ, but just to clarify: should I copy and paste the code into the Stata Command box? I tried copying and pasting some of the other code from the replies in this thread, but with variable success.

Thanks to everyone who has contributed also
Ken Chui

Join Date: Aug 2014

Posts: 1063
#10

01 Sep 2022, 12:08

Use a "do-file", which is a text file that lets you type up the whole analysis and submit it as a batch. To learn more, type help gs in Stata's Command window and then read the chapter on do-files (it should be chapter 13).
Germain Lam

Join Date: Sep 2022

Posts: 9
#11

02 Sep 2022, 04:22

Hi Ken,

Just tried the code this morning adding it onto my do-file and it has worked like a charm.

Once again, thank you so much for your time and advice :-)
Germain Lam

Join Date: Sep 2022

Posts: 9
#12

05 Sep 2022, 14:45

Hi All,

Time has come for me to do my univariate analysis and I am interested in looking at the association between lake activity and cca positivity.

My data currently looks like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(no_activity bathing collect_water fishing swim_play wash_cloth)
0 0 1 0 0 1
0 1 1 0 0 1
0 1 1 0 1 0
0 1 1 0 1 1
0 0 1 0 0 0
0 1 1 0 0 1
0 0 1 0 0 1
0 0 1 0 0 1
0 1 0 0 1 0
0 0 1 0 0 1
end

Using logistic regression I decided to use the following command to adjust for all the other activities so that I can explore if there is a particular activity that is more strongly associated with cca positivity:

logistic cca_positive i.no_activity i.bathing i.collect_water i.fishing i.swim_play i.wash_cloth, base

I'd like to do a likelihood ratio test, but cannot figure out the syntax to account for missing variables.

Would someone be able to advise how I would go about this and if I am on the right lines of analysis given my question: is a particular activity more strongly associated with cca positivity?

Furthermore, I'm not sure if this is correct as the participants often do several activities (e.g. fishing + washing + bathing) rather than just one variable. Is there another test I need to use in light of this?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17851
#13

06 Sep 2022, 00:15

Germain:
what is the regressand of your logistic regression? Thanks.

Kind regards,
Carlo
(Stata 19.0)
Germain Lam

Join Date: Sep 2022

Posts: 9
#14

06 Sep 2022, 01:53

Hi Carlo, the regressand is cca positivity (yes or no) - binary.

However later I would also like to look at lake activity as a predictor of the strength of cca positivity (neg, trace, +, ++, +++) - from what I understand I should then use ordinal logistic regression for this.