  • Separating variable with multiple responses

    Hello all,

    Apologies if this is a very simple/basic query. I am very new to Stata.

    Having collected my data, I have been trying to figure out how to turn a variable with multiple responses into something that can be used for analysis.

    Originally it appeared like this when I first downloaded:

    Study ID   lake_activity
    1          1 2 3 4
    2          3 2 1
    3          2 3
    4          3 2

    Labelled:

    Study ID   lake_activity
    1          Swimming (1) Bathing (2) Fishing (3) Washing clothes (4)
    2          Fishing (3) Bathing (2) Swimming (1)
    3          Bathing (2) Fishing (3)
    4          Fishing (3) Bathing (2)

    Using odkmeta I managed to separate these responses into 4 variables:

    Study ID   lake_activity1   lake_activity2   lake_activity3   lake_activity4
    1          Swimming (1)     Bathing (2)      Fishing (3)      Washing clothes (4)
    2          Fishing (3)      Bathing (2)      Swimming (1)
    3          Bathing (2)      Fishing (3)
    4          Fishing (3)      Bathing (2)

    However, I am now at a loss as to how to analyse this meaningfully, e.g. I would like to know how many people bathe in the lake (4 in this example).

    I have tried to see if I can sort the responses by order in the original variable e.g.:

    1. 1 2 3 4
    2. 1 2 3
    3. 2 3
    4. 2 3

    So that I can at least relabel 2 3 into "Bathing + Fishing" so that ID 3 and 4 have the same observation.

    I have tried to use logistic regression to generate dummy variables, but this ended up with 56 new variables and, as above, the input for IDs 3 and 4 is actually the same, just ordered differently, so it appears different to Stata.

    Does anyone have any ideas?

  • #2
    Welcome to Statalist!

    This is actually super simple to fix, but the way this question was formatted makes it very hard to help. If you can spare 5-10 minutes, check out the FAQ (link at the top of this forum), and read section 12 on how to show a few sample cases in code form (not as a table) using a command called dataex. For example, to show the first five cases of mpg, weight and make from the built-in data set auto, run:

    Code:
    sysuse auto, clear
    dataex mpg weight make, count(5)
    Then go to the Results window, copy the output, and paste it into your post as code, like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(mpg weight) str18 make
    22 2930 "AMC Concord"  
    17 3350 "AMC Pacer"    
    22 2640 "AMC Spirit"  
    20 3250 "Buick Century"
    15 4080 "Buick Electra"
    end
    Last edited by Ken Chui; 01 Sep 2022, 09:33.



    • #3
      Germain:
      welcome to this forum.
      Just to start off, you might be interested in:
      Code:
      clear
      input byte(Study_ID lake_activity1 lake_activity2 lake_activity3 lake_activity4)
      1 1 2 3 4
      2 3 2 1 .
      3 2 3 . .
      4 3 2 . .
      end
      
      . tab lake_activity4
      
      lake_activi |
              ty4 |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                4 |          1      100.00      100.00
      ------------+-----------------------------------
            Total |          1      100.00
      
      .
      That said, for further advice you should be a tad more detailed about your research goal. Thanks.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        As stated in #2, we really need a data example to be able to help you effectively.

        That said, here is one way, assuming that lake_activity is originally a single column, and is thus a string variable with elements like "1 2 3 4", "3 2 1", etc., on different rows.

        Code:
        clear
        input byte study_id    str10 lake_activity
        1   "1 2 3 4"
        2    "3 2 1"
        3    "2 3"
        4    "3 2"
        end
        
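        * Activity names, in code order (1 to 4)
        local labels `" "Swimming" "Bathing" "Fishing" "Washing clothes" "'
        
        * Pad with spaces so that each code is matched as a whole word
        replace lake_activity = " " + lake_activity + " "
        forval i = 1/4 {
            * 1 if the current code appears in the response string, 0 otherwise
            gen byte lake_activity_`i' = (strpos(lake_activity, " `i' ") > 0)
            local lab: word `i' of `labels'
            label var lake_activity_`i' "`lab'"
        }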


        This generates four variables, one for each activity, each a binary indicator of whether that ID reported the activity.
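
        A minimal sketch of how such an indicator answers the original question in #1 (how many people bathe in the lake), assuming the same string layout as in the example above. The variable names padded and bathing are illustrative only, and code 2 is taken to mean Bathing, as in the labelled table in #1:

        Code:
        clear
        input byte study_id str10 lake_activity
        1 "1 2 3 4"
        2 "3 2 1"
        3 "2 3"
        4 "3 2"
        end
        
        * Pad with spaces so that each code is matched as a whole word
        gen str12 padded = " " + lake_activity + " "
        * Indicator for code 2 (assumed to be Bathing)
        gen byte bathing = strpos(padded, " 2 ") > 0
        * Number of respondents who report bathing
        count if bathing == 1
        * Frequency table for the same indicator
        tabulate bathing

        In this example data every respondent reports code 2, so count displays 4.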



        • #5
          I agree with the excellent advice in #2 and #3.

          Here is one technique using tabm from tab_chi on SSC.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte study_id str12 lake1 str11 lake2 str12 lake3 str19 lake4
          1 "Swimming (1)" "Bathing (2)" "Fishing (3)"  "Washing clothes (4)"
          2 "Fishing (3)"  "Bathing (2)" "Swimming (1)" ""                   
          3 "Bathing (2)"  "Fishing (3)" ""             ""                   
          4 "Fishing (3)"  "Bathing (2)" ""             ""                   
          end
          
          . tabm lake*
          
                     |                   values
            variable | Bathing..  Fishing..  Swimmin..  Washing.. |     Total
          -----------+--------------------------------------------+----------
               lake1 |         1          2          1          0 |         4 
               lake2 |         3          1          0          0 |         4 
               lake3 |         0          1          1          0 |         2 
               lake4 |         0          0          0          1 |         1 
          -----------+--------------------------------------------+----------
               Total |         4          4          2          1 |        11 
          
          . tabm lake*, transpose
          
                              |                  variable
                       values |     lake1      lake2      lake3      lake4 |     Total
          --------------------+--------------------------------------------+----------
                  Bathing (2) |         1          3          0          0 |         4 
                  Fishing (3) |         2          1          1          0 |         4 
                 Swimming (1) |         1          0          1          0 |         2 
          Washing clothes (4) |         0          0          0          1 |         1 
          --------------------+--------------------------------------------+----------
                        Total |         4          4          2          1 |        11 
          
          . tabm lake*, oneway
          
                       values |      Freq.     Percent        Cum.
          --------------------+-----------------------------------
                  Bathing (2) |          4       36.36       36.36
                  Fishing (3) |          4       36.36       72.73
                 Swimming (1) |          2       18.18       90.91
          Washing clothes (4) |          1        9.09      100.00
          --------------------+-----------------------------------
                        Total |         11      100.00
          
          
          
          .



          • #6
            Germain Lam, this is an early lesson in how the absence of a data example makes people waste precious time and leads to potentially unhelpful answers: a lose-lose situation.

            Many people here are glad to help you, but you can see that #3, #4 and #5 have all made different assumptions about how your data is structured, and have thus provided very different solutions. And there's still a chance that none of them fixes your issue, because your data could have a different structure from all these assumptions.

            So please do give us a data example, using the command -dataex-. Thank you!



            • #7
              Hi Ken

              I just tried to use dataex. Is this code helpful at all?


              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input str12 lake_activity byte(lake_contact_type1 lake_contact_type2 lake_contact_type3 lake_contact_type4 lake_contact_type5 lake_contact_type6)
              "5 2 . . . ." 5 2 . . . .
              "1 2 5 . . ." 1 2 5 . . .
              "1 2 4 . . ." 1 2 4 . . .
              "1 2 4 5 . ." 1 2 4 5 . .
              "2 . . . . ." 2 . . . . .
              end
              label values lake_contact_type1 lake_contact_type
              label values lake_contact_type2 lake_contact_type
              label values lake_contact_type3 lake_contact_type
              label values lake_contact_type4 lake_contact_type
              label values lake_contact_type5 lake_contact_type
              label values lake_contact_type6 lake_contact_type
              label def lake_contact_type 1 "Bathing", modify
              label def lake_contact_type 2 "Collecting water", modify
              label def lake_contact_type 5 "Washing clothes", modify
              label def lake_contact_type 4 "Swimming/playing", modify



              • #8
                Code:
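                * The other way around: name each variable by activity.
                * anymatch() is 1 if any of the listed variables equals the value, 0 otherwise.
                foreach x in 1 2 3 4 5 {
                    egen act_`x' = anymatch(lake_contact_type*), values(`x')
                }
                rename act_1 bathing
                rename act_2 collect_water
                * rename act_3 ???   (code 3 has no label in the example in #7; fill in the activity name)
                rename act_4 swim_play
                rename act_5 wash_cloth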
                This form is usually easier to use. For example, it'd be easier as a set of independent variables in a regression model. You'll be able to tell which activity is associated with the outcome.
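
                The indicator form also makes the order in which activities were recorded irrelevant, which was the concern in #1. A hedged sketch, using the small numeric example from #3 rather than the real data; the names act_1 to act_4 and pattern are assumptions made for illustration:

                Code:
                clear
                input byte(study_id lake_activity1 lake_activity2 lake_activity3 lake_activity4)
                1 1 2 3 4
                2 3 2 1 .
                3 2 3 . .
                4 3 2 . .
                end
                
                * One 0/1 indicator per activity code
                forvalues x = 1/4 {
                    egen act_`x' = anymatch(lake_activity1-lake_activity4), values(`x')
                }
                * Join the indicators into one string describing the combination
                egen pattern = concat(act_1 act_2 act_3 act_4)
                list study_id pattern
                tabulate pattern

                IDs 3 and 4 both get the pattern 0110 (codes 2 and 3), regardless of the order of the original responses, and tabulate then gives the frequency of each combination.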



                • #9
                  Hi Ken

                  Thank you. I have read the FAQ, but just to clarify: should I copy and paste the code into the Stata Command box? I tried copying and pasting some of the other code from the replies to this thread, but had variable success.


                  Thanks also to everyone who has contributed.



                  • #10
                    Use a "do-file", which is a text file that lets you type up the whole analysis and submit it as a batch. If you wish to know more, type help gs in Stata's Command box, and then read about do-files (it should be chapter 13).
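
                    As a minimal sketch, a do-file is just a sequence of commands saved in a text file. The file name lake_analysis.do is hypothetical, and the commands use the built-in auto data purely for illustration:

                    Code:
                    * lake_analysis.do (hypothetical file name)
                    * Load the example data shipped with Stata
                    sysuse auto, clear
                    * Each command runs in turn when the file is executed
                    summarize mpg
                    tabulate foreign

                    Typing doedit in the Command box opens the Do-file Editor. Once the file is saved, it can be run with do lake_analysis.do, or with the editor's Execute button.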



                    • #11
                      Hi Ken,

                      I tried the code this morning, adding it to my do-file, and it worked like a charm.

                      Once again, thank you so much for your time and advice :-)



                      • #12
                        Hi All,

                        The time has come for me to do my univariate analysis, and I am interested in looking at the association between lake activity and cca positivity.

                        My data currently looks like this:


                        Code:
                        * Example generated by -dataex-. For more info, type help dataex
                        clear
                        input byte(no_activity bathing collect_water fishing swim_play wash_cloth)
                        0 0 1 0 0 1
                        0 1 1 0 0 1
                        0 1 1 0 1 0
                        0 1 1 0 1 1
                        0 0 1 0 0 0
                        0 1 1 0 0 1
                        0 0 1 0 0 1
                        0 0 1 0 0 1
                        0 1 0 0 1 0
                        0 0 1 0 0 1
                        end

                        Using logistic regression I decided to use the following command to adjust for all the other activities so that I can explore if there is a particular activity that is more strongly associated with cca positivity:


                        logistic cca_positive i.no_activity i.bathing i.collect_water i.fishing i.swim_play i.wash_cloth, base


                        I'd like to do a likelihood ratio test, but cannot figure out the syntax to account for missing variables.

                        Would someone be able to advise how I would go about this and if I am on the right lines of analysis given my question: is a particular activity more strongly associated with cca positivity?

                        Furthermore, I'm not sure if this is correct as the participants often do several activities (e.g. fishing + washing + bathing) rather than just one variable. Is there another test I need to use in light of this?
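
                        One possible way to set up a likelihood ratio test is sketched below. It is only an illustration on simulated data, not the study data, and it assumes that the difficulty with missing values is that both models must be fitted to the same observations. The simulated values, the seed, and the names insample, full and reduced are all assumptions:

                        Code:
                        clear
                        set seed 12345
                        set obs 500
                        * Simulated 0/1 activity indicators (illustration only)
                        gen byte bathing = runiform() < 0.5
                        gen byte fishing = runiform() < 0.3
                        gen byte wash_cloth = runiform() < 0.6
                        * Simulated binary outcome that depends on bathing
                        gen byte cca_positive = runiform() < invlogit(-0.5 + 0.8*bathing)
                        * Make a few values missing to mimic incomplete records
                        replace fishing = . in 1/20
                        
                        * Full model; observations with missing values are dropped automatically
                        logistic cca_positive i.bathing i.fishing i.wash_cloth
                        estimates store full
                        * Flag the observations actually used by the full model
                        gen byte insample = e(sample)
                        * Reduced model without fishing, fitted to the same observations
                        logistic cca_positive i.bathing i.wash_cloth if insample
                        estimates store reduced
                        * Likelihood ratio test for fishing, adjusted for the other activities
                        lrtest full reduced

                        lrtest expects both models to be fitted to the same sample, which is why the reduced model is restricted to the observations used by the full model.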







                        • #13
                          Germain:
                          what is the regressand of your logistic regression? Thanks.
                          Kind regards,
                          Carlo
                          (Stata 19.0)



                          • #14
                            Hi Carlo, the regressand is cca positivity (yes or no) - binary.

                            However, later I would also like to look at lake activity as a predictor of the strength of cca positivity (neg, trace, +, ++, +++); from what I understand, I should then use ordinal logistic regression for this.
