Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Coding an overall categorical variable from several mutually nonexclusive variables

    I'm currently working with a dataset of 233 observations and 300 variables. This data originates from a survey with a great deal of multiple choice questions, which is the reason why there are so relatively many variables. Each possible answer for each multiple choice questions appears as a binary variable in my dataset saying whether that answer was chosen or not. I have provided an example that I asses demonstrates the problem in a fair, although much smaller scaled, manner.

    My respondents have, in a multiple choice question, been asked what sodas they've consumed over the last year. For each soda they crossed off, they've been asked to which degree, on an ordinal scale from 1-5, they liked it and similarly how often they buy that soda.

    The data, therefore, looks something like this.

    Code:
    input byte(Cola Fanta Cola_Citrus) float(Like_Cola Like_Fanta Like_ColaCitrus Buy_Cola Buy_Fanta Buy_ColaCitrus)
    1 1 1 1 2 3 1 3 1
    . . 1 . . 4 . . 3
    1 . . 2 1 . 5 . .
    1 1 . 3 3 . 2 3 .
    1 . . 4 5 . 1 . .
    . . 1 . . 2 . . 1
    . 1 . . . . . 1 .
    . 1 1 . . 1 . 1 1
    . . . . . . . . .
    1 1 . 5 4 . 5 3 .
    end
    label values Cola Cola
    label def Cola 1 "Cola", modify
    label values Fanta Fanta
    label def Fanta 1 ".", modify
    label values Cola_Citrus Cola_Citrus
    label def Cola_Citrus 1 "Cola Citrus", modify
    label values Like_Cola ordinal
    label values Like_Fanta ordinal
    label values Like_ColaCitrus ordinal
    label values Buy_Cola ordinal
    label values Buy_Fanta ordinal
    label values Buy_ColaCitrus ordinal
    label def ordinal 1 "Often", modify
    label def ordinal 2 "Somehow Often", modify
    label def ordinal 3 "Neither", modify
    label def ordinal 4 "Rarely", modify
    label def ordinal 5 "Never", modify
    I'm aware that I can use the mrtab command in order to gain an overview of the answers, but I desire a new variable, as I wish to make a twoway table with other variables such as the "Like_var".

    Thus, I wish to make a overall "category" for sodas. This could be "Cola" and "Orange sodas". But the Cola Citrus would have to figure ind both (!). Now, I can code individual binary variables on whether you drink Cola or Orange.

    Code:
    gen Cola2 =.
    recode  Cola2 (.=1) if Cola==1 | Cola_Citrus ==1
    
    gen Orange=.
    recode vurdering (.=1) if Fanta==1 | Cola_Citrus==1
    
    *Attempting to make overall category variable
    gen Category_Var =.
    recode Category_Var (.=1) if Cola2==1
    recode kategorier (.=2) if Orange==1
    
    tab Category_var
    Upon tabulating the new category variable, and within the process of making it, there is no error message, but the second value of the variable, "orange", does not have all the observations that it should. I assume that each respondent can only appear once in each variable. And as you see several respondents enjoy sodas from both categories. Therefore, my question is:

    "How may I code a variable that allow for each respondent to figure more than once, or alternatively, how would you go about analyzing the question of whether there are difference between the categories in regards to how often they are consumed?"

    This is even more complicated by the fact that I am not allowed do use regression, but should stick to two, or perhaps, threeway tables.

    Additionally, how may I produce one collected "how often do you buy soda"-variable, when this too dispersed over three variables? (I also would want to analyse the relationship between how good a soda is rated and how often it is bought).

    Hopeful regards

  • #2
    http://www.stata-journal.com/sjpdf.h...iclenum=dm0034 is a generic discussion.

    Comment


    • #3
      Originally posted by Nick Cox View Post
      I'll have a look - thanks a lot. Would you mind if posted follow up questions in this thread?

      Comment


      • #4
        Feel free to post or not.

        Comment


        • #5
          Originally posted by Nick Cox View Post
          Feel free to post or not.
          So I've read the article you posted, Mr. Cox. I'm not sure that what the egen "var" = group (), label function does, is what I desire.

          Code:
           gen Cola2 =.
          (10 missing values generated)
          
          recode  Cola2 (.=1) if Cola==1 | Cola_Citrus ==1
          (Cola2: 8 changes made)
          
          gen Orange=.
          (10 missing values generated)
          
          recode Orange (.=1) if Fanta==1 | Cola_Citrus==1
          (Orange: 7 changes made)
          
          egen Category = group(Cola2 Orange), label
          (4 missing values generated)
          What I get when I tabulate the "category" var is this:
          Group (Cola2 Orange freq perceent cum
          1 1 6 100 100
          Total 6 100
          What I'm trying to achieve is more in the line of this:
          Category Freq Percent
          Cola2 obs %
          Orange obs %
          Total obs %
          And, hopefully be able to cross tabulate the category variable with another variable of the same sort.
          Additionally, Is there a way to prevent that it creates so many missing variables? I dont need the respondents to having answer all variables.
          Last edited by Mads Moring; 08 May 2017, 06:08.

          Comment


          • #6
            Your original post #1 suggests some problems here.

            1. Your variables such as Cola and Fanta are coded 1 and missing. Perhaps that should be 1 and 0. Stata will ignore missing values to the extent possible.

            2. Your code includes the line

            Code:
             
             label def Fanta 1 ".", modify
            and it's hard to see any reason for that or for treating Fanta differently.

            Comment


            • #7
              Originally posted by Nick Cox View Post
              Your original post #1 suggests some problems here.

              1. Your variables such as Cola and Fanta are coded 1 and missing. Perhaps that should be 1 and 0. Stata will ignore missing values to the extent possible.

              2. Your code includes the line

              Code:
              label def Fanta 1 ".", modify
              and it's hard to see any reason for that or for treating Fanta differently.
              The Fanta 1 "." is indeed a mistake. It was my first time creating data in the editor (and using the dataex software). Thanks! I've now changed all the variables, so that what was former coded as missing is now coded as 0.

              And I'm now very close to having the result that I intended.

              Code:
              recode Cola (.=0)
              recode Fanta (.=0)
              recode Cola_Citrus (.=0)
              recode Buy_Cola (.=0)
              recode Buy_Fanta (.=0)
              recode Buy_ColaCitrus (.=0)
              recode Like_Cola (.=0)
              recode Like_Fanta (.=0)
              recode Like_ColaCitrus (.=0)
              
              *
              gen Cola2 = 0
              recode  Cola2 (0=1) if Cola==1 | Cola_Citrus ==1
              gen Orange= 0
              recode Orange (0=1) if Fanta==1 | Cola_Citrus==1
              egen Category = group(Cola2 Orange), label
              tab Category
              This gives me something like the following:
              Categoy freq pecent valid cum
              00
              01
              10
              11
              Code:
              gen Category2 = 0
              recode Category2 (0=1) if Category == 2 |  Category == 4
              gen Category3 = 0
              recode Category3 (0=1) if Category == 3 |  Category == 4
              egen Last_Category = group (Category2 Category3), label
              tab Last_Category
              Does not give the desired outcome as explained in post#5

              Comment


              • #8
                Please give a complete script starting from your data example. I didn't understand #5, so I don't promise to answer, but people won't want to read the entire thread to understand.

                Comment


                • #9
                  Originally posted by Nick Cox View Post
                  Please give a complete script starting from your data example. I didn't understand #5, so I don't promise to answer, but people won't want to read the entire thread to understand.
                  What I was trying to say in #5 was, that I wish to have a table, that does not state the freq. of respondents that have 1 in Orange, the freq. of respondents who have 1 ind Cola, and lastly the ones who have 1 in both. I want the ones who have 1 in both to appear in / count towards the freq. in both Orange and Cola.

                  Code:
                  input byte(Cola Fanta Cola_Citrus) float(Like_Cola Like_Fanta Like_ColaCitrus Buy_Cola Buy_Fanta Buy_ColaCitrus)
                  1 1 1 1 2 3 1 3 1
                  . . 1 . . 4 . . 3
                  1 . . 2 1 . 5 . .
                  1 1 . 3 3 . 2 3 .
                  1 . . 4 5 . 1 . .
                  . . 1 . . 2 . . 1
                  . 1 . . . . . 1 .
                  . 1 1 . . 1 . 1 1
                  . . . . . . . . .
                  1 1 . 5 4 . 5 3 .
                  end
                  label values Cola Cola
                  label def Cola 1 "Cola", modify
                  label values Fanta Fanta
                  label def Fanta 1 "Fanta", modify
                  label values Cola_Citrus Cola_Citrus
                  label def Cola_Citrus 1 "Cola Citrus", modify
                  label values Like_Cola ordinal
                  label values Like_Fanta ordinal
                  label values Like_ColaCitrus ordinal
                  label values Buy_Cola ordinal
                  label values Buy_Fanta ordinal
                  label values Buy_ColaCitrus ordinal
                  label def ordinal 1 "Often", modify
                  label def ordinal 2 "Somehow Often", modify
                  label def ordinal 3 "Neither", modify
                  label def ordinal 4 "Rarely", modify
                  label def ordinal 5 "Never", modify
                  
                  recode Cola (.=0)
                  recode Fanta (.=0)
                  recode Cola_Citrus (.=0)
                  recode Buy_Cola (.=0)
                  recode Buy_Fanta (.=0)
                  recode Buy_ColaCitrus (.=0)
                  recode Like_Cola (.=0)
                  recode Like_Fanta (.=0)
                  recode Like_ColaCitrus (.=0)
                  
                  *
                  gen Cola2 = 0
                  
                  
                  recode  Cola2 (0=1) if Cola==1 | Cola_Citrus ==1
                  
                  
                  gen Orange= 0
                  
                  
                  recode Orange (0=1) if Fanta==1 | Cola_Citrus==1
                  
                  egen Category = group(Cola2 Orange), label
                  tab Category
                  fre Category
                  
                  
                  gen Category2 = 0
                  recode Category2 (0=1) if Category == 2 |  Category == 4
                  gen Category3 = 0
                  recode Category3 (0=1) if Category == 3 |  Category == 4
                  egen Last_Category = group (Category2 Category3), label
                  tab Last_Category
                  The table at the end is therefore not fare away from the table that is wanted, but its not quite there either. I want it down to two possible results of the last_category var

                  Comment


                  • #10
                    Sorry, still missing what you want that's different. This is cleaner and I think equivalent code.


                    Code:
                    clear 
                    
                    input byte(Cola Fanta Cola_Citrus) float(Like_Cola Like_Fanta Like_ColaCitrus Buy_Cola Buy_Fanta Buy_ColaCitrus)
                    1 1 1 1 2 3 1 3 1
                    . . 1 . . 4 . . 3
                    1 . . 2 1 . 5 . .
                    1 1 . 3 3 . 2 3 .
                    1 . . 4 5 . 1 . .
                    . . 1 . . 2 . . 1
                    . 1 . . . . . 1 .
                    . 1 1 . . 1 . 1 1
                    . . . . . . . . .
                    1 1 . 5 4 . 5 3 .
                    end
                    label values Cola Cola
                    label def Cola 1 "Cola", modify
                    label values Fanta Fanta
                    label def Fanta 1 "Fanta", modify
                    label values Cola_Citrus Cola_Citrus
                    label def Cola_Citrus 1 "Cola Citrus", modify
                    label values Like_Cola ordinal
                    label values Like_Fanta ordinal
                    label values Like_ColaCitrus ordinal
                    label values Buy_Cola ordinal
                    label values Buy_Fanta ordinal
                    label values Buy_ColaCitrus ordinal
                    label def ordinal 1 "Often", modify
                    label def ordinal 2 "Somehow Often", modify
                    label def ordinal 3 "Neither", modify
                    label def ordinal 4 "Rarely", modify
                    label def ordinal 5 "Never", modify
                    
                    * some differences in what follows 
                    mvencode *, mv(0) 
                    
                    gen Cola2 = inlist(1, Cola, Cola_Citrus) 
                    gen Orange= inlist(1, Fanta, Cola_Citrus) 
                    
                    egen Category = group(Cola2 Orange), label
                    tab Category
                    
                    gen Category2 = inlist(Category, 2, 4) 
                    gen Category3 = inlist(Category, 3, 4) 
                    
                    egen Last_Category = group(Category2 Category3), label
                    tab Last_Category

                    Comment


                    • #11
                      Originally posted by Nick Cox View Post
                      Sorry, still missing what you want that's different. This is cleaner and I think equivalent code.
                      [/CODE]

                      I'll attempt to explain myself again, and thank you so much for you patience, Mr. Cox.

                      The table that the last command "tab Last_Category" (post#10) produces has four outcomes {00;01;10;11}. If I'm not mistaken "00" shows the freq. of people who are in neither category. The outcome "01" shows the freq. of people who are not in the first category, but the second and so forth. The outcome "11" then shows how many people are in both (?).

                      The table summarizes to 10, which is the total number of observations. Thus, these outcomes seems to be mutually exclusive. When looking in the data, we see that several respondents should be in both categories, and they are now in the "11" outcome. I do not mind counting every respondent several times in the table I desire, I actually want the categories not to be mutually exclusive. So what I would like is that the 6 respondents who are in the "11" outcome appears in both the "01" and the "10" outcomes. This would give Last_Category variable two outcomes "01"(Cola2) and "10" (Orange) with 8 and 7 in freq. respectively.

                      My question then, is how I get to that last table from the one we have now?

                      Comment


                      • #12
                        (On Statalist, there is no need to use titles such as Mr unless people insist on them, and I've never seen that in 23 years here. It's all too likely that people will get the title wrong in any case. For myself "Nick" is fine.)

                        Your categories have to be mutually exclusive with tabulate. No way round that. Otherwise I fear I still don't understand. Sorry.

                        Comment


                        • #13
                          Originally posted by Nick Cox View Post
                          Your categories have to be mutually exclusive with tabulate. No way round that. Otherwise I fear I still don't understand. Sorry.


                          I've found a way that I think works, even though it is not aesthetically pleasing.

                          Code:
                          clear all
                          input byte(Cola Fanta Cola_Citrus) float(Like_Cola Like_Fanta Like_ColaCitrus Buy_Cola Buy_Fanta Buy_ColaCitrus)
                          1 1 1 1 2 3 1 3 1
                          . . 1 . . 4 . . 3
                          1 . . 2 1 . 5 . .
                          1 1 . 3 3 . 2 3 .
                          1 . . 4 5 . 1 . .
                          . . 1 . . 2 . . 1
                          . 1 . . . . . 1 .
                          . 1 1 . . 1 . 1 1
                          . . . . . . . . .
                          1 1 . 5 4 . 5 3 .
                          end
                          label values Cola Cola
                          label def Cola 1 "Cola", modify
                          label values Fanta Fanta
                          label def Fanta 1 "Fanta", modify
                          label values Cola_Citrus Cola_Citrus
                          label def Cola_Citrus 1 "Cola Citrus", modify
                          label values Like_Cola ordinal
                          label values Like_Fanta ordinal
                          label values Like_ColaCitrus ordinal
                          label values Buy_Cola ordinal
                          label values Buy_Fanta ordinal
                          label values Buy_ColaCitrus ordinal
                          label def ordinal 1 "Often", modify
                          label def ordinal 2 "Somehow Often", modify
                          label def ordinal 3 "Neither", modify
                          label def ordinal 4 "Rarely", modify
                          label def ordinal 5 "Never", modify
                          
                          
                          cd "PUT IN YOUR OWN LOCATION"
                          save forsog0.dta, replace
                          
                          gen soda1=.
                          replace soda1=1 if Fanta==1
                          replace soda1=2 if Cola_Citrus==1 & soda1==.
                          replace soda1=3 if Cola==1 & soda1==.
                          
                          gen soda2=.
                          replace soda2=2 if Cola_Citrus==1 & soda1<2
                          replace soda2=3 if Cola==1 & soda1<3
                          
                          gen soda3=.
                          replace soda3=3 if Cola==1 & soda1<3 & soda2<3
                          
                          lab def SODA 1 "Fanta" 2 "Cola_Citrus" 3 "Cola"
                          lab val soda1 soda2 soda3 SODA
                          
                          *
                          
                          gen Category_Orange=1 if soda1==1| soda1==2
                          replace Category_Orange=2 if soda2==2
                          
                          gen Category_Cola=1 if soda1==2 | soda1==3
                          replace Category_Cola=2 if soda2==2 | soda2==3
                          replace Category_Cola=3 if soda3==3
                          
                          
                          *Saving Category_Orange
                          keep Category_Orange
                          
                          cd "PUT IN YOUR OWN LOCATION"
                          
                          save forsog1.dta, replace
                          
                          *Saving Category_Cola
                          use forsog0.dta, clear
                          gen soda1=.
                          replace soda1=1 if Fanta==1
                          replace soda1=2 if Cola_Citrus==1 & soda1==.
                          replace soda1=3 if Cola==1 & soda1==.
                          
                          gen soda2=.
                          replace soda2=2 if Cola_Citrus==1 & soda1<2
                          replace soda2=3 if Cola==1 & soda1<3
                          
                          gen soda3=.
                          replace soda3=3 if Cola==1 & soda1<3 & soda2<3
                          
                          lab def SODA 1 "Fanta" 2 "Cola_Citrus" 3 "Cola"
                          lab val soda1 soda2 soda3 SODA
                          
                          gen Category_Orange=1 if soda1==1| soda1==2
                          replace Category_Orange=2 if soda2==2
                          
                          gen Category_Cola=1 if soda1==2 | soda1==3
                          replace Category_Cola=2 if soda2==2 | soda2==3
                          replace Category_Cola=3 if soda3==3
                          
                          keep Category_Cola
                          
                          cd "PUT IN YOUR OWN LOCATION"
                          
                          save forsog2.dta, replace
                          
                          *
                          use "PUT IN YOUR OWN LOCATION \forsog1.dta" , clear
                          
                          append using "PUT IN YOUR OWN LOCATION\forsog2.dta"
                          
                          recode Category_Orange (2=1)
                          recode Category_Cola (1=2)
                          
                          gen CATEGORY=Category_Orange if Category_Orange!=.
                          replace CATEGORY=Category_Cola if Category_Cola!=.
                          lab def CAT_LAB 1 "Orange" 2 "Cola"
                          lab val CATEGORY CAT_LAB
                          
                          tab CATEGORY
                          Last edited by Mads Moring; 12 May 2017, 07:08.

                          Comment

                          Working...
                          X