  • Replace command with non-mutually exclusive categorical data

    Hello,

    I am working with a dataset from a Twitter content analysis project and am stuck trying to figure out how to take 8 categorical tweet characteristic variables (resource, news, personal experience, personal opinion, marketing, spam, question, jokes/parody) and create one "tweet characteristic" variable (code below).

    The problem I am having is that the categories are not mutually exclusive. The n for JokesParody is 22, but when I run this code it reduces it to 5 since a tweet can have several of these characteristics. Any help you can provide would be very much appreciated.

    gen Characteristics=.
    replace Characteristics = 0 if JokesParody==1
    replace Characteristics = 1 if Resource==1
    replace Characteristics = 2 if News==1
    replace Characteristics = 3 if PersonalExperience==1
    replace Characteristics = 4 if PersonalOpinion==1
    replace Characteristics = 5 if Marketing==1
    replace Characteristics = 6 if Spam==1
    replace Characteristics = 7 if Question==1
    label var Characteristics "Tweet Characteristics"
    label define Characteristics 0 "Jokes/Parody" 1 "Resource" 2 "News" 3 "Personal Experience" 4 "Personal Opinion" 5 "Marketing" 6 "Spam" 7 "Question"
    label val Characteristics Characteristics

  • #2
    the question is, for your purposes, what do you want the result to be when a tweet has more than 1 characteristic?


    • #3
      Thanks for the response, Rich. If a tweet has more than 1 characteristic I want it to appear more than once. I want to have totals for each characteristic in a single variable


      • #4
        sorry, still not clear to me - let's try this: do you want 8 yes/no variables which would call for you to use multiple response type analysis or do you want one variable with, possibly, dozens of different distinct responses?


        • #5
          Each of the 8 variables is currently coded as a yes/no variable, but a tweet can contain multiple characteristics. I think multiple response analysis sounds like the way to go.

          I just want a variable that has the number of times each characteristic was selected, but, the way I had it coded I can’t do that. JokesParody was selected 22 times, but because another characteristic was selected 17 times, when I use the code above it puts 17 of those observations into other categories.
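
          To illustrate why the totals shrink, here is a minimal sketch on made-up data (three toy tweets and only two of the eight indicators; none of these values come from the real dataset). Each later replace overwrites the value set by an earlier one, so a tweet flagged as both JokesParody and Resource ends up counted only under Resource. The per-characteristic totals can still be read directly off the 0/1 indicators with the built-in tabstat command.

          ```stata
          clear
          * Toy data (hypothetical): the second tweet has both characteristics
          input JokesParody Resource
          1 0
          1 1
          0 1
          end

          * Sequential replace: the second tweet is set to 0, then overwritten with 1
          gen Characteristics = .
          replace Characteristics = 0 if JokesParody == 1
          replace Characteristics = 1 if Resource == 1
          tabulate Characteristics

          * Column sums of the 0/1 indicators count the tweets with each characteristic
          tabstat JokesParody Resource, statistics(sum)
          ```

          In this toy case the tabulation reports one Jokes/Parody tweet although two tweets carry that flag, while the tabstat sums are 2 for each indicator.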


          • #6
            unfortunately, I am not very familiar with this but you might want to look at -mrtab- (user-written; use -search- to locate and download); I think that other user-written packages for multiple responses also exist and have been discussed on this forum so you might want to search the forum


            • #7
              Ok, I’ll give that a shot. Thank you, Rich


              • #8
                Originally posted by Michel Hauer
                Hello,

                I am working with a dataset from a Twitter content analysis project and am stuck trying to figure out how to take 8 categorical tweet characteristic variables (resource, news, personal experience, personal opinion, marketing, spam, question, jokes/parody) and create one "tweet characteristic" variable (code below).
                What do you need this variable for? Just to obtain a tabulation of totals?

                Code:
                set obs 100
                set seed 01072022
                local i 1
                foreach var in JokesParody Resource News PersonalExperience PersonalOpinion Marketing Spam Question{
                    gen `var'= rnormal()>0.`i'
                    local ++i
                }
                *START HERE
                order JokesParody Resource News PersonalExperience PersonalOpinion Marketing Spam Question
                rename JokesParody-Question var=
                gen obs_no=_n
                reshape long var, i(obs_no) j(which) string
                contract which var if var
                drop var
                replace which= ustrregexra(which, "([a-z])([A-Z])", "$1 $2")
                l, sep(0)
                Result:

                Code:
                . l, sep(0)
                
                     +-----------------------------+
                     |               which   _freq |
                     |-----------------------------|
                  1. |        Jokes Parody      38 |
                  2. |           Marketing      25 |
                  3. |                News      41 |
                  4. | Personal Experience      43 |
                  5. |    Personal Opinion      27 |
                  6. |            Question      18 |
                  7. |            Resource      42 |
                  8. |                Spam      27 |
                     +-----------------------------+
                Last edited by Andrew Musau; 30 Jun 2022, 18:23.


                • #9
                  Ideally I want to create a similar variable for six other categorical variables as well and do cross tabs and look for significance across them...I appreciate this code, but to just get tabulations I could just run tabulations on the binary (yes/no) variables


                  • #10
                    So you want to take a set of 8 binary variables and turn it into a categorical variable with up to 2^8 = 256 values, and then repeat that for 6 more sets of binary variables?

                    I'd suggest something like the following (untested) code
                    Code:
                    generate double category = 0
                    foreach var in JokesParody Resource News PersonalExperience PersonalOpinion Marketing Spam Question {
                        replace category = category*10 + `var'
                    }
                    format category %08.0f
                    which creates an 8-digit variable of 0's and 1's; the leftmost digit will be JokesParody and the rightmost will be Question. The %08.0f format causes leftmost zeroes to be displayed.

                    This will work for up to 16 binary variables. If you have no more than 10 binary variables comprising your category variables, you can substitute long for double as the storage type.
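
                    As a rough check of what the loop produces, here is a small sketch on three hypothetical indicators a, b and c (toy data, not the tweet variables). It assumes the format width is adjusted to the number of indicators, so %03.0f here instead of %08.0f.

                    ```stata
                    clear
                    * Toy data (hypothetical): three 0/1 indicators
                    input a b c
                    0 0 1
                    1 0 1
                    1 1 0
                    end

                    * Same digit-shifting loop as above; long is enough for 3 digits
                    generate long category = 0
                    foreach var in a b c {
                        replace category = category*10 + `var'
                    }

                    * Width 3 with leading zeros, one digit per indicator
                    format category %03.0f
                    list, sep(0)

                    * Frequency of each combination of indicators
                    tabulate category
                    ```

                    The three observations are displayed as 001, 101 and 110, so each digit can be read back as one of the original indicators.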


                    • #11
                      Originally posted by William Lisowski
                      So you want to take a set of 8 binary variables and turn it into a categorical variable with up to 2^8 = 256 values, and then repeat that for 6 more sets of binary variables?

                      I'd suggest something like the following (untested) code
                      Code:
                      generate double category = 0
                      foreach var in JokesParody Resource News PersonalExperience PersonalOpinion Marketing Spam Question {
                          replace category = category*10 + `var'
                      }
                      format category %08.0f
                      which creates an 8-digit variable of 0's and 1's; the leftmost digit will be JokesParody and the rightmost will be Question. The %08.0f format causes leftmost zeroes to be displayed.

                      This will work for up to 16 binary variables. If you have no more than 10 binary variables comprising your category variables, you can substitute long for double as the storage type.
                      This strikes me as the most straightforward solution. Good luck OP! Please be sure to report back as per the FAQ.


                      • #12
                        I think @William Lisowski's nice idea could also be done as a string operation. No loop is needed for that, as the main operation is string concatenation.


                        Code:
                        clear
                        input test1 test2 test3
                        0 0 0
                        0 0 1
                        0 1 0
                        0 1 1
                        1 0 0
                        1 0 1
                        1 1 0
                        1 1 1
                        end  
                        
                        egen wanted = concat(test*)

                        list, sep(4)

                             +--------------------------------+
                             | test1   test2   test3   wanted |
                             |--------------------------------|
                          1. |     0       0       0      000 |
                          2. |     0       0       1      001 |
                          3. |     0       1       0      010 |
                          4. |     0       1       1      011 |
                             |--------------------------------|
                          5. |     1       0       0      100 |
                          6. |     1       0       1      101 |
                          7. |     1       1       0      110 |
                          8. |     1       1       1      111 |
                             +--------------------------------+
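
                        As a possible follow-up on the same toy data, the sketch below counts how many indicators are set in each observation with egen's rowtotal() function and tabulates the concatenated pattern. The variable name n_set is made up for illustration.

                        ```stata
                        clear
                        * Same toy data as above, repeated so the example is self-contained
                        input test1 test2 test3
                        0 0 0
                        0 0 1
                        0 1 0
                        0 1 1
                        1 0 0
                        1 0 1
                        1 1 0
                        1 1 1
                        end

                        * String pattern of the indicators, as in the example above
                        egen wanted = concat(test*)

                        * Number of indicators equal to 1 in each observation
                        egen n_set = rowtotal(test*)

                        * Frequency of each pattern, then the full listing
                        tabulate wanted
                        list, sep(0)
                        ```

                        Observations with n_set greater than 1 are the ones that a single mutually exclusive variable cannot represent.
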
                        Last edited by Nick Cox; 01 Jul 2022, 05:22.


                        • #13
                          Thank you all, something came up today that I had to deal with so I'm unable to try this now but will report back when I can.


                          • #14
                            Hi all, I realize that what I am trying to do is not going to work (not a problem with anyone's code; it's more a problem with the NVivo output and the way I coded in NVivo). Thanks for the support!
