Replace command with non-mutually exclusive categorical data

Michel Hauer

Join Date: Jun 2022

Posts: 7
#1

30 Jun 2022, 14:45

Hello,

I am working with a dataset from a Twitter content analysis project and am stuck trying to figure out how to take 8 categorical tweet characteristic variables (resource, news, personal experience, personal opinion, marketing, spam, question, jokes/parody) and create one "tweet characteristic" variable (code below).

The problem I am having is that the categories are not mutually exclusive. The n for JokesParody is 22, but when I run this code it reduces it to 5 since a tweet can have several of these characteristics. Any help you can provide would be very much appreciated.

gen Characteristics=.
replace Characteristics = 0 if JokesParody==1
replace Characteristics = 1 if Resource==1
replace Characteristics = 2 if News==1
replace Characteristics = 3 if PersonalExperience==1
replace Characteristics = 4 if PersonalOpinion==1
replace Characteristics = 5 if Marketing==1
replace Characteristics = 6 if Spam==1
replace Characteristics = 7 if Question==1
label var Characteristics "Tweet Characteristics"
label define Characteristics 0 "Jokes/Parody" 1 "Resource" 2 "News" 3 "Personal Experience" 4 "Personal Opinion" 5 "Marketing" 6 "Spam" 7 "Question"
label val Characteristics Characteristics
Rich Goldstein

Join Date: Mar 2014

Posts: 4459
#2

30 Jun 2022, 14:47

The question is: for your purposes, what do you want the result to be when a tweet has more than 1 characteristic?
Michel Hauer

Join Date: Jun 2022

Posts: 7
#3

30 Jun 2022, 15:01

Thanks for the response, Rich. If a tweet has more than 1 characteristic, I want it to appear more than once. I want to have totals for each characteristic in a single variable.
Rich Goldstein

Join Date: Mar 2014

Posts: 4459
#4

30 Jun 2022, 15:41

Sorry, that is still not clear to me. Let's try this: do you want 8 yes/no variables, which would call for a multiple-response type of analysis, or do you want one variable with, possibly, dozens of distinct values?
Michel Hauer

Join Date: Jun 2022

Posts: 7
#5

30 Jun 2022, 16:17

Each of the 8 variables is currently coded as a yes/no variable, but a tweet can have multiple characteristics. I think multiple response analysis sounds like the way to go.

I just want a variable that holds the number of times each characteristic was selected, but the way I had it coded I can't get that. JokesParody was selected 22 times, but because another characteristic was also selected for 17 of those tweets, the code above moves those 17 observations into other categories.
Rich Goldstein

Join Date: Mar 2014

Posts: 4459
#6

30 Jun 2022, 17:34

Unfortunately, I am not very familiar with this, but you might want to look at -mrtab- (user-written; use -search- to locate and download). I think other user-written packages for multiple responses also exist and have been discussed on this forum, so you might want to search the forum as well.
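
A minimal sketch of that suggestion, assuming -mrtab- is still available from SSC and that, by default, it treats a value of 1 in each listed variable as a selected response (both are assumptions to check against the package's help file after installing):

```stata
* Install the user-written package once (assumes it is hosted on SSC)
ssc install mrtab

* Tabulate the eight 0/1 indicators as one multiple-response item;
* variable names are those from post #1
mrtab JokesParody Resource News PersonalExperience PersonalOpinion Marketing Spam Question
```

Because each tweet can contribute to several rows of such a table, the response counts can add up to more than the number of tweets.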
Michel Hauer

Join Date: Jun 2022

Posts: 7
#7

30 Jun 2022, 17:54

Ok, I’ll give that a shot. Thank you, Rich

Andrew Musau

Join Date: Oct 2014
Posts: 10188
#8

30 Jun 2022, 18:19

Originally posted by Michel Hauer

Hello,

I am working with a dataset from a Twitter content analysis project and am stuck trying to figure out how to take 8 categorical tweet characteristic variables (resource, news, personal experience, personal opinion, marketing, spam, question, jokes/parody) and create one "tweet characteristic" variable (code below).

What do you need this variable for? Just to obtain a tabulation of totals?

Code:

set obs 100
set seed 01072022
local i 1
foreach var in JokesParody Resource News PersonalExperience PersonalOpinion Marketing Spam Question{
    gen `var'= rnormal()>0.`i'
    local ++i
}
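*START HERE
* Fix the variable order so the range JokesParody-Question covers all eight indicators
order JokesParody Resource News PersonalExperience PersonalOpinion Marketing Spam Question
* Prefix each name with "var" so reshape can store the original name in j()
rename JokesParody-Question var=
gen obs_no=_n
* One row per tweet and characteristic; "which" holds the characteristic name
reshape long var, i(obs_no) j(which) string
* Count the rows where the indicator is 1, by characteristic
contract which var if var
drop var
* Insert a space between a lowercase and an uppercase letter ("JokesParody" -> "Jokes Parody")
replace which= ustrregexra(which, "([a-z])([A-Z])", "$1 $2")
l, sep(0)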

Res.:

Code:

. l, sep(0)

     +-----------------------------+
     |               which   _freq |
     |-----------------------------|
  1. |        Jokes Parody      38 |
  2. |           Marketing      25 |
  3. |                News      41 |
  4. | Personal Experience      43 |
  5. |    Personal Opinion      27 |
  6. |            Question      18 |
  7. |            Resource      42 |
  8. |                Spam      27 |
     +-----------------------------+
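
If only the totals are needed, a lighter alternative to the reshape approach (a swapped-in technique, not the code above) is to sum each indicator directly, since the sum of a 0/1 variable is the number of 1s. A minimal sketch on made-up toy data, using three of the variable names from post #1:

```stata
* Toy data: values are invented for illustration only
clear
input JokesParody Resource News
1 1 0
1 0 0
0 1 1
0 0 1
end

* One row per indicator, showing how many tweets have that characteristic
tabstat JokesParody Resource News, statistics(sum) columns(statistics)
```

A tweet with several characteristics is counted once under each of them, so no observation is moved from one category to another.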

Last edited by Andrew Musau; 30 Jun 2022, 18:23.


Michel Hauer

Join Date: Jun 2022

Posts: 7
#9

30 Jun 2022, 19:36

Ideally, I want to create a similar variable for six other categorical variables as well, then run cross-tabulations and test for significance across them. I appreciate this code, but to get tabulations alone I could just tabulate the binary (yes/no) variables.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#10

30 Jun 2022, 19:51

So you want to take a set of 8 binary variables and turn it into a categorical variable with up to 2⁸ = 256 values, and then repeat that for 6 more sets of binary variables?

I'd suggest something like the following (untested) code

Code:

generate double category = 0
foreach var in JokesParody Resource News PersonalExperience PersonalOpinion Marketing Spam Question {
    replace category = category*10 + `var'
}
format category %08.0f

which creates an 8-digit variable of 0's and 1's; the leftmost digit will be JokesParody and the rightmost will be Question. The %08.0f format causes leftmost zeroes to be displayed.

This will work for up to 16 binary variables. If you have no more than 10 binary variables comprising your category variables, you can substitute long for double as the storage type.
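
As a minimal sketch of how this encoding behaves, here is the same loop on made-up toy data with three hypothetical indicators (the names a, b, c and the values are illustrative assumptions, not from the thread's dataset):

```stata
* Toy data: three hypothetical 0/1 indicators
clear
input a b c
0 0 1
1 0 1
1 1 0
1 0 1
end

* Each pass shifts the existing digits one place left and appends the next indicator
generate long pattern = 0
foreach var in a b c {
    replace pattern = pattern*10 + `var'
}
* Three indicators, so pad to three digits
format pattern %03.0f

* One row per distinct combination of characteristics
tabulate pattern
```

If any indicator is missing for an observation, the arithmetic leaves pattern missing for that observation, so it is worth checking for missing values first.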
Daniel Schaefer

Join Date: Mar 2020

Posts: 814
#11

30 Jun 2022, 20:32

Originally posted by William Lisowski

So you want to take a set of 8 binary variables and turn it into a categorical variable with up to 2⁸ = 256 values, and then repeat that for 6 more sets of binary variables?

I'd suggest something like the following (untested) code

Code:

generate double category = 0
foreach var in JokesParody Resource News PersonalExperience PersonalOpinion Marketing Spam Question {
    replace category = category*10 + `var'
}
format category %08.0f

which creates an 8-digit variable of 0's and 1's; the leftmost digit will be JokesParody and the rightmost will be Question. The %08.0f format causes leftmost zeroes to be displayed.

This will work for up to 16 binary variables. If you have no more than 10 binary variables comprising your category variables, you can substitute long for double as the storage type.

This strikes me as the most straightforward solution. Good luck OP! Please be sure to report back as per the FAQ.

Nick Cox

Join Date: Mar 2014
Posts: 35656

#12

01 Jul 2022, 05:20

I think @William Lisowski's nice idea could also be done as a string operation. For that no loop is needed, as the main operation is string concatenation.

Code:

clear
input test1 test2 test3
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
end  

egen wanted = concat(test*)

list, sep(4)

     +--------------------------------+
     | test1   test2   test3   wanted |
     |--------------------------------|
  1. |     0       0       0      000 |
  2. |     0       0       1      001 |
  3. |     0       1       0      010 |
  4. |     0       1       1      011 |
     |--------------------------------|
  5. |     1       0       0      100 |
  6. |     1       0       1      101 |
  7. |     1       1       0      110 |
  8. |     1       1       1      111 |
     +--------------------------------+

Last edited by Nick Cox; 01 Jul 2022, 05:22.


Michel Hauer

Join Date: Jun 2022

Posts: 7
#13

01 Jul 2022, 12:06

Thank you all, something came up today that I had to deal with so I'm unable to try this now but will report back when I can.
Michel Hauer

Join Date: Jun 2022

Posts: 7
#14

06 Jul 2022, 06:33

Hi all, I have realized that what I am trying to do is not going to work. This is not a problem with anyone's code; it is a problem with the NVivo output, which results from the way I coded in NVivo. Thanks for the support!