How to model non-mutually exclusive categorical variables?

togarra

Join Date: May 2014

Posts: 5
#1


06 May 2014, 10:12

Hello,

I was wondering if there is a command in Stata to model multiple (non-mutually exclusive) categorical choices? I have read the various sections on nested, conditional & multinomial logits & probits, and read through the chapters on generalized structural equation modelling, but can't seem to find an answer to my question. My statistics knowledge is very basic, so perhaps the answer is staring me in the face and I am not recognizing it. However, I would be hugely grateful for some pointers!

My data consists of respondent choices between 4 non-mutually exclusive donation options, such that respondents have chosen to donate to either 1, 2, 3 or even all 4 of the options.

Thanks in advance for your help!

Tanya
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

06 May 2014, 10:54

I think you need to say more about your data and what you are trying to learn from it. Are these choices your outcome variable, or are they predictors/covariates you need to adjust for? How do the four choices relate to each other? There are 2^4 possible combinations of responses here. Are all 16 of them truly distinct? Or is there a way to reduce that to a smaller number of groups of categories that are meaningfully homogeneous? Etc. etc. It may be as simple as having four separate indicator variables, or a single variable indexing 16 combinations, or something else. Without knowing what the data mean and what the research question is, it is impossible to be more specific.
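Here is a minimal sketch of the two encodings mentioned above, run on simulated data. The variable names (give_nature, give_agric, give_health, give_built) are hypothetical and do not come from the thread. The first encoding keeps four separate 0/1 indicators. The second packs them into a single variable indexing all 2^4 = 16 combinations, by weighting the indicators 1, 2, 4 and 8.

```stata
* Simulated data; the give_* variable names are hypothetical
clear
set seed 12345
set obs 500

* Four non-mutually exclusive 0/1 indicators (1 = donated to that program)
generate byte give_nature = runiform() < 0.5
generate byte give_agric  = runiform() < 0.4
generate byte give_health = runiform() < 0.6
generate byte give_built  = runiform() < 0.3

* One variable indexing all 2^4 = 16 combinations: weighting the indicators
* by 1, 2, 4 and 8 gives each combination a unique value from 0 to 15
generate byte combo = give_nature + 2*give_agric + 4*give_health + 8*give_built
tabulate combo

* Number of programs each respondent chose (0 to 4)
egen byte nchosen = rowtotal(give_nature give_agric give_health give_built)
tabulate nchosen
```

Because each indicator contributes a distinct power of 2, the original four choices can always be recovered from combo. A value of 0 corresponds to choosing none of the programs.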
Rich Goldstein

Join Date: Mar 2014

Posts: 4462
#3

06 May 2014, 11:41

Is not choosing any of the 4 a possible option? If yes, there are, as Clyde says, 16 possibilities; if no, there are 15. You can write all these out and count how often each occurs. You don't say what form the data are currently in, but, making a guess here, you can use -egen- with the group() function to make a new variable with the various combinations that exist in your data, and then just use -tab- to get frequencies. You can then use, e.g., -mlogit-, or -ologit- if there is an ordering, to estimate models (possibly after collapsing some of the categories).
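A minimal sketch of the -egen-/-tab-/-mlogit- route, assuming the four choices are stored as 0/1 variables. The data are simulated, and the names give_nature, give_agric, give_health, give_built and income are hypothetical.

```stata
* Simulated data; variable names are hypothetical
clear
set seed 2014
set obs 1000
generate income = rnormal()
generate byte give_nature = runiform() < invlogit(0.5*income)
generate byte give_agric  = runiform() < invlogit(-0.5 + 0.3*income)
generate byte give_health = runiform() < invlogit(0.2 - 0.4*income)
generate byte give_built  = runiform() < invlogit(-1)

* One category per combination actually observed in the data;
* the label option shows which choices make up each category
egen combo = group(give_nature give_agric give_health give_built), label
tabulate combo

* Multinomial logit over the observed combinations
* (sparse categories may need to be collapsed first in real data)
mlogit combo income
```

Note that group() only creates categories for combinations that occur in the data. With real survey data, some of the 16 possible combinations may therefore be absent.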
togarra

Join Date: May 2014

Posts: 5
#4

06 May 2014, 13:19

thanks so much for your replies!

In response to Clyde: the choices I mean to model are indeed the dependent variable. Specifically (but briefly), the data consists of survey data in which respondents are asked to allocate monetary contributions among one or more different sectoral programs (1. Nature & Environment, 2. Agriculture, 3. Health, 4. Built Environment). Respondents can donate to any number of these. At this stage I am only interested in the participation decision for each program (donate/don't donate).

The truth is I was hoping there was some way to avoid estimating a model using all 16 possible outcomes that result from combining all options. I am not sure that these can really be collapsed into fewer categories. But I will run some models using these 16 outcomes as suggested and see what it looks like.

Thanks!
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

06 May 2014, 13:40

Well, if your interest at this stage is simply the participation decision for each program, then you can get that by setting up four separate dichotomous variables and just running four separate models, no? If you're also interested in whether decision to participate in the other programs affects participation in a particular one, the model for that program can include the dichotomous variables for the other three as predictors. And if at some point you want to do things like compare how some predictor(s) work across the different programs, you can store the estimates from each model and then run -suest-.
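A minimal sketch of the four-separate-models approach followed by -suest-, on simulated data. The program indicators and the covariates income and age are hypothetical names. The final -test- shows one possible cross-program comparison, checking whether income has the same coefficient for two programs.

```stata
* Simulated data; all variable names are hypothetical
clear
set seed 506
set obs 800
generate income = rnormal()
generate age = 20 + 50*runiform()
generate byte give_nature = runiform() < invlogit(0.5*income)
generate byte give_agric  = runiform() < invlogit(-0.5 + 0.3*income)
generate byte give_health = runiform() < invlogit(0.2 - 0.4*income + 0.02*(age - 45))
generate byte give_built  = runiform() < invlogit(-1 + 0.2*income)

* A separate logit for each program's donate/don't-donate decision;
* the other programs' indicators could be added as predictors here
foreach p in nature agric health built {
    logit give_`p' income age
    estimates store `p'
}

* Combine the four stored models with a joint covariance matrix
suest nature agric health built

* Is the income effect the same for Nature & Environment and Health?
* suest names each equation <stored name>_<dependent variable>
test [nature_give_nature]income = [health_give_health]income
```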
Phil Schumm

Join Date: Mar 2014

Posts: 169
#6

06 May 2014, 13:46

Originally posted by togarra

My data consists of respondent choices between 4 non-mutually exclusive donation options, such that respondents have chosen to donate to either 1, 2, 3 or even all 4 of the options.

One option is to think of this as a multivariate binary outcome (i.e., donate to option 1 (yes/no), donate to option 2 (yes/no), etc.). You can then model the outcomes simultaneously, together with the association between them. A standard way to do this is alternating logistic regression (Carey, Zeger and Diggle, Biometrika, 1993, 80(3), 517–26), but unfortunately I am not aware of any program currently available to do this in Stata. Alternatively, you might consider a multivariate probit model, which can be fit in Stata with mvprobit (type search mvprobit). For details, see Cappellari and Jenkins, The Stata Journal, 2003, 3(3), 278–94.
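A minimal sketch of a multivariate probit for the four donate/don't-donate outcomes. It assumes the user-written mvprobit command (Cappellari and Jenkins) can be installed from SSC. If that fails, -search mvprobit- locates it. The data are simulated, and all variable names are hypothetical. A shared random term u makes the four latent propensities to donate correlated, and mvprobit estimates that correlation along with the coefficients.

```stata
* mvprobit is user-written (Cappellari and Jenkins); install it once
capture which mvprobit
if _rc {
    ssc install mvprobit
}

* Simulated data; names are hypothetical. The shared term u induces
* correlation between the four equations' error terms
clear
set seed 1993
set obs 1000
generate income = rnormal()
generate u = rnormal()
generate byte give_nature = 0.5*income + u + rnormal() > 0
generate byte give_agric  = -0.3 + 0.3*income + u + rnormal() > 0
generate byte give_health = 0.2 - 0.4*income + u + rnormal() > 0
generate byte give_built  = -0.8 + u + rnormal() > 0

* Joint model of the four binary outcomes, fit by simulated maximum
* likelihood; draws() sets the number of simulation draws per observation
mvprobit (give_nature = income) (give_agric = income) ///
         (give_health = income) (give_built = income), draws(50)
```

The estimated rho parameters in the output describe how the unobserved propensities to give to each pair of programs are correlated. This speaks directly to the question of whether the choices are co-dependent. Results can depend on the number of draws, so it is worth checking that they are stable as draws() is increased.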
togarra

Join Date: May 2014

Posts: 5
#7

06 May 2014, 14:23

thanks again.

Clyde - I have estimated the decisions individually (this is my 'baseline' model, or models); however, I don't believe these models accurately reflect the decision-making process that respondents engage in. I think it is very likely that the choices are co-dependent, and that the independent variables have different influences depending on both the total number of programs that respondents select and the precise programs chosen. That said, I hadn't thought of using the -suest- command to compare influences, so thanks for that!

Phil - in fact, the very first thing I read with regard to my question was about multivariate probit models, and for some reason I concluded that they were not appropriate. I will go back and re-read! As I said in my first post, chances are the answer has been staring me in the face.
Tani Akinbode

Join Date: Feb 2019

Posts: 6
#8

15 Feb 2019, 19:10

Hi. I have a similar problem, although I am just trying to create the variable for a descriptive analysis. I have two dichotomous variables: sexual abuse (0 = no, 1 = yes) and physical abuse (0 = no, 1 = yes). I need to combine them into one variable coded 0 = no abuse, 1 = sexual abuse only, 2 = physical abuse only and 4 = both. They are not mutually exclusive, so the row total command doesn't work for me. I'd appreciate any help. Thanks.
Rich Goldstein

Join Date: Mar 2014

Posts: 4462
#9

15 Feb 2019, 19:14

You did not supply a -dataex- example (please read the FAQ), so I don't know the names of your variables. Also, are you sure you want the coding to be 0, 1, 2, 4? Change the code below if not.

Code:

gen byte abuse=0
replace abuse=1 if sexabuse==1 & physabuse==0
replace abuse=2 if sexabuse==0 & physabuse==1
replace abuse=4 if sexabuse==1 & physabuse==1

note that the above assumes that, if there are any missing data, you want the new variable to be "0" - this might not be what you want

you probably want labels too; see

Code:

help label
Tani Akinbode

Join Date: Feb 2019

Posts: 6
#10

15 Feb 2019, 19:20

Thanks for your response, Rich. I meant 0 = no abuse, 1 = sexual abuse only, 2 = physical abuse only and 3 = both.
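With the 0/1/2/3 coding, the variable can also be built arithmetically rather than with a series of -replace- statements. Here is a minimal sketch. The names sexabuse and physabuse are assumptions, since the actual variable names were not posted, and the toy data are made up for illustration. Unlike starting from 0, this version leaves the new variable missing when either component is missing.

```stata
* Toy data; sexabuse and physabuse are assumed variable names
clear
input byte(sexabuse physabuse)
0 0
1 0
0 1
1 1
. 1
end

* 0 = none, 1 = sexual only, 2 = physical only, 3 = both;
* missing if either component is missing
generate byte abuse = sexabuse + 2*physabuse if !missing(sexabuse, physabuse)

label define abuse_lbl 0 "No abuse" 1 "Sexual abuse only" 2 "Physical abuse only" 3 "Both"
label values abuse abuse_lbl
tabulate abuse, missing
```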
Tani Akinbode

Join Date: Feb 2019

Posts: 6
#11

15 Feb 2019, 19:24

Thank you. I was able to get this sorted.