multivariate analysis on a multiple selection survey question

Christina Leon

Join Date: May 2017

Posts: 3
#1

25 May 2017, 11:43

I am trying to run a multivariate regression to see if certain traits are correlated with certain responses relating to housing preferences and barriers. Stata automatically creates separate variables for "check all that apply"/multiple selection survey questions. Is there a way to create a variable that reflects these responses or another way to do a multivariate analysis aside from mlogit?

I know the following is incorrect because it fails to take into account whether an observation has "Yes" more than once for LS_*:
gen current_ls="" replace current_ls="near family and friends" if LS_Fam_Friends=="Yes" replace current_ls="close to current job or job opportunities" if LS_Job=="Yes" replace current_ls="within walking distance of necessary stores" if LS_stores=="Yes" replace current_ls="close to health services" if LS_health_services=="Yes" replace current_ls="Safety of the neighborhood" if LS_safe_neighbor=="Yes" replace current_ls="Security in your building" if LS_secure_building=="Yes"
encode current_ls, gen(current_ls2) order current_ls2, after(current_ls) label var current_ls2 "current living situation characteristics destringed" mlogit js_in_past age_bucket physical_dis_ js_ race_cat education_
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

25 May 2017, 12:47

Christina, you are more likely to get a helpful and timely response if you post the code in a readable format. I doubt anyone is going to struggle with reading the jumble in #1. It practically makes one dizzy just to glance at it. Please read FAQ#12 for excellent advice on the best ways to post data examples, code, and Stata output on this Forum, and follow it.

I think it would probably also help if you posed a question that is a bit more specific. What exactly do the data look like in their current form? (Show an example with -dataex-.) What about that is unsatisfactory or problematic, and in what way would you like to have an improved version?
Christina Leon

Join Date: May 2017

Posts: 3
#3

25 May 2017, 13:29

Hi Clyde,

I apologize. This is my first time on the forum, and when I posted earlier it looked like the code was in a readable format. To provide more insight: the responses to a multiple selection survey question were generated as separate variables. So for the question "Does your current living situation meet your needs for the following characteristics?", which has statements like "near family and friends", "close to current job", etc., Stata generated LS_Fam_Friends, LS_Job, etc. Participants had the option of answering "Yes", "No", "N/A", or "refused".

I am trying to do a multivariate analysis to see how race, education, age, etc. are related to the outcomes indicated and to each other. The issue I am having with this code is that when I try to generate a composite variable of sorts, an observation that has more than one "yes" for living situation characteristics is not represented. Please let me know if you need any additional info/clarification.

Code:

gen current_ls=""
replace current_ls="near family and friends" if LS_Fam_Friends=="Yes"
replace current_ls="close to current job or job opportunities" if LS_Job=="Yes"
replace current_ls="within walking distance of necessary stores" if LS_stores=="Yes"
replace current_ls="close to health services" if LS_health_services=="Yes"
replace current_ls="Safety of the neighborhood" if LS_safe_neighbor=="Yes"
replace current_ls="Security in your building" if LS_secure_building=="Yes"
encode current_ls, gen(current_ls2)
order current_ls2, after(current_ls)
label var current_ls2 "current living situation characteristics destringed"
mlogit current_ls2 age_bucket physical_dis_ js_ race_cat education_

Last edited by Christina Leon; 25 May 2017, 13:33.
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

25 May 2017, 14:41

Well, this is a fairly common problem with multiple selection questions. You have six different response options. If you want to consider each combination as a separate outcome, you are dealing with 2⁶ = 64 outcome levels. You could, in theory, create a 64-level variable to cover all of these combinations, and use it as your dependent variable in mlogit. But you would need an enormous data set to get any meaningful results. And even assuming you did that, you probably would tear your hair out trying to understand and interpret the results and explain them to others.
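To get a feel for how thinly the data would spread across combinations, something like the following could be used. This is only a sketch: `pattern_id` is a made-up name, and it assumes the six LS_* variables from #3 exist as shown there. Since respondents could also answer N/A or "refused", the number of observed patterns may well exceed 64.

```stata
* Sketch only: pattern_id is a hypothetical variable name.
* group() assigns one integer code to each distinct combination of responses;
* observations with a missing value in any of the six variables get a missing code.
egen pattern_id = group(LS_Fam_Friends LS_Job LS_stores LS_health_services LS_safe_neighbor LS_secure_building)

* One-way frequency table of the observed combinations, most common first
tabulate pattern_id, sort
```

If most patterns occur only a handful of times, that is the sparse-cell problem described above.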

In my experience, the usual way of handling multiple selection data is to use each response option as a separate outcome variable in a series of -logit- models:

Code:

foreach v of varlist LS_Fam_Friends LS_Job LS_stores LS_health_services LS_safe_neighbor LS_secure_building {
    logit `v' age_bucket physical_dis_ js_ race_cat education_
}
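One practical wrinkle: -logit- needs a numeric 0/1 outcome, and the code in #3 compares the LS_* variables to "Yes", which suggests they are stored as strings. A minimal sketch of a version of the loop that recodes them first is below. It assumes the stored values are exactly "Yes" and "No", with anything else (N/A, refused, blank) treated as missing; the `_yes` suffix is a hypothetical naming choice, not something from the original data.

```stata
* Sketch only: assumes each LS_* variable is a string holding "Yes", "No",
* or some other response; the *_yes variable names are hypothetical.
foreach v of varlist LS_Fam_Friends LS_Job LS_stores LS_health_services LS_safe_neighbor LS_secure_building {
    * 1 = "Yes", 0 = "No"; every other response is left missing
    generate byte `v'_yes = 1 if `v' == "Yes"
    replace `v'_yes = 0 if `v' == "No"

    * One logistic regression per response option; observations with a
    * missing outcome are dropped from that model automatically
    logit `v'_yes age_bucket physical_dis_ js_ race_cat education_
}
```

If predictors such as race_cat or education_ are numeric categorical variables, factor-variable notation (for example i.race_cat) would enter them as sets of indicators rather than as a single linear term. Whether that applies here depends on how those variables are coded.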
Christina Leon

Join Date: May 2017

Posts: 3
#5

26 May 2017, 09:06

Oh okay, I understand. Thanks for your feedback/suggestions!