Multiresponse variable as dependent - question about treatment and analysis

Guest
#1

01 Mar 2016, 10:30

Dear members,

I have tried to find my answer, but I could not find enough material to be sure about this issue.

I have a survey dataset in which one of the variables is a multiple response question with no order - the respondent can answer "yes" or "no" to the six options, choosing as many as applied in her case.

I expect this question to become my dependent variable and obviously I cannot apply a multinomial directly to the data as is.

After reading some posts on old and new statalist as well as Cox and Kohler's "Speaking Stata: On structure and shape: the case of multiple responses", I am still puzzled.

First, the variable treatment. I found two solutions that seem quick, feasible, and rather easy to interpret after the regression.

1) concatenate the answers to get responses like "100101".
2) "reshape to long"

The first one will, in practice, give me one category for every possible combination of answers, if I understood correctly. I am a little concerned that this will produce too many categories.
The second one creates a "false panel", as it was pointed out somewhere. What approach would you take here?

As for the "post variable treatment": can I analyze the data with the regular protocols/models available in Stata? Do I need to pay attention to something else?

I hope I was clear in my questions, but if I was not, I'll be happy to clarify them.

Best

Last edited by sladmin; 28 Jul 2016, 11:23. Reason: anonymize post
Tags: categorical, concatenate, multinomial, multiple response, reshape
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

01 Mar 2016, 11:17

You need to give more information about the question and the six response options. It's not a matter of a statistical trick. It depends on the substance.

You are right to worry about creating a categorical variable representing each combination of responses. That will be 64 categories--which you will not be able to work with in any sensible way.
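To see where the 64 comes from, here is a minimal Stata sketch on simulated data. The variable names opt1-opt6 are hypothetical stand-ins for the six yes/no items, and the 40% "yes" rate is an arbitrary assumption.

```stata
* Simulated data: 200 respondents, six yes/no items (hypothetical names)
clear
set seed 12345
set obs 200
forvalues j = 1/6 {
    generate byte opt`j' = runiform() < 0.4
}

* Concatenate the six answers into one string pattern per respondent
egen combo = concat(opt1-opt6)

* Six binary items allow at most 2^6 distinct patterns
display 2^6

* Frequency of each observed pattern, most common first
tabulate combo, sort
```

Many of the patterns will typically hold only a handful of respondents, which is what makes the combined variable hard to model.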

If the six response options are sufficiently unrelated to each other, then perhaps you have 6 different dichotomous outcomes that need to be analyzed separately.

If the six response options can be conceptualized as different aspects of a latent attribute, then constructing a scale using confirmatory factor analysis might make sense.

There are other possibilities as well.
Nathan E. Fosse

Join Date: Jul 2014

Posts: 66
#3

01 Mar 2016, 11:19

There is some theory required here: are the dichotomous items used to measure an underlying construct?

(1) Model binary variables using poisson regression: If your binary variables measure one underlying construct, then the most basic approach is to sum the number of "yes" responses per participant and model this count using poisson regression. An example might be a battery of questions that ask participants whether they experienced certain stressors in the past year; the outcome is then a count of the number of stressors. You can build more complex models on the left or the right side of the equation, but I'd start here (search poisson). More advanced techniques may permit you to model the left-hand side of the equation as a latent variable, as one would in latent class models (ssc install gllamm for more info on that).

(2) Run separate logits for each outcome: If your binary variables measure different constructs, then why are you interested in modeling them jointly? Because the outcomes are not mutually exclusive, you will indeed have too many categories if you combine them. Multinomial models are difficult to communicate to many social scientists, in my experience. In this case I would return to theory and consider modeling these as separate outcomes in separate models (as a start) or perhaps in a more complex model.

(3) Discrete choice models: A final option is to examine the available discrete choice models Stata offers. The main problem is that these restrict your data so that only one option is selected, but nested logit models relax those assumptions. You may also find that your data structure can be modeled as a discrete choice model. Here you still must reshape the data from wide to long format, so that each row is one option and each survey participant is repeated once per available option. Discrete choice models are an extension of the multinomial logit model. See asclogit to start, although this won't work as is, and consider the nlogit model, which has more potential.
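As a rough sketch of the data handling behind these three options, on simulated data. The names id, opt1-opt6, x1, and x2 are hypothetical, and the data-generating process is an arbitrary assumption, so the estimates themselves mean nothing.

```stata
* Simulated data: 500 respondents, six yes/no items, two covariates
clear
set seed 2016
set obs 500
generate long id = _n
generate double x1 = rnormal()
generate byte x2 = runiform() < 0.5
forvalues j = 1/6 {
    generate byte opt`j' = runiform() < invlogit(-0.5 + 0.3*x1)
}

* (1) Number of "yes" answers per respondent, modeled as a count
egen n_yes = rowtotal(opt1-opt6)
poisson n_yes x1 i.x2, vce(robust)

* (2) One separate logit per option
forvalues j = 1/6 {
    logit opt`j' x1 i.x2
}

* (3) Long layout: one row per respondent-option pair
reshape long opt, i(id) j(option)
list id option opt in 1/12, sepby(id)
```

Note that the count in (1) can only run from 0 to 6, so poisson is at best a starting approximation there. The long layout in (3) is only the data shape that the choice models expect; it does not by itself deal with respondents who picked more than one option.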

In short, you have a lot of options, but you need to make some theoretical decisions first. Fortunately, Stata provides you with considerable possibilities depending on the meaning of the items under inquiry. Good luck. (Also, providing syntax and more detail on the problem may provide more specific answers).

Nathan E. Fosse, PhD
Guest
#4

01 Mar 2016, 14:35

Thank you very much Clyde Schechter and Nathan E. Fosse for the insightful advice.

You are right, I should have given you more information about my study.

So, I have a number of entrepreneurs, and they have several options for financing an independent project that they may sell to an incumbent or sell themselves. My focus is on the financing alternatives they choose to launch this project, namely:

(a) use their own resources (including doing some tasks themselves instead of hiring someone)
(b) advances in the case of having pre-sold the project to an incumbent
(c) rely on models of financing over the Internet (such as "crowdfunding")
(d) rely on government subsidies
(e) rely on exchanges with other companies / professionals
(f) others

(just for clarification, due to the nature of the industry, we do not include banks or venture capital, for it is not an industry where it is common to go for these sources)

They were answered as part of the same question, and they are organized as six dichotomous variables right now.
Nathan E. Fosse, I thought about running several logit models, one for each option, particularly because the use of one source depends on access to the others. My issue is that I would have endogeneity arising from the simultaneity between these variables, and I am not sure I would be able to address it, because I don't have an instrument for each endogenous variable in my dataset.

But as I am particularly interested in the third option (Internet funding), it may make sense to create a dichotomous variable = 1 if the entrepreneur uses Internet funding (exclusively or combined with other forms) and zero otherwise.
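A minimal Stata sketch of that recode, on simulated data. The names opt1-opt6 and x1 are hypothetical, and storing option (c) in opt3 is an assumption made only for illustration.

```stata
* Simulated stand-in for the survey data (hypothetical variable names)
clear
set seed 42
set obs 300
generate double x1 = rnormal()
forvalues j = 1/6 {
    generate byte opt`j' = runiform() < 0.4
}

* Option (c), Internet funding, is assumed to be stored in opt3
generate byte internet = (opt3 == 1) if !missing(opt3)

* Optional split: Internet funding alone versus combined with other sources
egen n_yes = rowtotal(opt1-opt6)
generate byte internet_only = (internet == 1 & n_yes == 1) if !missing(internet)

tabulate internet internet_only
logit internet x1
```

Since the six items are already stored as dichotomous variables, the "exclusively or combined" indicator is just the option (c) item itself; the second variable is only needed to tell the two cases apart.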

Another option would be to create some categories combining some of the options, and I'll have to give it a little more thought.

If you have any other thoughts on this given the new information, I'd be happy to hear them. But your input was already extremely helpful, thank you.
Nathan E. Fosse

Join Date: Jul 2014

Posts: 66
#5

02 Mar 2016, 04:02

It sounds like a very interesting project. I would do some exploratory data analysis to make sense of the outcomes, primarily to determine whether you can reduce the dummy outcomes to a few categories. Then model those categories using mlogit. There are some good packages available to visualize multinomial logit models, notably Scott Long's mlogplot available on SSC.
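A minimal sketch of that workflow in Stata, on simulated data. The names opt1-opt6, x1, and x2 are hypothetical, and the three-way grouping below is an arbitrary illustration, not a recommendation; the real grouping should come out of the exploratory analysis and theory.

```stata
* Simulated data: six yes/no financing items and two covariates
clear
set seed 2016
set obs 500
generate double x1 = rnormal()
generate byte x2 = runiform() < 0.5
forvalues j = 1/6 {
    generate byte opt`j' = runiform() < 0.4
}

* Illustrative grouping (an assumption): Internet funding takes priority,
* then any other external source, then own resources only or none
generate byte fin_group = 3
replace fin_group = 2 if opt2 == 1 | opt4 == 1 | opt5 == 1 | opt6 == 1
replace fin_group = 1 if opt3 == 1
label define fin_group 1 "Internet funding" 2 "Other external only" 3 "Own resources or none"
label values fin_group fin_group

* Check cell sizes before modeling
tabulate fin_group

* Multinomial logit with the last group as the base outcome
mlogit fin_group x1 i.x2, baseoutcome(3)
```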

Another way to reduce your data dimensionality is to employ latent class analysis on the left-hand side of the equation (I mentioned this, but I'm repeating it, as that's where I'd go with this). See www.gllamm.org. (Or use Stata's cluster command, but I didn't just mention that, did I?)

Cheers,
- Nate

Last edited by sladmin; 28 Jul 2016, 11:23. Reason: anonymize original poster

Nathan E. Fosse, PhD
Guest
#6

02 Mar 2016, 04:31

Nathan E. Fosse, I will give some thought to how to group the outcomes into a few options. And I will definitely look into the option of latent class analysis that you mentioned.

Thank you again for your time and attention, it was very very helpful.