Combine several variables (binary yes/no) to one variable (containing only yes answers)

Stephan Krayter

Join Date: Dec 2018

Posts: 8
#1

14 Jun 2023, 23:55

Hello,

in a dataset on health issues I have several single variables with a binary yes/no option. It pretty much looks like this.

"Do you have psychosocial issues?" --> Yes 40 No 30
"Do you have cardiological issues?" --> Yes 20 No 50
"Do you have neurological issues?" --> Yes 50 No 20

and so on with 4 more variables with different health issues. All variables have the same number of observations (N=70) with no missings in the dataset.

I want to combine all of these to one variable containing only the "yes" answers. It should look like this

"Which health issues do you have?"
Psychological 40
Cardiological 20
Neurological 50
and so on

I know that this will increase the number of observations, as this will be a multiple-choice variable where people can have psychological AND cardiological issues, but that is fine for me.

I was able to combine the variables with the following commands, but the result does not capture every answer given; it keeps only one answer for each person who answered the questionnaire.

gen indikation =.
replace indikation = 1 if indpsycho == 1
replace indikation = 2 if indcardio == 1
replace indikation = 3 if indneuro == 1
and so on

If someone has answered yes (1) in the psychological AND the cardiological question, they will be first considered in the variable "indikation" with option 1 and then replaced with the second command line to option 2. But what I want is that they should appear in both options. So there should be something like "add" instead of "replace".

Is there a way to manage this in Stata? Thanks for your answers.
Nick Cox

Join Date: Mar 2014

Posts: 35724
#2

15 Jun 2023, 00:45

You seem to be asking for quite different things.

Assuming that indpsycho indcardio indneuro are 0, 1 and possibly missing then the number of symptoms is

Code:

gen n_symptoms = (indpsycho == 1) + (indcardio == 1) + (indneuro == 1)

or

Code:

egen n_symptoms = rowtotal(indpsycho indcardio indneuro)

(I note that you say there are no missings, but I want the answer to be general enough to work with missings, in case you have them in future or someone does who is interested in the question because of loosely similar data.)
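
As an illustration of the point about missings, here is a small sketch on made-up data. The four observations and the variable names n_sum, n_row and n_plus are assumptions for the example only, not taken from the original dataset. It compares the two commands above with a plain sum, which is the version that would break when a value is missing.

```stata
* Toy data (hypothetical values); the last observation has a missing answer
clear
input indpsycho indcardio indneuro
1 1 0
0 0 1
1 0 1
1 . 0
end

* Sum of indicator expressions: (x == 1) evaluates to 0 when x is missing
gen n_sum = (indpsycho == 1) + (indcardio == 1) + (indneuro == 1)

* rowtotal() treats missing values as 0 by default
egen n_row = rowtotal(indpsycho indcardio indneuro)

* A plain sum propagates the missing value instead
gen n_plus = indpsycho + indcardio + indneuro

list, noobs
```

In the last observation, n_sum and n_row are both 1, while n_plus is missing. Note that rowtotal() adds up whatever values are present, so it matches the first command only if the indicators are strictly 0, 1 or missing.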

You can have a non-aligned variable like this (extend to 7 variables)

Code:

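* Names of the issues go in the first 3 observations
gen which = word("psycho cardio neuro", _n) in 1/3
gen count_which = .
local i = 1
foreach v in psycho cardio neuro {
    * Count the "yes" answers and store the count alongside the name
    count if ind`v' == 1
    replace count_which = r(N) in `i'
    local ++i
}
tabdisp which, c(count_which)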

This could also be useful.

Code:

gen combo = ""
foreach v in psycho cardio neuro {
    replace combo = combo + " " + "`v'" if ind`v' == 1
}
replace combo = trim(combo)
tab combo
tab combo, sort

Another possibility is to use a new frame.
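
A minimal sketch of the frame idea, assuming Stata 16 or later (frames are not available before that). The frame name summary and the toy data are made up for illustration; with real data you would skip the input block and extend the list to all 7 variables. The counts are posted to a separate frame, so the original dataset gains no extra observations.

```stata
* Toy data (hypothetical values) standing in for the real dataset
clear
input indpsycho indcardio indneuro
1 1 0
0 0 1
1 0 1
1 . 0
end

* Start from a clean frame; "summary" is an arbitrary name
capture frame drop summary
frame create summary str10 which double count_which

* Post one row per issue: its name and the number of "yes" answers
foreach v in psycho cardio neuro {
    count if ind`v' == 1
    frame post summary ("`v'") (r(N))
}

* Show the summary without leaving the current frame
frame summary: list, noobs
```

The main data stay in the default frame throughout, so later calculations do not need to exclude any summary rows.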

I strongly advise against adding new observations. That is a spreadsheet habit that makes sense to people accustomed to a spreadsheet scrapbook with original data here, summary data there, a graph there, and so forth, but the habit just complicates Stata use awkwardly. Ever after adding stuff to your dataset you will need to exclude the extras from other calculations or else risk a mess.