Create a new variable from existing variables where the categories are not mutually exclusive

Jane Buerckner

Join Date: Jun 2021
Posts: 9

Create a new variable from existing variables where the categories are not mutually exclusive

05 Jun 2021, 01:00

Hi,

I am trying to create a new variable adv_outcome. I have 5 pre-existing variables I am interested in sad, angry, grief, disappoint, hope. These categories are not mutually exclusive (i.e. some patients will be sad AND angry). These categories are all binary 0=no, 1=yes.

I tried using
gen adv_outcome= 0 if sad==0 & angry==0 & grief==0 & disappoint==0 & hope==0
replace adv_outcome=1 if sad==1
replace adv_outcome=2 if angry==1
replace adv_outcome=3 if grief==1
replace adv_outcome=4 if disappoint==1
replace adv_outcome=5 if hope==1

The issue when I do this is that as it creates each new category it "takes" the observations from the previous category instead of adding them to the tally for both. The reason I want this adv_outcome variable is so that I can compare it to various other factors such as age, gender, rurality and have it show me what % of men end up experiencing each adverse outcome rather than creating 6 separate tables for each patient-factor variable. I do have an any_outcome variable that was created if any of the 5 (sad, angry, grief, disappoint, hope.) occurred which seems to be working fine, but I'd like to see which outcomes were most common.

This is what I would like the crosstabs to look like: (%'s made up for example purpose, obviously just want the numbers not the words but wanted to convey what I want as clearly as possible so put the words in for this)

	Metro	Regional	Rural
none	12% experienced none	24% experienced none	5% experienced none
sad	36% experienced sad	47% experienced sad	60% experienced sad
angry	40% experienced angry	10% experienced angry	40% experienced angry
grief	65% experienced grief	60% experienced grief	30% experienced grief
dissapoint	35% experienced disappointment	5% experienced disappointment	90% experienced disappointment
hope	80% experienced hope	70% experienced hope	67% experienced hope

This then brings me to my next issue, the dataset is 2,000 people (1,000 men, 1,000 not-men). If the patients fit in with multiple groups, there could end up being a total of 5,000 in each column assuming that all 1,000 men felt all 5 emotions, so would percentages then be calculated /5,000 instead of /2,000?

Is it easier to just run each table separately and manually type the numbers into a normal table? Am I missing something super obvious? I feel like I've been staring at it all day making zilch progress.

Thanks, J

Tags: None

Nick Cox

Join Date: Mar 2014

Posts: 35639
#2

05 Jun 2021, 03:56

I am trying to create a new variable adv_outcome. I have 5 pre-existing variables I am interested in sad, angry, grief, disappoint, hope. These categories are not mutually exclusive (i.e. some patients will be sad AND angry). These categories are all binary 0=no, 1=yes.

I tried using
gen adv_outcome= 0 if sad==0 & angry==0 & grief==0 & disappoint==0 & hope==0
replace adv_outcome=1 if sad==1
replace adv_outcome=2 if angry==1
replace adv_outcome=3 if grief==1
replace adv_outcome=4 if disappoint==1
replace adv_outcome=5 if hope==1

The issue when I do this is that as it creates each new category it "takes" the observations from the previous category instead of adding them to the tally for both.

Quite. This is a problem.

The reason I want this adv_outcome variable is so that I can compare it to various other factors such as age, gender, rurality and have it show me what % of men end up experiencing each adverse outcome rather than creating 6 separate tables for each patient-factor variable. I do have an any_outcome variable that was created if any of the 5 (sad, angry, grief, disappoint, hope.) occurred which seems to be working fine, but I'd like to see which outcomes were most common.

This is what I would like the crosstabs to look like: (%'s made up for example purpose, obviously just want the numbers not the words but wanted to convey what I want as clearly as possible so put the words in for this)

For the table you give, the only new variable you need is none.

This then brings me to my next issue, the dataset is 2,000 people (1,000 men, 1,000 not-men). If the patients fit in with multiple groups, there could end up being a total of 5,000 in each column assuming that all 1,000 men felt all 5 emotions, so would percentages then be calculated /5,000 instead of /2,000?

I don't understand. The percents you need are percents of the group in question. The denominator is what it is. If beyond the table you give, you want a breakdown by men and women, you just specify separate tables.

Here is some technique. These are just invented data. The mean of a (0, 1) indicator is just the proportion of 1s (in your problem, those saying Yes). To see a percent, multiply by 100 either before you take the mean or afterwards.

Code:

clear set obs 900 foreach v in sad angry grief disappoint hope { gen `v' = runiformint(0, 1) } gen where = ceil(_n/300) label def where 1 Metro 2 Regional 3 Rural label val where where * none is true 1 when all the others are 0 and false 0 otherwise gen none = max(sad, angry, grief, disappoint, hope) == 0 * to see percents not proportions, multiply by 100 foreach v of var none sad-hope { local V = proper("`v'") gen `V' = 100 * `v' } tabstat None Sad-Hope, by(where) format(%2.0f) Summary statistics: Mean Group variable: where where | None Sad Angry Grief Disapp~t Hope ---------+------------------------------------------------------------ Metro | 5 48 51 45 51 48 Regional | 3 50 51 46 48 49 Rural | 2 55 53 51 48 49 ---------+------------------------------------------------------------ Total | 3 51 52 47 49 49 ----------------------------------------------------------------------

For men and women separately, you may find it easier just to run the command separately.

Whether you mess around in Stata to get desired output (MS Word, MS Excel, LaTeX, HTML, XML) or mess around in some other program to get Stata output formatted congenially is a matter of personal taste and where your skills lie.
1 like
Comment
Jane Buerckner

Join Date: Jun 2021

Posts: 9
#3

06 Jun 2021, 17:34

Thank you for your reply Nick!
Comment

Announcement

Create a new variable from existing variables where the categories are not mutually exclusive

Comment

Comment