Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Create a new variable from existing variables where the categories are not mutually exclusive

    Hi,

    I am trying to create a new variable adv_outcome. I have 5 pre-existing variables I am interested in sad, angry, grief, disappoint, hope. These categories are not mutually exclusive (i.e. some patients will be sad AND angry). These categories are all binary 0=no, 1=yes.

    I tried using
    gen adv_outcome= 0 if sad==0 & angry==0 & grief==0 & disappoint==0 & hope==0
    replace adv_outcome=1 if sad==1
    replace adv_outcome=2 if angry==1
    replace adv_outcome=3 if grief==1
    replace adv_outcome=4 if disappoint==1
    replace adv_outcome=5 if hope==1

    The issue when I do this is that as it creates each new category it "takes" the observations from the previous category instead of adding them to the tally for both. The reason I want this adv_outcome variable is so that I can compare it to various other factors such as age, gender, rurality and have it show me what % of men end up experiencing each adverse outcome rather than creating 6 separate tables for each patient-factor variable. I do have an any_outcome variable that was created if any of the 5 (sad, angry, grief, disappoint, hope.) occurred which seems to be working fine, but I'd like to see which outcomes were most common.

    This is what I would like the crosstabs to look like: (%'s made up for example purpose, obviously just want the numbers not the words but wanted to convey what I want as clearly as possible so put the words in for this)
    Metro Regional Rural
    none 12% experienced none 24% experienced none 5% experienced none
    sad 36% experienced sad 47% experienced sad 60% experienced sad
    angry 40% experienced angry 10% experienced angry 40% experienced angry
    grief 65% experienced grief 60% experienced grief 30% experienced grief
    dissapoint 35% experienced disappointment 5% experienced disappointment 90% experienced disappointment
    hope 80% experienced hope 70% experienced hope 67% experienced hope
    This then brings me to my next issue, the dataset is 2,000 people (1,000 men, 1,000 not-men). If the patients fit in with multiple groups, there could end up being a total of 5,000 in each column assuming that all 1,000 men felt all 5 emotions, so would percentages then be calculated /5,000 instead of /2,000?

    Is it easier to just run each table separately and manually type the numbers into a normal table? Am I missing something super obvious? I feel like I've been staring at it all day making zilch progress.

    Thanks, J

  • #2
    I am trying to create a new variable adv_outcome. I have 5 pre-existing variables I am interested in sad, angry, grief, disappoint, hope. These categories are not mutually exclusive (i.e. some patients will be sad AND angry). These categories are all binary 0=no, 1=yes.

    I tried using
    gen adv_outcome= 0 if sad==0 & angry==0 & grief==0 & disappoint==0 & hope==0
    replace adv_outcome=1 if sad==1
    replace adv_outcome=2 if angry==1
    replace adv_outcome=3 if grief==1
    replace adv_outcome=4 if disappoint==1
    replace adv_outcome=5 if hope==1

    The issue when I do this is that as it creates each new category it "takes" the observations from the previous category instead of adding them to the tally for both.
    Quite. This is a problem.

    The reason I want this adv_outcome variable is so that I can compare it to various other factors such as age, gender, rurality and have it show me what % of men end up experiencing each adverse outcome rather than creating 6 separate tables for each patient-factor variable. I do have an any_outcome variable that was created if any of the 5 (sad, angry, grief, disappoint, hope.) occurred which seems to be working fine, but I'd like to see which outcomes were most common.

    This is what I would like the crosstabs to look like: (%'s made up for example purpose, obviously just want the numbers not the words but wanted to convey what I want as clearly as possible so put the words in for this)
    For the table you give, the only new variable you need is none.


    This then brings me to my next issue, the dataset is 2,000 people (1,000 men, 1,000 not-men). If the patients fit in with multiple groups, there could end up being a total of 5,000 in each column assuming that all 1,000 men felt all 5 emotions, so would percentages then be calculated /5,000 instead of /2,000?
    I don't understand. The percents you need are percents of the group in question. The denominator is what it is. If beyond the table you give, you want a breakdown by men and women, you just specify separate tables.


    Here is some technique. These are just invented data. The mean of a (0, 1) indicator is just the proportion of 1s (in your problem, those saying Yes). To see a percent, multiply by 100 either before you take the mean or afterwards.



    Code:
    clear
    
    set obs 900
    
    foreach v in sad angry grief disappoint hope {
        gen `v' = runiformint(0, 1)
    }
    
    gen where = ceil(_n/300)
    
    label def where 1 Metro 2 Regional 3 Rural
    label val where where
    
    * none is true 1 when all the others are 0 and false 0 otherwise
    gen none = max(sad, angry, grief, disappoint, hope) == 0
    
    * to see percents not proportions, multiply by 100
    
    foreach v of var none sad-hope {
        local V = proper("`v'")
        gen `V' = 100 * `v'
    }
    
    tabstat None Sad-Hope, by(where)  format(%2.0f)
    
    Summary statistics: Mean
    Group variable: where
    
       where |      None       Sad     Angry     Grief  Disapp~t      Hope
    ---------+------------------------------------------------------------
       Metro |         5        48        51        45        51        48
    Regional |         3        50        51        46        48        49
       Rural |         2        55        53        51        48        49
    ---------+------------------------------------------------------------
       Total |         3        51        52        47        49        49
    ----------------------------------------------------------------------
    For men and women separately, you may find it easier just to run the command separately.

    Whether you mess around in Stata to get desired output (MS Word, MS Excel, LaTeX, HTML, XML) or mess around in some other program to get Stata output formatted congenially is a matter of personal taste and where your skills lie.

    Comment


    • #3
      Thank you for your reply Nick!

      Comment

      Working...
      X