  • Altering variable labels in boxplot for multiple non-exclusive variables

    Hi Statalist, I'm writing with what seems like an embarrassingly simple question which for some reason I haven't been able to sort out. I am using graph box to graph the duration of several different types of symptoms, which are not mutually exclusive:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(hr_duration eye_duration lung_duration heart_duration gi_duration neuro_duration msk_duration mh_duration skin_duration blad_duration mens_duration misc_duration)
    2 .  . . . . . . . . . .
    . .  . . . . . . . . . .
    . .  . . . . . . . . . .
    . .  5 . . . . . . . . .
    . .  . . . . . . . . . .
    1 .  . . . . . . . . . .
    . .  . . . . . . . . . .
    . .  . . . . . . . . . .
    . .  . . . . . . 1 . . .
    . .  . . . . . . . . . .
    . .  . . . . . 6 . . . .
    . .  . . . . . . . . . .
    . .  . . . . . . . . . 5
    . .  . . . . . . . . . .
    . .  . . . . . . . . . .
    . .  . . . . . . . 0 . .
    . .  . . . . . . 1 . . .
    . .  . . . . . . . . . .
    . .  . . . 1 . . . . . .
    . .  . . 0 . . . . . . .
    . .  8 . . . . . . . . .
    . . 10 . . . . . . 6 . .
    . .  . . . . . . . . . .
    . .  . . . . . . . . . .
    . .  . . 2 . . . . . . .
    . .  . . . . . 2 . . . .
    . .  . . . . . . . . . .
    . .  2 . . 4 . . . . . .
    end
    label var hr_duration "Auditory" 
    label var eye_duration "Ophthalmologic" 
    label var lung_duration "Pulmonary" 
    label var heart_duration "Cardiac" 
    label var gi_duration "Gastrointestinal" 
    label var neuro_duration "Neurological" 
    label var msk_duration "Musculoskeletal" 
    label var mh_duration "Psychiatric" 
    label var skin_duration "Dermatologic" 
    label var blad_duration "Genitourinary" 
    label var mens_duration "Menstrual" 
    label var misc_duration "General symptoms"


    (I know there are a lot of missing values, but that's expected and not my main concern at the moment; for now I just want to illustrate the general structure of the data.)
    Graphing these symptoms in a straightforward -graph box- statement yields the following:

    Code:
       graph box hr_duration eye_duration lung_duration heart_duration gi_duration neuro_duration msk_duration mh_duration skin_duration blad_duration mens_duration misc_duration, ytitle("Length of symptom duration in months")
    [Image: Graph.jpg]

    The scheme is attempting to deal with the fact that I have a lot of variables and as a result is now looking deliciously like a funfetti cake. However, I have several colleagues who are colorblind, for whom this figure won't work, and I also don't have the kind of publication budget I'd need for a color figure like this. Rather than identify these boxes by color via a legend, I'd like to write the name of each symptom underneath each box, at a 45-degree angle, akin to this:

    [Image: bw_symptom_duration.gif]


    Since the symptoms are not mutually exclusive, I'm having trouble figuring out how best to put these labels on the axis. I have no over() option to work with, because each symptom is stored in its own variable rather than as a category of a single variable. I know this must be a very simple thing to do, and if someone could take mercy on me and show me I'd be super grateful!


  • #2
    This may help, avoiding giraffe graphics (https://journals.sagepub.com/doi/pdf...867X0400400209). I just go for a different data structure and map variable labels to value labels. If you put the commands in a do-file, restore is automatic. Alternatively, use frames.

    See also https://journals.sagepub.com/doi/pdf...6867X211045582 on how to re-order categories.
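    On the frames alternative mentioned above, here is a rough sketch of what that could look like, assuming Stata 16 or later. The frame name stacked and the two toy variables are made up for illustration; the idea is only that the wide data stay untouched in the default frame while the restructured copy is graphed.

    ```stata
    * Sketch only: toy data; the frame name "stacked" is an arbitrary choice
    clear
    set obs 20
    set seed 2803
    gen a_duration = rpoisson(1)
    gen b_duration = rpoisson(2)

    * Copy the wide data into a second frame and restructure the copy
    frame copy default stacked, replace
    frame change stacked
    stack *_duration, into(duration) clear
    graph hbox duration, over(_stack)

    * Switch back: the wide data in the default frame were never changed
    frame change default
    frame drop stacked
    ```

    This plays the same role as preserve and restore, but both versions of the data can be kept around for as long as needed.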

    Code:
    * Simulated data standing in for the -dataex- example in #1
    clear
    set obs 100
    set seed 2803
    local m = 0.5
    foreach v in hr eye lung heart gi neuro msk mh skin blad mens misc {
      gen `v'_duration = rpoisson(`m')
      local m = `m' + 0.5
    }
    label var hr_duration "Auditory"
    label var eye_duration "Ophthalmologic"
    label var lung_duration "Pulmonary"
    label var heart_duration "Cardiac"
    label var gi_duration "Gastrointestinal"
    label var neuro_duration "Neurological"
    label var msk_duration "Musculoskeletal"
    label var mh_duration "Psychiatric"
    label var skin_duration "Dermatologic"
    label var blad_duration "Genitourinary"
    label var mens_duration "Menstrual"
    label var misc_duration "General symptoms"
    
    local j = 1
    foreach v of var *duration {
      local label`j' "`: var label `v''"
      local ++j
    }
      
    
    preserve
    stack *_duration, into(duration) clear
    
    label var duration "Duration of symptoms (months)"  
    
    forval j = 1/12 {
        label def _stack `j' "`label`j''", add
    }
    
    label val _stack _stack
    
    tab _stack
    
    graph hbox duration, over(_stack) ysc(alt)
    [Image: symptoms.png]
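    For comparison, something close to the layout asked for in #1 can probably be had without restructuring the data, using the showyvars and yvaroptions() options of graph box. This is a sketch under assumptions: the data are simulated, three variables stand in for the twelve, and I'd expect the axis text to be taken from the variable labels, as in the legend.

    ```stata
    * Sketch only: simulated data; three variables stand in for the twelve in #1
    clear
    set obs 100
    set seed 2803
    gen hr_duration   = rpoisson(1)
    gen eye_duration  = rpoisson(2)
    gen lung_duration = rpoisson(3)
    label var hr_duration "Auditory"
    label var eye_duration "Ophthalmologic"
    label var lung_duration "Pulmonary"

    * showyvars names each box on the categorical axis, legend(off) drops the
    * colour key, and label(angle(45)) tilts the axis text
    graph box hr_duration eye_duration lung_duration, showyvars legend(off) ///
        yvaroptions(label(angle(45))) scheme(s1mono) ///
        ytitle("Length of symptom duration in months")
    ```

    The scheme(s1mono) option is only there to get a monochrome figure; any other black-and-white scheme would do. With twelve long labels the horizontal layout above may still be easier to read.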



    • #3
      A set of histograms is a serious challenger. Many of those minimums could be popular! The minimum and the lower quartile coincide in 4/12 of the plots in #1, showing that the lowest value applies to at least 25% of the patients and possibly almost half. Box plots aren't ideal for counted variables.

      Code:
      egen mean = mean(duration), by(_stack)
      
      gen where = -5
      
      twoway histogram duration, by(_stack, col(2) note("")) freq discrete ///
      lcolor(black) fcolor(none) subtitle(, pos(9) fcolor(none) nobox nobexpand) ///
      || scatter where mean, ms(T) legend(order(2 "mean"))
      [Image: symptoms2.png]
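      A variable of group means like the one created above could also be used to order the boxes in the earlier hbox plot, through the sort() suboption of over(). A minimal sketch with simulated data and three categories only; the Poisson means are arbitrary and chosen so that the sorted order differs from the original order.

      ```stata
      * Sketch only: simulated data with three symptom groups
      clear
      set obs 100
      set seed 2803
      gen hr_duration   = rpoisson(3)
      gen eye_duration  = rpoisson(1)
      gen lung_duration = rpoisson(2)

      * Long layout: _stack identifies the source variable (1 = hr, 2 = eye, 3 = lung)
      stack *_duration, into(duration) clear
      label def _stack 1 "Auditory" 2 "Ophthalmologic" 3 "Pulmonary"
      label val _stack _stack

      * Group means, then boxes ordered by that variable instead of by category code
      egen mean = mean(duration), by(_stack)
      graph hbox duration, over(_stack, sort(mean)) ysc(alt)
      ```

      Adding descending inside over() would reverse the order; sort(1) would order by the medians instead of the means.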



      • #4
        Hello,

        I wanted to thank you for these two really wonderful options. I tried them both and they both work well (and I realized from your first reply that I should also present the symptoms according to mean duration, if I want the graph to look less chaotic).

        I definitely agree about the challenges in utility of boxplots for this group of people, and will take your advice to show histograms instead.

        Thank you as always for your help! I deeply, deeply appreciate it!

        Maria
