  • legend for side by side boxplot using over() command of one categorical variable?

    Attached file: statalist_boxplotexample.pdf

    Hi,

    I was wondering whether anyone with more experience with Stata could help me with this request. I am currently using Stata 17. I searched past forum threads but could not find an exact solution.
    I have panel data with 4 variables that hold log-transformed cytokine concentrations and one categorical variable, bmi (0=Normal weight, 1=Overweight/Obese), over two visits (visit=2,3). I would like to create 4 graphs, one per cytokine, each showing side-by-side boxplots of the cytokine concentration for normal weight vs overweight/obese at each visit. My ultimate goal is to use graph combine to create one composite graph of essentially 8 box plots. I was able to generate the composite graph, but the x axis is very cluttered with the labels "Normal Weight" and "Overweight/Obese".
    I have two questions:
    1) With an over() option on one categorical variable, is it possible to generate a legend?
    2) Would it be possible to have the legend and no labels on the x axis?

    Here is my code:
    Code:
    graph box lne_tnfa_ if visit==2, over(bmi) ///
            ytitle("Cytokine Concentration (pg/mL)")    ///
            ylabel(,nogrid) ///
            title(TNF-a,ring(0) pos(12))    ///
            graphregion(color(white))    ///
            name(g1, replace)
    graph box lne_ifny_ if visit==2, over(bmi) ///
            ytitle("")     ///
            ylabel(,nogrid nolabels notick)    ///
            yscale(lstyle(none))    ///
            title(IFN-y,ring(0) pos(12))    ///
            graphregion(color(white))    ///
            name(g2, replace)
    graph box lne_il6_ if visit==2, over(bmi)    ///
            ytitle("")     ///
            ylabel(,nogrid nolabels notick)    ///
            yscale(lstyle(none))    ///
            title(IL-6,ring(0) pos(12))     ///
            graphregion(color(white))    ///
            name(g3, replace)
    graph box lne_il10_ if visit==2, over(bmi)    ///
            ytitle("")     ///
            ylabel(,nogrid nolabels notick)    ///
            yscale(lstyle(none))    ///
            title(IL-10,ring(0) pos(12))     ///
            graphregion(color(white))    ///
            name(g4, replace)
    graph combine g1 g2 g3 g4, title(Cytokines by BMI) ///
            ycommon row(1) graphregion(color(white))
    graph save cytokines_bmi, replace
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int id double visit byte bmi float(lne_tnfa_ lne_ifny_ lne_il6_ lne_il10_)
      11 2 0   .03246719   .3213586   1.132047  -1.439695
      11 3 0  .034401428   2.057707  1.5924952  -2.063568
      22 2 1   2.2824845  2.1597536   2.802451  1.3080626
      25 2 1    .7207624    2.11758  2.1355855 -1.5005835
      25 3 1   1.2116433   .9384435   1.437225   -.449417
      65 2 0   1.4362743  2.0146363  1.7660997   .3371863
      65 3 0    .6418539 -.05551271   .4101209  -.7700282
     170 2 1    .6418539   .9965796  1.0246068   -.761426
     170 3 1    2.579459   6.603427   5.740973   1.784399
     308 2 1    3.377451   4.025423   3.326725    1.79409
     308 3 1    3.451542   3.681729  3.1554246  2.2916253
     503 2 1    -.443167 -1.4188175  1.0770481  -2.937463
     503 3 1   1.1216775   .1781462   2.886364  -1.324259
    end
    label values bmi bmi
    label def bmi 0 "Normal Weight", modify
    label def bmi 1 "Overweight/Obese", modify

    This is the boxplot graph I was able to generate (see the attached file). As you can see, the x axis is very cluttered. It would be great if I could remove the text and just have a legend in the final combined graph.


    Thank you for any advice. I am still new to posting, so apologies in advance if this was not clear.
  • #2
    Thanks for the data example. The following uses grc1leg2 from http://digital.cgdev.org/doc/stata/MO/Misc by Mead Over. For your future posts, note the recommendation to upload pictures in .PNG format as I do below (refer to FAQ Advice #12 for details).

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int id double visit byte bmi float(lne_tnfa_ lne_ifny_ lne_il6_ lne_il10_)
      11 2 0   .03246719   .3213586   1.132047  -1.439695
      11 3 0  .034401428   2.057707  1.5924952  -2.063568
      22 2 1   2.2824845  2.1597536   2.802451  1.3080626
      25 2 1    .7207624    2.11758  2.1355855 -1.5005835
      25 3 1   1.2116433   .9384435   1.437225   -.449417
      65 2 0   1.4362743  2.0146363  1.7660997   .3371863
      65 3 0    .6418539 -.05551271   .4101209  -.7700282
     170 2 1    .6418539   .9965796  1.0246068   -.761426
     170 3 1    2.579459   6.603427   5.740973   1.784399
     308 2 1    3.377451   4.025423   3.326725    1.79409
     308 3 1    3.451542   3.681729  3.1554246  2.2916253
     503 2 1    -.443167 -1.4188175  1.0770481  -2.937463
     503 3 1   1.1216775   .1781462   2.886364  -1.324259
    end
    label values bmi bmi
    label def bmi 0 "Normal Weight", modify
    label def bmi 1 "Overweight/Obese", modify
    
    foreach var of varlist lne*{
        separate `var', by(bmi) veryshortlabel
    }
    
    net install grc1leg2, from("http://digital.cgdev.org/doc/stata/MO/Misc")
    
    graph box lne_tnfa_? if visit==2, over(bmi, label(nolab)) ///
            ytitle("Cytokine Concentration (pg/mL)")    ///
            ylabel(,nogrid) ///
            title(TNF-a,ring(0) pos(12))    ///
            graphregion(color(white))    ///
            name(g1, replace) 
    graph box lne_ifny_? if visit==2, over(bmi, label(nolab)) ///
            ytitle("")     ///
            ylabel(,nogrid nolabels notick)    ///
            yscale(lstyle(none))    ///
            title(IFN-y,ring(0) pos(12))    ///
            graphregion(color(white))    ///
            name(g2, replace)
    graph box lne_il6_? if visit==2, over(bmi, label(nolab))    ///
            ytitle("")     ///
            ylabel(,nogrid nolabels notick)    ///
            yscale(lstyle(none))    ///
            title(IL-6,ring(0) pos(12))     ///
            graphregion(color(white))    ///
            name(g3, replace)
    graph box lne_il10_? if visit==2, over(bmi, label(nolab))    ///
            ytitle("")     ///
            ylabel(,nogrid nolabels notick)    ///
            yscale(lstyle(none))    ///
            title(IL-10,ring(0) pos(12))     ///
            graphregion(color(white))    ///
            name(g4, replace)
        grc1leg2 g1 g2 g3 g4, title(Cytokines by BMI) ///
        ycommon row(1) graphregion(color(white))

    [Image: Graph.png]
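
    A possible alternative that needs no extra package, offered only as a sketch: the official asyvars option of graph box treats the over() categories as if they were separate y variables, so each bmi group gets its own color and legend entry, and the category labels drop off the x axis. It assumes the data example above is in memory; the graph name g1_asy is made up for illustration. A single shared legend across the combined panels would still need grc1leg2 or something similar.

    ```stata
    * Sketch: asyvars turns the over(bmi) groups into legend entries
    * instead of x-axis labels. Uses the original (unseparated) variable.
    graph box lne_tnfa_ if visit==2, over(bmi) asyvars ///
        ytitle("Cytokine Concentration (pg/mL)") ///
        ylabel(, nogrid) ///
        title("TNF-a", ring(0) pos(12)) ///
        graphregion(color(white)) ///
        name(g1_asy, replace)
    ```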

    • #3
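      Here are some different ideas. I used stripplot from SSC (ssc install stripplot) in Stata 18. What base of logarithms did you use? With power notation for the axis labels you would get the best of both worlds: points plotted on the log scale, labels readable in the original units.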

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int id double visit byte bmi float(lne_tnfa_ lne_ifny_ lne_il6_ lne_il10_)
        11 2 0   .03246719   .3213586   1.132047  -1.439695
        11 3 0  .034401428   2.057707  1.5924952  -2.063568
        22 2 1   2.2824845  2.1597536   2.802451  1.3080626
        25 2 1    .7207624    2.11758  2.1355855 -1.5005835
        25 3 1   1.2116433   .9384435   1.437225   -.449417
        65 2 0   1.4362743  2.0146363  1.7660997   .3371863
        65 3 0    .6418539 -.05551271   .4101209  -.7700282
       170 2 1    .6418539   .9965796  1.0246068   -.761426
       170 3 1    2.579459   6.603427   5.740973   1.784399
       308 2 1    3.377451   4.025423   3.326725    1.79409
       308 3 1    3.451542   3.681729  3.1554246  2.2916253
       503 2 1    -.443167 -1.4188175  1.0770481  -2.937463
       503 3 1   1.1216775   .1781462   2.886364  -1.324259
      end
      label values bmi bmi
      label def bmi 0 "Normal Weight", modify
      label def bmi 1 "Overweight/Obese", modify
      
      rename (lne_tnfa_ lne_ifny_ lne_il6_ lne_il10_) (lne_1 lne_2 lne_3 lne_4)
      reshape long lne_ , i(id visit) j(which) 
      label def which 1 "TNF-a" 2 "IFN-y" 3 "IL-6" 4 "IL-10" 
      label val which which 
      
      stripplot lne_, stack over(bmi) by(which, note("") col(1)) subtitle(, pos(9) fcolor(none) nobexpand) /// 
      box(barw(0.2)) boffset(-0.3) pctile(0) ytitle("") separate(visit) ///
      xtitle(log Cytokine Concentration (pg/mL))

      [Image: cytokine.png]
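
      As a rough illustration of the power-notation idea, the sketch below puts axis ticks at the logarithms of powers of 10 and labels them as powers. It assumes the values are natural logarithms, which the thread does not confirm; with base 10 the tick positions would simply be -1, 0, 1, 2, 3. It is meant to run after the rename and reshape above, and the choice of ticks from 10^-1 to 10^3 is only a guess suited to the data example.

      ```stata
      * Assumes lne_ holds natural logs (an assumption, not stated in the thread).
      * Ticks are placed at ln(10^k) and labelled with SMCL superscripts.
      stripplot lne_, stack over(bmi) by(which, note("") col(1)) ///
          xlabel(`=ln(0.1)' "10{sup:-1}" 0 "10{sup:0}" `=ln(10)' "10{sup:1}" ///
                 `=ln(100)' "10{sup:2}" `=ln(1000)' "10{sup:3}") ///
          xtitle("Cytokine Concentration (pg/mL)")
      ```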


      • #4
        A crucial point here is that with very small samples box plots can seem enigmatic. For example, if a subsample has size up to 3, then whiskers do not appear, as the estimated quartiles and the extremes coincide. That's not wrong, but it is often puzzling to readers.

        This isn't evident in the original graph -- pdf attachment to #1, although .png would have been more immediate -- which is presumably based on a larger, full dataset. Even so, my sense here is that this is a small enough dataset that a fuller dot or strip plot display would be more informative.
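
        One quick way to see how many observations sit behind each box is a cross-tabulation of group by visit, sketched here for one cytokine. It assumes the wide data example from #1 is in memory (before any reshape). In that example the Normal Weight group has only 2 observations at each visit.

        ```stata
        * Sketch: count non-missing observations per bmi group and visit
        * for one cytokine; repeat for the other lne_* variables as needed.
        tabulate bmi visit if !missing(lne_tnfa_)
        ```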
