  • How to color and label individual points in stripplot by category

    I'm trying to plot diagnostic data from a clinical trial, color-coding the points according to the number of standard deviations from the mean and labelling values beyond 1.5*IQR. The aim is to compare the two treatment groups and visualise potential outliers.

    I've discovered the very useful stripplot and managed to add value labels using a variable 'mylabel' (all observations <1.5*IQR are defined as missing). To color the individual points by category, I created a variable cat_SD, taking values 1 to 5, that sorts the observations according to the number of standard deviations from the mean. This was done separately by treatment group. I used the separate() option to color individual points by their respective category.

    The issue I have is that each on their own (value or color) works very well, but if I add them both together, the value labels of the SD categories >1 disappear. (The code I used is below.)

    Would anyone have a suggestion on how I can add both value label and category into the plot? Also if someone knows a different way to label the values so that 'unlabeled' values are not labeled with a "." that would also be much appreciated.
    In case it matters: I'm using Stata 13 for Windows and am very new to Stata.

    Thanks for your help!

    Code:
    *Figure1: Add value labels for values > IQR*1.5
    stripplot d_HA , over(treatment) vertical cumul centre box iqr refline ytitle("concentration (nM)") mlabel(mylabel)
    
    *Figure2: Color markers according to SD category 1-5
    stripplot d_HA , over(treatment) vertical cumul separate(cat_SD) centre box iqr refline ytitle("concentration (nM)")
    
    *Figure3: Add value labels and color markers --> removes labels from all categories >1
    stripplot d_HA , over(treatment) vertical cumul separate(cat_SD) centre box iqr refline ytitle("concentration (nM)") mlabel(mylabel)
    [Figures 1-3: label_values.png, Color_markers.png, color_label.png]

    Last edited by Laura Haag; 14 Nov 2017, 11:32.

  • #2
    stripplot is from SSC.

    The lack of a data example inhibits experiment but I think you need to repeat the mlabel() argument three times. There are now three variables being plotted and your mlabel() only qualifies the first of those. This is a standard graphics issue and not intrinsic to stripplot.

    Code:
    stripplot d_HA , over(treatment) vertical cumul separate(cat_SD) centre ///
    box iqr refline ytitle("concentration (nM)") mlabel(mylabel mylabel mylabel)
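
    When the number of categories is not known in advance, the repeated list can be built rather than typed. This is a minimal sketch, assuming every category of cat_SD has at least one non-missing value of d_HA so that separate() plots one variable per category; the local macro names (levels, k, labs) are illustrative and not part of stripplot.

    ```stata
    * Count the distinct non-missing categories of cat_SD
    quietly levelsof cat_SD, local(levels)
    local k : word count `levels'

    * Repeat the label variable once per category
    local labs
    forvalues i = 1/`k' {
        local labs `labs' mylabel
    }

    * Pass the repeated list to mlabel()
    stripplot d_HA, over(treatment) vertical cumul separate(cat_SD) centre ///
        box iqr refline ytitle("concentration (nM)") mlabel(`labs')
    ```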


    • #3
      Thanks very much! That worked wonderfully. I didn't realize that I was plotting multiple variables and spent a lot of time trying to plot the different categories. Apologies for forgetting to post a data example; I will keep it in mind for next time. Thanks again.


      • #4
        That's fine.

        I will talk to the author of stripplot about making the point about separate() more explicit in the help. But the fact that the graph is showing several variables with several markers follows from the legend: you couldn't have separate marker symbols shown there otherwise.

        Note that it's evident from your graph that mylabel is numeric, as there are lots of little dots (periods, stops) for observations you don't want to flag. Using a string variable would be better, as then missings are shown by empty strings and you won't see anything. The if qualifier matters here, because string(.) returns "." rather than an empty string.

        Code:
        gen s_mylabel = string(mylabel) if !missing(mylabel)
        Also, your legend would look better as one row and/or inserted as one column inside the data region. That's partly a matter of taste.

        I tend to suppress ticks on the x axis when the variable is categorical.


        • #5
          Hello Nick, thanks for your suggestions regarding the plot, especially the tip on using a string variable to label the observations! My graph looks much nicer now.

          I have come across a different issue for which I was wondering if you have some advice as well:
          Is there any way I can assign a specific color to each SD category? E.g.
          Code:
          mcolor(black) if s_catSD ==1
          or so?

          To elaborate: The problem I have is that the marker color of a specific category changes depending on how many categories there are in total for different variables. For example say I have two variables X and Y which I want to plot in the same way, but in two different graphs. Because the variation within the two variables may be very different, I end up with a different number of categories of # of SD from the mean.
          My five SD categories are
          (1) < 2 SD from the mean,
          (2) 2-3 SD from the mean,
          (3) 3-4 SD from the mean,
          (4) 4-5 SD from the mean, and
          (5) >5 SD from the mean.
          For example variable X may have observations with SD categories 1, 2, and 5 (top figure), but variable Y may have observations across all five categories (bottom figure).
          I used the following code to generate the attached graphs. To create the second graph I generated var 's_catSD2' to which I merely added in the two remaining SD categories for two observations (rows 1 and 3) and then used 's_catSD2' to separate my groups.
          Code:
          stripplot d_HA , over(treatment) vertical cumul separate(s_catSD) centre box iqr ///
          refline ytitle("concentration (nM)") mlabel(s_mylabel s_mylabel s_mylabel s_mylabel s_mylabel) ///
          xlabel(, notick) msymbol(Oh Oh Oh Oh Oh) mcolor(blue green orange black red) ///
          mlabcolor(blue green orange black red) legend(cols(1) ring(0) bplacement(nwest) order(9 6 7 8 10))
          I specified the order of the legend as well as the order of the colors in such a way that I get a logical order of SD categories 1-5, with colors running from black to red. However, if I have fewer than five categories, the colors are not maintained for the same categories (which does not really come as a surprise given the code I put in).
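
          One possible way to keep colors tied to categories is to build the option lists from the categories actually present. This is an untested sketch, not a stripplot feature. It assumes that the numeric cat_SD (coded 1 to 5) is used in separate(), so that the plotted variables come in category order; that treatment exists in the full dataset (it is not in the data example); and that separate() only creates variables for categories with non-missing d_HA. The palette and the local macro names are hypothetical.

          ```stata
          * Hypothetical fixed palette: word i is the color for SD category i
          local palette black blue green orange red

          * Categories actually present among the plotted observations
          quietly levelsof cat_SD if !missing(d_HA), local(present)

          * Build matching option lists, one entry per category present
          local mc
          local ml
          foreach c of local present {
              local col : word `c' of `palette'
              local mc `mc' `col'
              local ml `ml' s_mylabel
          }

          stripplot d_HA, over(treatment) vertical cumul separate(cat_SD) centre box iqr ///
              refline ytitle("concentration (nM)") xlabel(, notick) ///
              mlabel(`ml') mcolor(`mc') mlabcolor(`mc')
          ```

          With categories 1, 2, and 5 present, the lists would come out as black blue red, so category 5 stays red regardless of how many other categories occur.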

          The data for the graphs is attached at bottom (note: the numeric cat_SD is purely to make reading of the SD categories easier).

          Thanks!

          [Figures: colortestorder.png (top), colortestorder2.png (bottom)]


          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(d_HA cat_SD) str20 s_catSD float cat_SD2 str16 s_catSD2 str3 s_mylabel
            -862.5 1 "<2 SD from mean"  3 "3-4 SD from mean" "1"  
                 . 5 ">5 SD from mean"  5 ">5 SD from mean"  "4"  
           -656.77 1 "<2 SD from mean"  4 "4-5 SD from mean" "6"  
            1427.9 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
             393.4 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
              -551 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
          -2929.73 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
              66.9 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           1959.56 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
             -5216 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            3752.1 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           -586.91 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           -1052.6 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            5034.3 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            -130.8 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            359.06 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           -448.86 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
          -1327.51 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
                 . 5 ">5 SD from mean"  5 ">5 SD from mean"  "39" 
             13352 1 "<2 SD from mean"  1 "<2 SD from mean"  "42" 
            -35.85 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
          -1109.46 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
              9428 1 "<2 SD from mean"  1 "<2 SD from mean"  "49" 
               780 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            -829.6 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
             -11.3 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
                 . 5 ">5 SD from mean"  5 ">5 SD from mean"  "55" 
            2321.8 1 "<2 SD from mean"  1 "<2 SD from mean"  "56" 
            1227.3 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
               6.7 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            -678.7 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
             508.3 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
                 . 5 ">5 SD from mean"  5 ">5 SD from mean"  "67" 
             194.4 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           -4742.7 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           -1581.9 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
                 . 5 ">5 SD from mean"  5 ">5 SD from mean"  "72" 
            -473.4 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            3476.1 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
                 . 5 ">5 SD from mean"  5 ">5 SD from mean"  "75" 
              -537 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           -269.12 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            2318.1 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            -427.9 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            3060.9 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           -292.61 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           4709.24 1 "<2 SD from mean"  1 "<2 SD from mean"  "87" 
             718.4 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
               346 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            1444.6 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            1348.3 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           -505.58 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            2351.7 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            -546.3 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
             348.9 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           2290.04 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
             567.8 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            522.75 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           -2674.5 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
             966.2 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
          -1367.38 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            1111.6 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            7952.6 1 "<2 SD from mean"  1 "<2 SD from mean"  "119"
            -814.1 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           19817.5 2 "2-3 SD from mean" 2 "2-3 SD from mean" "125"
           -283.56 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
                 . 5 ">5 SD from mean"  5 ">5 SD from mean"  "127"
           -503.16 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           2101.83 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
               379 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           12624.7 1 "<2 SD from mean"  1 "<2 SD from mean"  "134"
           16827.5 2 "2-3 SD from mean" 2 "2-3 SD from mean" "136"
             585.4 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            989.83 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
           -165.64 1 "<2 SD from mean"  1 "<2 SD from mean"  ""   
            8662.8 1 "<2 SD from mean"  1 "<2 SD from mean"  "148"
           15000.2 5 ">5 SD from mean"  5 ">5 SD from mean"  "154"
          end