  • Adding points and labels for category to box plot

    I am new to graphics in STATA and have a question that may be obvious to anyone who has worked with them before. I am trying to create box plots over time (date) of each state's value (mean), separately by category.

    So far I can easily create the box plot like so:
    Code:
     graph box mean if category=="Ban", over(date, sort(seq))
    However, I would like to add some elements to the box plot:
    1. Adding the individual points for each state that falls within that category
    2. Adding state labels (a legend), with a different color for each state, rather than labelling only the outliers
    I believe stripplot might accomplish both of these. Any guidance would be appreciated.

    Example data can be found here:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str8 state str20 statename str6 date byte seq float mean str23 category
    "AK" "Alaska"     "Jan-21"  1   .4874966 "Legal"      
    "AK" "Alaska"     "Feb-21"  2   .4152749 "Legal"      
    "AK" "Alaska"     "Mar-21"  3  .55369985 "Legal"      
    "AK" "Alaska"     "Apr-21"  4   .5597183 "Legal"      
    "AK" "Alaska"     "May-21"  5  .58379227 "Legal"      
    "AK" "Alaska"     "Jun-21"  6   .6319401 "Legal"      
    "AK" "Alaska"     "Jul-21"  7   .5356444 "Legal"      
    "AK" "Alaska"     "Aug-21"  8   .5898107 "Legal"      
    "AK" "Alaska"     "Sep-21"  9  .55369985 "Legal"      
    "AK" "Alaska"     "Oct-21" 10   .6680509 "Legal"      
    "AK" "Alaska"     "Nov-21" 11   .6379585 "Legal"      
    "AK" "Alaska"     "Dec-21" 12     .90879 "Legal"      
    "AK" "Alaska"     "Jan-22" 13  1.1134182 "Legal"      
    "AK" "Alaska"     "Feb-22" 14  1.1073997 "Legal"      
    "AK" "Alaska"     "Mar-22" 15   1.239806 "Legal"      
    "AK" "Alaska"     "Apr-22" 16  1.1374921 "Legal"      
    "AK" "Alaska"     "May-22" 17  1.4203606 "Legal"      
    "AK" "Alaska"     "Jun-22" 18   1.402305 "Legal"      
    "AK" "Alaska"     "Jul-22" 19   1.943968 "Legal"      
    "AK" "Alaska"     "Aug-22" 20  2.0101712 "Legal"      
    "AK" "Alaska"     "Sep-22" 21  1.8115615 "Legal"      
    "AK" "Alaska"     "Oct-22" 22   1.576841 "Legal"      
    "AL" "Alabama"    "Jan-21"  1  .06930768 "Ban"        
    "AL" "Alabama"    "Feb-21"  2  .05850648 "Ban"        
    "AL" "Alabama"    "Mar-21"  3  .04770529 "Ban"        
    "AL" "Alabama"    "Apr-21"  4  .04320479 "Ban"        
    "AL" "Alabama"    "May-21"  5  .06570728 "Ban"        
    "AL" "Alabama"    "Jun-21"  6  .07740857 "Ban"        
    "AL" "Alabama"    "Jul-21"  7  .06210688 "Ban"        
    "AL" "Alabama"    "Aug-21"  8  .07380818 "Ban"        
    "AL" "Alabama"    "Sep-21"  9  .07470828 "Ban"        
    "AL" "Alabama"    "Oct-21" 10  .08280917 "Ban"        
    "AL" "Alabama"    "Nov-21" 11  .09541057 "Ban"        
    "AL" "Alabama"    "Dec-21" 12  .07380818 "Ban"        
    "AL" "Alabama"    "Jan-22" 13  .10621177 "Ban"        
    "AL" "Alabama"    "Feb-22" 14  .10441157 "Ban"        
    "AL" "Alabama"    "Mar-22" 15  .09631067 "Ban"        
    "AL" "Alabama"    "Apr-22" 16  .09631067 "Ban"        
    "AL" "Alabama"    "May-22" 17  .12511386 "Ban"        
    "AL" "Alabama"    "Jun-22" 18  .11611287 "Ban"        
    "AL" "Alabama"    "Jul-22" 19  .14041556 "Ban"        
    "AL" "Alabama"    "Aug-22" 20  .13231467 "Ban"        
    "AL" "Alabama"    "Sep-22" 21  .09000997 "Ban"        
    "AL" "Alabama"    "Oct-22" 22  .05580618 "Ban"        
    "AR" "Arkansas"   "Jan-21"  1 .008918127 "Ban"        
    "AR" "Arkansas"   "Feb-21"  2  .01932261 "Ban"        
    "AR" "Arkansas"   "Mar-21"  3  .03567251 "Ban"        
    "AR" "Arkansas"   "Apr-21"  4  .04013157 "Ban"        
    "AR" "Arkansas"   "May-21"  5  .03715886 "Ban"        
    "AR" "Arkansas"   "Jun-21"  6  .03715886 "Ban"        
    "AR" "Arkansas"   "Jul-21"  7  .05499512 "Ban"        
    "AR" "Arkansas"   "Aug-21"  8  .05648147 "Ban"        
    "AR" "Arkansas"   "Sep-21"  9   .1010721 "Ban"        
    "AR" "Arkansas"   "Oct-21" 10  .06094053 "Ban"        
    "AR" "Arkansas"   "Nov-21" 11  .24078943 "Ban"        
    "AR" "Arkansas"   "Dec-21" 12   .3760477 "Ban"        
    "AR" "Arkansas"   "Jan-22" 13  .53954667 "Ban"        
    "AR" "Arkansas"   "Feb-22" 14   .4117202 "Ban"        
    "AR" "Arkansas"   "Mar-22" 15   .2972709 "Ban"        
    "AR" "Arkansas"   "Apr-22" 16   .2155214 "Ban"        
    "AR" "Arkansas"   "May-22" 17    .306189 "Ban"        
    "AR" "Arkansas"   "Jun-22" 18  .28686643 "Ban"        
    "AR" "Arkansas"   "Jul-22" 19    .628728 "Ban"        
    "AR" "Arkansas"   "Aug-22" 20  .50090146 "Ban"        
    "AR" "Arkansas"   "Sep-22" 21   .6049463 "Ban"        
    "AR" "Arkansas"   "Oct-22" 22   .9170808 "Ban"        
    "AZ" "Arizona"    "Jan-21"  1   .1979357 "Partial Ban"
    "AZ" "Arizona"    "Feb-21"  2  .19981484 "Partial Ban"
    "AZ" "Arizona"    "Mar-21"  3  .19041917 "Partial Ban"
    "AZ" "Arizona"    "Apr-21"  4   .1271549 "Partial Ban"
    "AZ" "Arizona"    "May-21"  5  .14469351 "Partial Ban"
    "AZ" "Arizona"    "Jun-21"  6  .15471557 "Partial Ban"
    "AZ" "Arizona"    "Jul-21"  7  .13968247 "Partial Ban"
    "AZ" "Arizona"    "Aug-21"  8  .11650646 "Partial Ban"
    "AZ" "Arizona"    "Sep-21"  9  .13279231 "Partial Ban"
    "AZ" "Arizona"    "Oct-21" 10  .12527576 "Partial Ban"
    "AZ" "Arizona"    "Nov-21" 11  .16411126 "Partial Ban"
    "AZ" "Arizona"    "Dec-21" 12  .12966041 "Partial Ban"
    "AZ" "Arizona"    "Jan-22" 13  .17663883 "Partial Ban"
    "AZ" "Arizona"    "Feb-22" 14  .15471557 "Partial Ban"
    "AZ" "Arizona"    "Mar-22" 15   .1540892 "Partial Ban"
    "AZ" "Arizona"    "Apr-22" 16  .14030886 "Partial Ban"
    "AZ" "Arizona"    "May-22" 17   .1622321 "Partial Ban"
    "AZ" "Arizona"    "Jun-22" 18  .16348487 "Partial Ban"
    "AZ" "Arizona"    "Jul-22" 19  .22988103 "Partial Ban"
    "AZ" "Arizona"    "Aug-22" 20   .1929247 "Partial Ban"
    "AZ" "Arizona"    "Sep-22" 21  .12652852 "Partial Ban"
    "AZ" "Arizona"    "Oct-22" 22  .11149543 "Partial Ban"
    "CA" "California" "Jan-21"  1   .4868357 "Legal"      
    "CA" "California" "Feb-21"  2    .492948 "Legal"      
    "CA" "California" "Mar-21"  3  .55685854 "Legal"      
    "CA" "California" "Apr-21"  4  .51525235 "Legal"      
    "CA" "California" "May-21"  5   .4692496 "Legal"      
    "CA" "California" "Jun-21"  6   .4870502 "Legal"      
    "CA" "California" "Jul-21"  7  .50656646 "Legal"      
    "CA" "California" "Aug-21"  8   .4779354 "Legal"      
    "CA" "California" "Sep-21"  9   .4794367 "Legal"      
    "CA" "California" "Oct-21" 10   .4756835 "Legal"      
    "CA" "California" "Nov-21" 11   .4687134 "Legal"      
    "CA" "California" "Dec-21" 12   .4386883 "Legal"      
    end

  • #2
    stripplot from SSC could do something with these data, but you're already asking for 22 box plots which is going to look crowded. Wanting separate colours for separate states (and presumably an enormous legend too) is just going to produce an unintelligible mess. I would back up and reconsider what you most want to do with a graph or graphs.
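    For what it is worth, a minimal stripplot sketch of what #1 asks for might look like this. It assumes the -dataex- example below is in memory and that stripplot and labutil (which provides labmask) have been installed from SSC; the option choices are illustrative only, and the over-plotting problem just described will remain with more states.
    Code:
    ```stata
    * Sketch only: assumes the -dataex- example is in memory and that
    * stripplot and labutil are installed (ssc install stripplot, ssc install labutil)

    * seq runs from 1 (Jan-21) to 22 (Oct-22); attach the month strings as value labels
    labmask seq, values(date)

    * one box per month for the "Ban" states, plus a point for every state-month;
    * separate(state) gives each state its own marker style
    stripplot mean if category == "Ban", over(seq) vertical box separate(state)
    ```
    Using seq rather than the string date keeps the months in time order rather than alphabetical order.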


    • #3
      Hi Dr. Cox,

      I intend to create 4 separate plots, one for each category. The values shown in the boxes and dots would be the values of mean by statename if category=="Ban", and so on. I reproduced an example using Tableau, but my supervisor, alas, prefers me to use STATA for this. My first impression, based on my reading so far, is to use stripplot to combine the box plots and dot plots, then specify options for a different color for each state, labels for outliers, and marker (circle) size.
      [Attached image: Capture.JPG, the Tableau example]


      • #4
        Thanks for the detail. It seems that you are focusing on just 13 states, which simplifies the problem as compared with my imagining that it would be all 50 (plus DC? plus Puerto Rico? plus Guam? and so on).

        It is still going to be a struggle to do all that you want. It seems that you started out with the idea of a box plot and quickly realised that it didn't show anywhere near enough detail, so you decided to enhance it. I could suggest stripplot code, but it would still not work well. I have to say that your use of colours doesn't work well: there is far too much over-plotting, and in any case the mix of colours violates most advice by including both reds and greens.

        Logarithmic scale is surely a good and needed idea here. (If zeros are present, modifications are possible.)

        I would back up here and plot 13 time series as lines in a so-called front-and-back plot.
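        Applied to the -dataex- example, a front-and-back plot might be sketched as below. This assumes fabplot has been installed from SSC. It also converts the string dates to Stata monthly dates; reading "Jan-21" and so on with monthly() and mask "MY", with 2050 as the top year for two-digit years, is an assumption about the format of those strings.
        Code:
        ```stata
        * Sketch only: assumes the -dataex- example is in memory and that
        * fabplot is installed (ssc install fabplot)

        * turn "Jan-21", "Feb-21", ... into Stata monthly dates;
        * 2050 tells monthly() how to interpret two-digit years
        capture drop mdate
        generate mdate = monthly(date, "MY", 2050)
        format mdate %tm

        * a log scale needs positive values: check for zeros or negatives first
        count if mean <= 0

        * one panel per state: that state's series in front, all the others behind
        fabplot line mean mdate, by(statename) frontopts(lw(thick)) ysc(log)
        ```
        Adding if category == "Ban" (and so on) to the fabplot call would give one graph per category, as planned in #3.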

        https://journals.sagepub.com/doi/ful...6867X211025838 explains, although https://www.statalist.org/forums/for...ailable-on-ssc gives you the flavour (some skimming and skipping is advised).

        Then, if wished, you can add plots of the median and quartiles, which are all that the box plots add if you plot all the data too. The code here shows how it can be done with undocumented options of fabplot, but I am not convinced that the complication is worthwhile.

        The Grunfeld data used here are comparable to your set-up in that a logarithmic scale is surely advisable, but a bit simpler in having just 20 time points and 10 panels. The Grunfeld data come ordered by company size, but in your case alphabetical order isn't obviously optimal. Howard Wainer parodied defaulting to alphabetical order as "Alabama first!" See https://journals.sagepub.com/doi/pdf...6867X211045582 for more on that.
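        One possible alternative to alphabetical order is to order panels by each state's median. The sketch below assumes the -dataex- example is in memory and that labutil (for labmask) and fabplot have been installed from SSC; ordering by the median is just one illustrative choice, not the only sensible one.
        Code:
        ```stata
        * Sketch only: assumes the -dataex- example is in memory and that
        * labutil and fabplot are installed from SSC
        capture drop mdate
        generate mdate = monthly(date, "MY", 2050)
        format mdate %tm

        * each state's median over the whole period
        capture drop med order
        egen med = median(mean), by(state)

        * numeric panel identifier 1, 2, ... in order of median (ties broken by state code)
        egen order = group(med state)

        * label the identifier with state names so that panel titles stay readable
        capture label drop order
        labmask order, values(statename)

        fabplot line mean mdate, by(order) frontopts(lw(thick)) ysc(log)
        ```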

        What happened to Kentucky in April 2022? If it is just that the data stop then, that is going to be far clearer on a line plot.
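        A quick tabular check on where each state's series starts and stops can complement the line plot. This sketch assumes the -dataex- example is in memory. Kentucky is not in that extract; note too that California stops at Dec-21 there, probably only because -dataex- lists 100 observations by default.
        Code:
        ```stata
        * Sketch only: assumes the -dataex- example is in memory
        capture drop mdate
        generate mdate = monthly(date, "MY", 2050)
        format mdate %tm

        * first and last month observed for each state
        bysort state (mdate): list statename mdate if _n == 1 | _n == _N, noobs
        ```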

        Code:
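        webuse grunfeld, clear
        
        * quartiles of investment across companies, year by year
        foreach q in 25 50 75 {
            egen p`q' = pctile(invest), p(`q') by(year)
        }
        
        * each company in front of the others, with the quartile lines drawn in red beneath
        fabplot line invest year, by(company) frontopts(lw(vthick))  ysc(log) needvars(p*) addplot(line p* year, lc(red ..) below sort) yla(1000 300 100 30 10 3 1, ang(h))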
        [Graph: yafabplot.png]
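        The same idea carried over to the -dataex- example, one category at a time, might look like this. It is a sketch that mirrors the Grunfeld code above, assuming fabplot is installed from SSC; needvars() is undocumented, so its behaviour could change.
        Code:
        ```stata
        * Sketch only: assumes the -dataex- example is in memory and fabplot is installed
        capture drop mdate
        generate mdate = monthly(date, "MY", 2050)
        format mdate %tm

        * quartiles across the "Ban" states, month by month (missing for other categories)
        capture drop p25 p50 p75
        foreach q in 25 50 75 {
            egen p`q' = pctile(mean) if category == "Ban", p(`q') by(mdate)
        }

        fabplot line mean mdate if category == "Ban", by(statename) frontopts(lw(vthick)) ysc(log) needvars(p*) addplot(line p* mdate, lc(red ..) below sort)
        ```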


        See also https://www.statalist.org/forums/help#spelling.
        Last edited by Nick Cox; 11 Dec 2022, 02:56.
