  • Overlaying boxplot and markers for individual data points

    Hi,
    I am trying to generate a plot similar to the one in the link below. Essentially, it is the same data plotted as a box plot and a dot plot, combined into one plot. Is there a simple way to generate this without using the twoway plot option?
    https://blogs.sas.com/content/graphi...our-box-plots/

    Thank you!
    Bose.

  • #2
    I think Nick Cox's -stripplot- command could do this. I am not quite sure whether I have used the command properly in the code below, but it may be a useful starting point.

    Code:
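    sysuse auto, clear
    * randomly move about a quarter of the cars into an artificial third group
    gen select = int(runiform()*100) <= 25
    replace foreign = 2 if select == 1
    label define origin 2 "Europe", modify
    
    * group means, added to the plot as extra markers
    egen meanW = mean(weight), by(foreign)
    stripplot weight, over(foreign) vertical jitter(10) mcolor(eltgreen) box iqr addplot(scatter meanW foreign)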
    [Attached image: Graph.png]


    • #3
      Or else use the cars.csv from SAS. However, I don't know why the plot from Stata is different from SAS:

      Code:
      import delimited "https://support.sas.com/documentation/onlinedoc/viya/exampledatasets/cars.csv", clear
      encode origin, gen(origin2)
      egen meanW = mean(weight), by(origin2)
      stripplot weight, over(origin2) vertical jitter(10) mcolor(eltgreen) box iqr addplot(scatter meanW origin2) box(blwidth(vthin))
      [Attached image: Graph.png]


      • #4
        Chen Samulsion is bang on in terms of getting close to the graph specified. stripplot (from SSC) could be used to get such plots, although the jittering capacity is just inherited from official command scatter and neither implemented nor encouraged by me as author of stripplot.

        There are at least two problems with this kind of plot in my view. The point of jittering is just to shake apart data points that would otherwise be over-plotted or at least too easily confused. I can't be the only one who learned or perhaps even re-invented a technique of just separating identical or very close data points into small clusters when hand plotting with pen and paper in secondary (high) school. Applied to scatter plots in contemporary software, that can be a useful small trick.

        When implemented in software with some random-number function doing the work internally, there is no control in Stata over randomness beyond specifying the degree of jittering. I can't speak for software other than Stata but I wouldn't be surprised at similar lack of control elsewhere. A good defence would be that if jittering is wanted and helpful, almost any example is about as good as any other, as an implication of randomness. But I call this

        Problem 1. Lack of reproducibility. You can't easily reproduce the same plot, either within particular software or across different software. I guess this is what Chen is driving at. I don't mind anyone saying they don't care, but at a minimum this can puzzle naive or beginning users.
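
        A minimal sketch of one workaround within Stata, assuming you are willing to do the jittering yourself rather than rely on jitter(): generate the horizontal offsets from runiform() after set seed, so that the same seed gives the same plot. The seed (2025), the offset width (0.2) and the variable name xj are arbitrary illustrative choices, not anything used in this thread. Official scatter also documents a jitterseed() option, which may be worth checking in your version; whether stripplot passes it through is not verified here.

        ```stata
        * Sketch only: reproducible hand-made jitter on the auto data
        sysuse auto, clear
        set seed 2025
        * shift each point sideways by a uniform amount within +/- 0.1 of its group
        gen double xj = foreign + 0.2 * (runiform() - 0.5)
        scatter weight xj, ms(Oh) xla(0 "Domestic" 1 "Foreign") xsc(r(-0.5 1.5)) xtitle("")
        ```

        Rerunning these lines with the same seed should reproduce the same offsets; that does nothing for reproducibility across different software.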

        A much bigger problem in my view is why anyone would think this was a good method for showing univariate distributions. Despite better solutions being available for a century and more, these plots -- sometimes called jitter plots or jitterplots -- seem more popular recently across various fields, which seems to need memetics rather than statistics for an explanation. What is this all about? My program can do this cute thing, add random noise? You tell me.

        Problem 2. By solving one problem -- possible over-plotting -- jittering of univariate distributions creates another problem, the need for the reader of the graph to appreciate and undo the jittering by locally counting or averaging in your head. Worse, jittering obscures local stripes or gaps in the data that may be important. It can be fine to regard such local structure as trivial or uninteresting but jittering doesn't let you see them in the first place. I agree with anyone who says that if they want a smooth density estimate they know where to look. But if the interest is in an exploratory or descriptive plot that both shows broad structure and fine detail you may need to think about, jitter plots (with or without added boxes or extra means) seem inefficient and also unattractive to me.

        NB: The plots in #1 #2 #3 all put means as extra data points in the middle of each display. That isn't hard to explain, but it does need to be explained.

        See https://www.statalist.org/forums/for...ercentile-sets as relevant to what follows. pctilesets from SSC isn't essential to that. You can get what it produces with egen if you prefer.
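
        As a hedged illustration of the egen route, here is a sketch using the auto data rather than cars.csv. The names p25, p50 and p75 mirror those used in the code further down; wmin, wmax and tag are hypothetical names chosen for this example.

        ```stata
        * Sketch only: quartiles and extremes by group with official egen
        sysuse auto, clear
        egen p25  = pctile(weight), p(25) by(foreign)
        egen p50  = pctile(weight), p(50) by(foreign)
        egen p75  = pctile(weight), p(75) by(foreign)
        egen wmin = min(weight), by(foreign)
        egen wmax = max(weight), by(foreign)
        * show one row per group
        egen tag = tag(foreign)
        list foreign p25 p50 p75 wmin wmax if tag, noobs
        ```

        Unlike a saved summary dataset, these variables sit alongside the original data, so no merge is needed before plotting.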

        Here is an attempt at something that conveys the same information more readably. It's not a one-line solution: the deeper point is that you have the tools to customise your own plots.

        The goal may seem elusive but I think it can often be achieved: a univariate distribution graph that shows both broad features and small structure that can be easily and effectively decoded. In this case, ranking does the job that jittering is aimed to do, and does it better.
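
        To make the ranking idea concrete, here is a minimal sketch that builds a quantile plot by hand on the auto data. The plotting position (i - 0.5)/n is one common convention and is an assumption here; qplot's own default may differ. The variable name frac is hypothetical.

        ```stata
        * Sketch only: rank within group, then plot values against fraction of data
        sysuse auto, clear
        bysort foreign (weight) : gen double frac = (_n - 0.5) / _N
        scatter weight frac, by(foreign, note("") row(1)) ms(Oh) xtitle(Fraction of data) ytitle("Weight (lb)")
        ```

        Tied values appear as short horizontal runs and gaps as vertical jumps, so no random noise is needed to separate the points.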

        Detail: If you're showing all the data points, you really don't need arbitrary rules for whisker length, which in my experience are rarely explained well or even correctly -- textbook after textbook makes a complete mess of it -- and which are rarely understood by readers or even by authors of papers.
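
        For readers unsure what such a rule involves, here is a sketch of one widely used convention, the 1.5 IQR rule usually attributed to Tukey: whiskers end at the most extreme data points that lie within 1.5 IQR of the nearer quartile. That this is the rule behind any particular plot in this thread is an assumption; all variable names are hypothetical.

        ```stata
        * Sketch only: fences and whisker ends under the 1.5 IQR convention
        sysuse auto, clear
        egen q1 = pctile(weight), p(25) by(foreign)
        egen q3 = pctile(weight), p(75) by(foreign)
        gen double upfence = q3 + 1.5 * (q3 - q1)
        gen double lofence = q1 - 1.5 * (q3 - q1)
        * whisker ends: most extreme data points still inside the fences
        egen upadj = max(cond(weight <= upfence, weight, .)), by(foreign)
        egen loadj = min(cond(weight >= lofence, weight, .)), by(foreign)
        ```

        The fences themselves are not usually drawn; only the whisker ends (upadj and loadj here) appear on the plot, which is one of the details often garbled in explanations.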

        qplot is from the Stata Journal.

        Code:
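        * Run from a do-file: /// continues a command on the next line.
        import delimited "https://support.sas.com/documentation/onlinedoc/viya/exampledatasets/cars.csv", clear

        * quartiles and extremes of weight by origin, saved to summary.dta
        pctilesets weight, over(origin) pctile(25 50 75) min max saving(summary, replace)

        * merge the saved summaries back, using origgvar as the key
        clonevar origgvar = origin
        merge m:1 origgvar using summary

        * horizontal position of the box, just beyond the right end of the fraction axis
        gen where = 1.05

        egen mean = mean(weight), by(origin)

        * x is 0 and 1 in the first and last observations of each group, for drawing the mean line
        bysort origin : gen x = cond(_n==1, 0, cond(_n == _N, 1, .))

        qplot weight, by(origin, note("") row(1) legend(off)) ///
            xla(0 "0" 0.25 "0.25" 0.5 "0.5" 0.75 "0.75" 1 "1") ///
            addplot(rbar p25 p50 where, barw(0.06) lcol(black) fcolor(none) ///
                || rbar p50 p75 where, barw(0.06) lcolor(black) fcolor(none) ///
                || rspike p75 max where, lcol(black) ///
                || rspike p25 min where, lcol(black) ///
                || line mean x, lcol(stc2)) ///
            ytitle("Weight (pounds)") xtitle(Fraction of data) yla(2000(1000)7000)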
        [Attached image: qbox_another.png]


        • #5
          "What is this all about? My program can do this cute thing, add random noise? You tell me."
          Hi Nick Cox, sorry for mimicking a jitter plot using your stripplot.
          Bose Kochupurakkal, you could compare the options below and choose whichever you prefer for showing a univariate distribution.
          [Attached image: 01-Comparing-Strip-Plots-to-Different-Data-Visualizations.png]

          https://statisticseasily.com/glossar...l%20variables.
          https://datagy.io/seaborn-stripplot/
          Last edited by Chen Samulsion; 06 Mar 2025, 04:18.



          • #6
            Chen Samulsion Nothing to apologise for there. You gave an excellent answer to the question. I am just trying to broaden the discussion.
            Last edited by Nick Cox; 06 Mar 2025, 04:42.



            • #7
              This is an extension of #4. It's an accident of the English language that principle and prejudice are pretty close in a dictionary. It's not an accident that one person's principles can seem like prejudices to anyone else. So, if you like, here are some more prejudices from me, and see how far you agree.

              I really want any descriptive or exploratory plot -- indeed any statistical plot whatsoever -- to show outliers clearly. My reasons are standard: I want to detect data points that are mistaken or implausible and I want to check whether genuine outliers should lead me to modify my intended analysis. In doing that I don't have a definition of outliers in my head, any more than I need a definition of nice or nasty to implement my preference for nice over nasty.

              Jittering jitters outliers too. A benign reading of that is that it doesn't matter. If a jittered outlier looks like a typical data point, it really wasn't an outlier; otherwise you should still be able to plot it. Other way round, perhaps jittering will make some points look like outliers which aren't.

              That is, I can't see that jittering outliers is a feature. To me it is another thing getting in the way of appreciating outliers simply and directly.

              In terms of #5 the strip plot and scatter plot are really the same idea with cosmetic differences. A bare box plot can conceal as much as it reveals. I suppose violin plots tell you more than bare box plots. My objection to violin plots is mostly over how they are used in practice. In my reading no one ever explains exactly how the density estimates were produced (e.g. kernel type and bandwidth) or discusses how much sensitivity analysis they did to choose the version they are presenting. (Literature references documenting exceptions would be excellent.) It's surely a cardinal principle that if you present a smooth of any kind, the raw data should remain if not visible, then accessible.
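
              As an illustration of the kind of reporting and sensitivity check asked for here, this is a minimal sketch that states the kernel and bandwidth explicitly and overlays a few bandwidths. The auto data and the bandwidths 200, 400 and 800 are arbitrary choices for illustration, not recommendations.

              ```stata
              * Sketch only: run from a do-file, as /// continues a command on the next line
              sysuse auto, clear
              twoway (kdensity weight, kernel(epanechnikov) bwidth(200)) ///
                     (kdensity weight, kernel(epanechnikov) bwidth(400)) ///
                     (kdensity weight, kernel(epanechnikov) bwidth(800)), ///
                     legend(order(1 "bwidth 200" 2 "bwidth 400" 3 "bwidth 800")) ///
                     xtitle("Weight (lb)") ytitle(Density)
              ```

              If the apparent shape (say, the number of modes) changes across such a range, that is worth saying in the text alongside the kernel and bandwidth finally chosen.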

              Ranking imparts as much smoothing as you need for most purposes, and it's easy to smooth a little more mentally.



              • #8
                Yet more... The dataset used in #5 is recognisably the Palmer penguins data, or one version of it. It's by present standards compact enough to be used easily and may be used as a sandbox for various kinds of play. Here's a Stata version:

                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input int id str9(species island) double(bill_length bill_depth) int(flipper_length body_mass) str6 sex byte female int year
                  1 "Adelie"    "Torgersen" 39.1 18.7 181 3750 "male"    0 2007
                  2 "Adelie"    "Torgersen" 39.5 17.4 186 3800 "female"  1 2007
                  3 "Adelie"    "Torgersen" 40.3   18 195 3250 "female"  1 2007
                  4 "Adelie"    "Torgersen"    .    .   .    . "NA"     .a 2007
                  5 "Adelie"    "Torgersen" 36.7 19.3 193 3450 "female"  1 2007
                  6 "Adelie"    "Torgersen" 39.3 20.6 190 3650 "male"    0 2007
                  7 "Adelie"    "Torgersen" 38.9 17.8 181 3625 "female"  1 2007
                  8 "Adelie"    "Torgersen" 39.2 19.6 195 4675 "male"    0 2007
                  9 "Adelie"    "Torgersen" 34.1 18.1 193 3475 "NA"     .a 2007
                 10 "Adelie"    "Torgersen"   42 20.2 190 4250 "NA"     .a 2007
                 11 "Adelie"    "Torgersen" 37.8 17.1 186 3300 "NA"     .a 2007
                 12 "Adelie"    "Torgersen" 37.8 17.3 180 3700 "NA"     .a 2007
                 13 "Adelie"    "Torgersen" 41.1 17.6 182 3200 "female"  1 2007
                 14 "Adelie"    "Torgersen" 38.6 21.2 191 3800 "male"    0 2007
                 15 "Adelie"    "Torgersen" 34.6 21.1 198 4400 "male"    0 2007
                 16 "Adelie"    "Torgersen" 36.6 17.8 185 3700 "female"  1 2007
                 17 "Adelie"    "Torgersen" 38.7   19 195 3450 "female"  1 2007
                 18 "Adelie"    "Torgersen" 42.5 20.7 197 4500 "male"    0 2007
                 19 "Adelie"    "Torgersen" 34.4 18.4 184 3325 "female"  1 2007
                 20 "Adelie"    "Torgersen"   46 21.5 194 4200 "male"    0 2007
                 21 "Adelie"    "Biscoe"    37.8 18.3 174 3400 "female"  1 2007
                 22 "Adelie"    "Biscoe"    37.7 18.7 180 3600 "male"    0 2007
                 23 "Adelie"    "Biscoe"    35.9 19.2 189 3800 "female"  1 2007
                 24 "Adelie"    "Biscoe"    38.2 18.1 185 3950 "male"    0 2007
                 25 "Adelie"    "Biscoe"    38.8 17.2 180 3800 "male"    0 2007
                 26 "Adelie"    "Biscoe"    35.3 18.9 187 3800 "female"  1 2007
                 27 "Adelie"    "Biscoe"    40.6 18.6 183 3550 "male"    0 2007
                 28 "Adelie"    "Biscoe"    40.5 17.9 187 3200 "female"  1 2007
                 29 "Adelie"    "Biscoe"    37.9 18.6 172 3150 "female"  1 2007
                 30 "Adelie"    "Biscoe"    40.5 18.9 180 3950 "male"    0 2007
                 31 "Adelie"    "Dream"     39.5 16.7 178 3250 "female"  1 2007
                 32 "Adelie"    "Dream"     37.2 18.1 178 3900 "male"    0 2007
                 33 "Adelie"    "Dream"     39.5 17.8 188 3300 "female"  1 2007
                 34 "Adelie"    "Dream"     40.9 18.9 184 3900 "male"    0 2007
                 35 "Adelie"    "Dream"     36.4   17 195 3325 "female"  1 2007
                 36 "Adelie"    "Dream"     39.2 21.1 196 4150 "male"    0 2007
                 37 "Adelie"    "Dream"     38.8   20 190 3950 "male"    0 2007
                 38 "Adelie"    "Dream"     42.2 18.5 180 3550 "female"  1 2007
                 39 "Adelie"    "Dream"     37.6 19.3 181 3300 "female"  1 2007
                 40 "Adelie"    "Dream"     39.8 19.1 184 4650 "male"    0 2007
                 41 "Adelie"    "Dream"     36.5   18 182 3150 "female"  1 2007
                 42 "Adelie"    "Dream"     40.8 18.4 195 3900 "male"    0 2007
                 43 "Adelie"    "Dream"       36 18.5 186 3100 "female"  1 2007
                 44 "Adelie"    "Dream"     44.1 19.7 196 4400 "male"    0 2007
                 45 "Adelie"    "Dream"       37 16.9 185 3000 "female"  1 2007
                 46 "Adelie"    "Dream"     39.6 18.8 190 4600 "male"    0 2007
                 47 "Adelie"    "Dream"     41.1   19 182 3425 "male"    0 2007
                 48 "Adelie"    "Dream"     37.5 18.9 179 2975 "NA"     .a 2007
                 49 "Adelie"    "Dream"       36 17.9 190 3450 "female"  1 2007
                 50 "Adelie"    "Dream"     42.3 21.2 191 4150 "male"    0 2007
                 51 "Adelie"    "Biscoe"    39.6 17.7 186 3500 "female"  1 2008
                 52 "Adelie"    "Biscoe"    40.1 18.9 188 4300 "male"    0 2008
                 53 "Adelie"    "Biscoe"      35 17.9 190 3450 "female"  1 2008
                 54 "Adelie"    "Biscoe"      42 19.5 200 4050 "male"    0 2008
                 55 "Adelie"    "Biscoe"    34.5 18.1 187 2900 "female"  1 2008
                 56 "Adelie"    "Biscoe"    41.4 18.6 191 3700 "male"    0 2008
                 57 "Adelie"    "Biscoe"      39 17.5 186 3550 "female"  1 2008
                 58 "Adelie"    "Biscoe"    40.6 18.8 193 3800 "male"    0 2008
                 59 "Adelie"    "Biscoe"    36.5 16.6 181 2850 "female"  1 2008
                 60 "Adelie"    "Biscoe"    37.6 19.1 194 3750 "male"    0 2008
                 61 "Adelie"    "Biscoe"    35.7 16.9 185 3150 "female"  1 2008
                 62 "Adelie"    "Biscoe"    41.3 21.1 195 4400 "male"    0 2008
                 63 "Adelie"    "Biscoe"    37.6   17 185 3600 "female"  1 2008
                 64 "Adelie"    "Biscoe"    41.1 18.2 192 4050 "male"    0 2008
                 65 "Adelie"    "Biscoe"    36.4 17.1 184 2850 "female"  1 2008
                 66 "Adelie"    "Biscoe"    41.6   18 192 3950 "male"    0 2008
                 67 "Adelie"    "Biscoe"    35.5 16.2 195 3350 "female"  1 2008
                 68 "Adelie"    "Biscoe"    41.1 19.1 188 4100 "male"    0 2008
                 69 "Adelie"    "Torgersen" 35.9 16.6 190 3050 "female"  1 2008
                 70 "Adelie"    "Torgersen" 41.8 19.4 198 4450 "male"    0 2008
                 71 "Adelie"    "Torgersen" 33.5   19 190 3600 "female"  1 2008
                 72 "Adelie"    "Torgersen" 39.7 18.4 190 3900 "male"    0 2008
                 73 "Adelie"    "Torgersen" 39.6 17.2 196 3550 "female"  1 2008
                 74 "Adelie"    "Torgersen" 45.8 18.9 197 4150 "male"    0 2008
                 75 "Adelie"    "Torgersen" 35.5 17.5 190 3700 "female"  1 2008
                 76 "Adelie"    "Torgersen" 42.8 18.5 195 4250 "male"    0 2008
                 77 "Adelie"    "Torgersen" 40.9 16.8 191 3700 "female"  1 2008
                 78 "Adelie"    "Torgersen" 37.2 19.4 184 3900 "male"    0 2008
                 79 "Adelie"    "Torgersen" 36.2 16.1 187 3550 "female"  1 2008
                 80 "Adelie"    "Torgersen" 42.1 19.1 195 4000 "male"    0 2008
                 81 "Adelie"    "Torgersen" 34.6 17.2 189 3200 "female"  1 2008
                 82 "Adelie"    "Torgersen" 42.9 17.6 196 4700 "male"    0 2008
                 83 "Adelie"    "Torgersen" 36.7 18.8 187 3800 "female"  1 2008
                 84 "Adelie"    "Torgersen" 35.1 19.4 193 4200 "male"    0 2008
                 85 "Adelie"    "Dream"     37.3 17.8 191 3350 "female"  1 2008
                 86 "Adelie"    "Dream"     41.3 20.3 194 3550 "male"    0 2008
                 87 "Adelie"    "Dream"     36.3 19.5 190 3800 "male"    0 2008
                 88 "Adelie"    "Dream"     36.9 18.6 189 3500 "female"  1 2008
                 89 "Adelie"    "Dream"     38.3 19.2 189 3950 "male"    0 2008
                 90 "Adelie"    "Dream"     38.9 18.8 190 3600 "female"  1 2008
                 91 "Adelie"    "Dream"     35.7   18 202 3550 "female"  1 2008
                 92 "Adelie"    "Dream"     41.1 18.1 205 4300 "male"    0 2008
                 93 "Adelie"    "Dream"       34 17.1 185 3400 "female"  1 2008
                 94 "Adelie"    "Dream"     39.6 18.1 186 4450 "male"    0 2008
                 95 "Adelie"    "Dream"     36.2 17.3 187 3300 "female"  1 2008
                 96 "Adelie"    "Dream"     40.8 18.9 208 4300 "male"    0 2008
                 97 "Adelie"    "Dream"     38.1 18.6 190 3700 "female"  1 2008
                 98 "Adelie"    "Dream"     40.3 18.5 196 4350 "male"    0 2008
                 99 "Adelie"    "Dream"     33.1 16.1 178 2900 "female"  1 2008
                100 "Adelie"    "Dream"     43.2 18.5 192 4100 "male"    0 2008
                101 "Adelie"    "Biscoe"      35 17.9 192 3725 "female"  1 2009
                102 "Adelie"    "Biscoe"      41   20 203 4725 "male"    0 2009
                103 "Adelie"    "Biscoe"    37.7   16 183 3075 "female"  1 2009
                104 "Adelie"    "Biscoe"    37.8   20 190 4250 "male"    0 2009
                105 "Adelie"    "Biscoe"    37.9 18.6 193 2925 "female"  1 2009
                106 "Adelie"    "Biscoe"    39.7 18.9 184 3550 "male"    0 2009
                107 "Adelie"    "Biscoe"    38.6 17.2 199 3750 "female"  1 2009
                108 "Adelie"    "Biscoe"    38.2   20 190 3900 "male"    0 2009
                109 "Adelie"    "Biscoe"    38.1   17 181 3175 "female"  1 2009
                110 "Adelie"    "Biscoe"    43.2   19 197 4775 "male"    0 2009
                111 "Adelie"    "Biscoe"    38.1 16.5 198 3825 "female"  1 2009
                112 "Adelie"    "Biscoe"    45.6 20.3 191 4600 "male"    0 2009
                113 "Adelie"    "Biscoe"    39.7 17.7 193 3200 "female"  1 2009
                114 "Adelie"    "Biscoe"    42.2 19.5 197 4275 "male"    0 2009
                115 "Adelie"    "Biscoe"    39.6 20.7 191 3900 "female"  1 2009
                116 "Adelie"    "Biscoe"    42.7 18.3 196 4075 "male"    0 2009
                117 "Adelie"    "Torgersen" 38.6   17 188 2900 "female"  1 2009
                118 "Adelie"    "Torgersen" 37.3 20.5 199 3775 "male"    0 2009
                119 "Adelie"    "Torgersen" 35.7   17 189 3350 "female"  1 2009
                120 "Adelie"    "Torgersen" 41.1 18.6 189 3325 "male"    0 2009
                121 "Adelie"    "Torgersen" 36.2 17.2 187 3150 "female"  1 2009
                122 "Adelie"    "Torgersen" 37.7 19.8 198 3500 "male"    0 2009
                123 "Adelie"    "Torgersen" 40.2   17 176 3450 "female"  1 2009
                124 "Adelie"    "Torgersen" 41.4 18.5 202 3875 "male"    0 2009
                125 "Adelie"    "Torgersen" 35.2 15.9 186 3050 "female"  1 2009
                126 "Adelie"    "Torgersen" 40.6   19 199 4000 "male"    0 2009
                127 "Adelie"    "Torgersen" 38.8 17.6 191 3275 "female"  1 2009
                128 "Adelie"    "Torgersen" 41.5 18.3 195 4300 "male"    0 2009
                129 "Adelie"    "Torgersen"   39 17.1 191 3050 "female"  1 2009
                130 "Adelie"    "Torgersen" 44.1   18 210 4000 "male"    0 2009
                131 "Adelie"    "Torgersen" 38.5 17.9 190 3325 "female"  1 2009
                132 "Adelie"    "Torgersen" 43.1 19.2 197 3500 "male"    0 2009
                133 "Adelie"    "Dream"     36.8 18.5 193 3500 "female"  1 2009
                134 "Adelie"    "Dream"     37.5 18.5 199 4475 "male"    0 2009
                135 "Adelie"    "Dream"     38.1 17.6 187 3425 "female"  1 2009
                136 "Adelie"    "Dream"     41.1 17.5 190 3900 "male"    0 2009
                137 "Adelie"    "Dream"     35.6 17.5 191 3175 "female"  1 2009
                138 "Adelie"    "Dream"     40.2 20.1 200 3975 "male"    0 2009
                139 "Adelie"    "Dream"       37 16.5 185 3400 "female"  1 2009
                140 "Adelie"    "Dream"     39.7 17.9 193 4250 "male"    0 2009
                141 "Adelie"    "Dream"     40.2 17.1 193 3400 "female"  1 2009
                142 "Adelie"    "Dream"     40.6 17.2 187 3475 "male"    0 2009
                143 "Adelie"    "Dream"     32.1 15.5 188 3050 "female"  1 2009
                144 "Adelie"    "Dream"     40.7   17 190 3725 "male"    0 2009
                145 "Adelie"    "Dream"     37.3 16.8 192 3000 "female"  1 2009
                146 "Adelie"    "Dream"       39 18.7 185 3650 "male"    0 2009
                147 "Adelie"    "Dream"     39.2 18.6 190 4250 "male"    0 2009
                148 "Adelie"    "Dream"     36.6 18.4 184 3475 "female"  1 2009
                149 "Adelie"    "Dream"       36 17.8 195 3450 "female"  1 2009
                150 "Adelie"    "Dream"     37.8 18.1 193 3750 "male"    0 2009
                151 "Adelie"    "Dream"       36 17.1 187 3700 "female"  1 2009
                152 "Adelie"    "Dream"     41.5 18.5 201 4000 "male"    0 2009
                153 "Gentoo"    "Biscoe"    46.1 13.2 211 4500 "female"  1 2007
                154 "Gentoo"    "Biscoe"      50 16.3 230 5700 "male"    0 2007
                155 "Gentoo"    "Biscoe"    48.7 14.1 210 4450 "female"  1 2007
                156 "Gentoo"    "Biscoe"      50 15.2 218 5700 "male"    0 2007
                157 "Gentoo"    "Biscoe"    47.6 14.5 215 5400 "male"    0 2007
                158 "Gentoo"    "Biscoe"    46.5 13.5 210 4550 "female"  1 2007
                159 "Gentoo"    "Biscoe"    45.4 14.6 211 4800 "female"  1 2007
                160 "Gentoo"    "Biscoe"    46.7 15.3 219 5200 "male"    0 2007
                161 "Gentoo"    "Biscoe"    43.3 13.4 209 4400 "female"  1 2007
                162 "Gentoo"    "Biscoe"    46.8 15.4 215 5150 "male"    0 2007
                163 "Gentoo"    "Biscoe"    40.9 13.7 214 4650 "female"  1 2007
                164 "Gentoo"    "Biscoe"      49 16.1 216 5550 "male"    0 2007
                165 "Gentoo"    "Biscoe"    45.5 13.7 214 4650 "female"  1 2007
                166 "Gentoo"    "Biscoe"    48.4 14.6 213 5850 "male"    0 2007
                167 "Gentoo"    "Biscoe"    45.8 14.6 210 4200 "female"  1 2007
                168 "Gentoo"    "Biscoe"    49.3 15.7 217 5850 "male"    0 2007
                169 "Gentoo"    "Biscoe"      42 13.5 210 4150 "female"  1 2007
                170 "Gentoo"    "Biscoe"    49.2 15.2 221 6300 "male"    0 2007
                171 "Gentoo"    "Biscoe"    46.2 14.5 209 4800 "female"  1 2007
                172 "Gentoo"    "Biscoe"    48.7 15.1 222 5350 "male"    0 2007
                173 "Gentoo"    "Biscoe"    50.2 14.3 218 5700 "male"    0 2007
                174 "Gentoo"    "Biscoe"    45.1 14.5 215 5000 "female"  1 2007
                175 "Gentoo"    "Biscoe"    46.5 14.5 213 4400 "female"  1 2007
                176 "Gentoo"    "Biscoe"    46.3 15.8 215 5050 "male"    0 2007
                177 "Gentoo"    "Biscoe"    42.9 13.1 215 5000 "female"  1 2007
                178 "Gentoo"    "Biscoe"    46.1 15.1 215 5100 "male"    0 2007
                179 "Gentoo"    "Biscoe"    44.5 14.3 216 4100 "NA"     .a 2007
                180 "Gentoo"    "Biscoe"    47.8   15 215 5650 "male"    0 2007
                181 "Gentoo"    "Biscoe"    48.2 14.3 210 4600 "female"  1 2007
                182 "Gentoo"    "Biscoe"      50 15.3 220 5550 "male"    0 2007
                183 "Gentoo"    "Biscoe"    47.3 15.3 222 5250 "male"    0 2007
                184 "Gentoo"    "Biscoe"    42.8 14.2 209 4700 "female"  1 2007
                185 "Gentoo"    "Biscoe"    45.1 14.5 207 5050 "female"  1 2007
                186 "Gentoo"    "Biscoe"    59.6   17 230 6050 "male"    0 2007
                187 "Gentoo"    "Biscoe"    49.1 14.8 220 5150 "female"  1 2008
                188 "Gentoo"    "Biscoe"    48.4 16.3 220 5400 "male"    0 2008
                189 "Gentoo"    "Biscoe"    42.6 13.7 213 4950 "female"  1 2008
                190 "Gentoo"    "Biscoe"    44.4 17.3 219 5250 "male"    0 2008
                191 "Gentoo"    "Biscoe"      44 13.6 208 4350 "female"  1 2008
                192 "Gentoo"    "Biscoe"    48.7 15.7 208 5350 "male"    0 2008
                193 "Gentoo"    "Biscoe"    42.7 13.7 208 3950 "female"  1 2008
                194 "Gentoo"    "Biscoe"    49.6   16 225 5700 "male"    0 2008
                195 "Gentoo"    "Biscoe"    45.3 13.7 210 4300 "female"  1 2008
                196 "Gentoo"    "Biscoe"    49.6   15 216 4750 "male"    0 2008
                197 "Gentoo"    "Biscoe"    50.5 15.9 222 5550 "male"    0 2008
                198 "Gentoo"    "Biscoe"    43.6 13.9 217 4900 "female"  1 2008
                199 "Gentoo"    "Biscoe"    45.5 13.9 210 4200 "female"  1 2008
                200 "Gentoo"    "Biscoe"    50.5 15.9 225 5400 "male"    0 2008
                201 "Gentoo"    "Biscoe"    44.9 13.3 213 5100 "female"  1 2008
                202 "Gentoo"    "Biscoe"    45.2 15.8 215 5300 "male"    0 2008
                203 "Gentoo"    "Biscoe"    46.6 14.2 210 4850 "female"  1 2008
                204 "Gentoo"    "Biscoe"    48.5 14.1 220 5300 "male"    0 2008
                205 "Gentoo"    "Biscoe"    45.1 14.4 210 4400 "female"  1 2008
                206 "Gentoo"    "Biscoe"    50.1   15 225 5000 "male"    0 2008
                207 "Gentoo"    "Biscoe"    46.5 14.4 217 4900 "female"  1 2008
                208 "Gentoo"    "Biscoe"      45 15.4 220 5050 "male"    0 2008
                209 "Gentoo"    "Biscoe"    43.8 13.9 208 4300 "female"  1 2008
                210 "Gentoo"    "Biscoe"    45.5   15 220 5000 "male"    0 2008
                211 "Gentoo"    "Biscoe"    43.2 14.5 208 4450 "female"  1 2008
                212 "Gentoo"    "Biscoe"    50.4 15.3 224 5550 "male"    0 2008
                213 "Gentoo"    "Biscoe"    45.3 13.8 208 4200 "female"  1 2008
                214 "Gentoo"    "Biscoe"    46.2 14.9 221 5300 "male"    0 2008
                215 "Gentoo"    "Biscoe"    45.7 13.9 214 4400 "female"  1 2008
                216 "Gentoo"    "Biscoe"    54.3 15.7 231 5650 "male"    0 2008
                217 "Gentoo"    "Biscoe"    45.8 14.2 219 4700 "female"  1 2008
                218 "Gentoo"    "Biscoe"    49.8 16.8 230 5700 "male"    0 2008
                219 "Gentoo"    "Biscoe"    46.2 14.4 214 4650 "NA"     .a 2008
                220 "Gentoo"    "Biscoe"    49.5 16.2 229 5800 "male"    0 2008
                221 "Gentoo"    "Biscoe"    43.5 14.2 220 4700 "female"  1 2008
                222 "Gentoo"    "Biscoe"    50.7   15 223 5550 "male"    0 2008
                223 "Gentoo"    "Biscoe"    47.7   15 216 4750 "female"  1 2008
                224 "Gentoo"    "Biscoe"    46.4 15.6 221 5000 "male"    0 2008
                225 "Gentoo"    "Biscoe"    48.2 15.6 221 5100 "male"    0 2008
                226 "Gentoo"    "Biscoe"    46.5 14.8 217 5200 "female"  1 2008
                227 "Gentoo"    "Biscoe"    46.4   15 216 4700 "female"  1 2008
                228 "Gentoo"    "Biscoe"    48.6   16 230 5800 "male"    0 2008
                229 "Gentoo"    "Biscoe"    47.5 14.2 209 4600 "female"  1 2008
                230 "Gentoo"    "Biscoe"    51.1 16.3 220 6000 "male"    0 2008
                231 "Gentoo"    "Biscoe"    45.2 13.8 215 4750 "female"  1 2008
                232 "Gentoo"    "Biscoe"    45.2 16.4 223 5950 "male"    0 2008
                233 "Gentoo"    "Biscoe"    49.1 14.5 212 4625 "female"  1 2009
                234 "Gentoo"    "Biscoe"    52.5 15.6 221 5450 "male"    0 2009
                235 "Gentoo"    "Biscoe"    47.4 14.6 212 4725 "female"  1 2009
                236 "Gentoo"    "Biscoe"      50 15.9 224 5350 "male"    0 2009
                237 "Gentoo"    "Biscoe"    44.9 13.8 212 4750 "female"  1 2009
                238 "Gentoo"    "Biscoe"    50.8 17.3 228 5600 "male"    0 2009
                239 "Gentoo"    "Biscoe"    43.4 14.4 218 4600 "female"  1 2009
                240 "Gentoo"    "Biscoe"    51.3 14.2 218 5300 "male"    0 2009
                241 "Gentoo"    "Biscoe"    47.5   14 212 4875 "female"  1 2009
                242 "Gentoo"    "Biscoe"    52.1   17 230 5550 "male"    0 2009
                243 "Gentoo"    "Biscoe"    47.5   15 218 4950 "female"  1 2009
                244 "Gentoo"    "Biscoe"    52.2 17.1 228 5400 "male"    0 2009
                245 "Gentoo"    "Biscoe"    45.5 14.5 212 4750 "female"  1 2009
                246 "Gentoo"    "Biscoe"    49.5 16.1 224 5650 "male"    0 2009
                247 "Gentoo"    "Biscoe"    44.5 14.7 214 4850 "female"  1 2009
                248 "Gentoo"    "Biscoe"    50.8 15.7 226 5200 "male"    0 2009
                249 "Gentoo"    "Biscoe"    49.4 15.8 216 4925 "male"    0 2009
                250 "Gentoo"    "Biscoe"    46.9 14.6 222 4875 "female"  1 2009
                251 "Gentoo"    "Biscoe"    48.4 14.4 203 4625 "female"  1 2009
                252 "Gentoo"    "Biscoe"    51.1 16.5 225 5250 "male"    0 2009
                253 "Gentoo"    "Biscoe"    48.5   15 219 4850 "female"  1 2009
                254 "Gentoo"    "Biscoe"    55.9   17 228 5600 "male"    0 2009
                255 "Gentoo"    "Biscoe"    47.2 15.5 215 4975 "female"  1 2009
                256 "Gentoo"    "Biscoe"    49.1   15 228 5500 "male"    0 2009
                257 "Gentoo"    "Biscoe"    47.3 13.8 216 4725 "NA"     .a 2009
                258 "Gentoo"    "Biscoe"    46.8 16.1 215 5500 "male"    0 2009
                259 "Gentoo"    "Biscoe"    41.7 14.7 210 4700 "female"  1 2009
                260 "Gentoo"    "Biscoe"    53.4 15.8 219 5500 "male"    0 2009
                261 "Gentoo"    "Biscoe"    43.3   14 208 4575 "female"  1 2009
                262 "Gentoo"    "Biscoe"    48.1 15.1 209 5500 "male"    0 2009
                263 "Gentoo"    "Biscoe"    50.5 15.2 216 5000 "female"  1 2009
                264 "Gentoo"    "Biscoe"    49.8 15.9 229 5950 "male"    0 2009
                265 "Gentoo"    "Biscoe"    43.5 15.2 213 4650 "female"  1 2009
                266 "Gentoo"    "Biscoe"    51.5 16.3 230 5500 "male"    0 2009
                267 "Gentoo"    "Biscoe"    46.2 14.1 217 4375 "female"  1 2009
                268 "Gentoo"    "Biscoe"    55.1   16 230 5850 "male"    0 2009
                269 "Gentoo"    "Biscoe"    44.5 15.7 217 4875 "NA"     .a 2009
                270 "Gentoo"    "Biscoe"    48.8 16.2 222 6000 "male"    0 2009
                271 "Gentoo"    "Biscoe"    47.2 13.7 214 4925 "female"  1 2009
                272 "Gentoo"    "Biscoe"       .    .   .    . "NA"     .a 2009
                273 "Gentoo"    "Biscoe"    46.8 14.3 215 4850 "female"  1 2009
                274 "Gentoo"    "Biscoe"    50.4 15.7 222 5750 "male"    0 2009
                275 "Gentoo"    "Biscoe"    45.2 14.8 212 5200 "female"  1 2009
                276 "Gentoo"    "Biscoe"    49.9 16.1 213 5400 "male"    0 2009
                277 "Chinstrap" "Dream"     46.5 17.9 192 3500 "female"  1 2007
                278 "Chinstrap" "Dream"       50 19.5 196 3900 "male"    0 2007
                279 "Chinstrap" "Dream"     51.3 19.2 193 3650 "male"    0 2007
                280 "Chinstrap" "Dream"     45.4 18.7 188 3525 "female"  1 2007
                281 "Chinstrap" "Dream"     52.7 19.8 197 3725 "male"    0 2007
                282 "Chinstrap" "Dream"     45.2 17.8 198 3950 "female"  1 2007
                283 "Chinstrap" "Dream"     46.1 18.2 178 3250 "female"  1 2007
                284 "Chinstrap" "Dream"     51.3 18.2 197 3750 "male"    0 2007
                285 "Chinstrap" "Dream"       46 18.9 195 4150 "female"  1 2007
                286 "Chinstrap" "Dream"     51.3 19.9 198 3700 "male"    0 2007
                287 "Chinstrap" "Dream"     46.6 17.8 193 3800 "female"  1 2007
                288 "Chinstrap" "Dream"     51.7 20.3 194 3775 "male"    0 2007
                289 "Chinstrap" "Dream"       47 17.3 185 3700 "female"  1 2007
                290 "Chinstrap" "Dream"       52 18.1 201 4050 "male"    0 2007
                291 "Chinstrap" "Dream"     45.9 17.1 190 3575 "female"  1 2007
                292 "Chinstrap" "Dream"     50.5 19.6 201 4050 "male"    0 2007
                293 "Chinstrap" "Dream"     50.3   20 197 3300 "male"    0 2007
                294 "Chinstrap" "Dream"       58 17.8 181 3700 "female"  1 2007
                295 "Chinstrap" "Dream"     46.4 18.6 190 3450 "female"  1 2007
                296 "Chinstrap" "Dream"     49.2 18.2 195 4400 "male"    0 2007
                297 "Chinstrap" "Dream"     42.4 17.3 181 3600 "female"  1 2007
                298 "Chinstrap" "Dream"     48.5 17.5 191 3400 "male"    0 2007
                299 "Chinstrap" "Dream"     43.2 16.6 187 2900 "female"  1 2007
                300 "Chinstrap" "Dream"     50.6 19.4 193 3800 "male"    0 2007
                301 "Chinstrap" "Dream"     46.7 17.9 195 3300 "female"  1 2007
                302 "Chinstrap" "Dream"       52   19 197 4150 "male"    0 2007
                303 "Chinstrap" "Dream"     50.5 18.4 200 3400 "female"  1 2008
                304 "Chinstrap" "Dream"     49.5   19 200 3800 "male"    0 2008
                305 "Chinstrap" "Dream"     46.4 17.8 191 3700 "female"  1 2008
                306 "Chinstrap" "Dream"     52.8   20 205 4550 "male"    0 2008
                307 "Chinstrap" "Dream"     40.9 16.6 187 3200 "female"  1 2008
                308 "Chinstrap" "Dream"     54.2 20.8 201 4300 "male"    0 2008
                309 "Chinstrap" "Dream"     42.5 16.7 187 3350 "female"  1 2008
                310 "Chinstrap" "Dream"       51 18.8 203 4100 "male"    0 2008
                311 "Chinstrap" "Dream"     49.7 18.6 195 3600 "male"    0 2008
                312 "Chinstrap" "Dream"     47.5 16.8 199 3900 "female"  1 2008
                313 "Chinstrap" "Dream"     47.6 18.3 195 3850 "female"  1 2008
                314 "Chinstrap" "Dream"       52 20.7 210 4800 "male"    0 2008
                315 "Chinstrap" "Dream"     46.9 16.6 192 2700 "female"  1 2008
                316 "Chinstrap" "Dream"     53.5 19.9 205 4500 "male"    0 2008
                317 "Chinstrap" "Dream"       49 19.5 210 3950 "male"    0 2008
                318 "Chinstrap" "Dream"     46.2 17.5 187 3650 "female"  1 2008
                319 "Chinstrap" "Dream"     50.9 19.1 196 3550 "male"    0 2008
                320 "Chinstrap" "Dream"     45.5   17 196 3500 "female"  1 2008
                321 "Chinstrap" "Dream"     50.9 17.9 196 3675 "female"  1 2009
                322 "Chinstrap" "Dream"     50.8 18.5 201 4450 "male"    0 2009
                323 "Chinstrap" "Dream"     50.1 17.9 190 3400 "female"  1 2009
                324 "Chinstrap" "Dream"       49 19.6 212 4300 "male"    0 2009
                325 "Chinstrap" "Dream"     51.5 18.7 187 3250 "male"    0 2009
                326 "Chinstrap" "Dream"     49.8 17.3 198 3675 "female"  1 2009
                327 "Chinstrap" "Dream"     48.1 16.4 199 3325 "female"  1 2009
                328 "Chinstrap" "Dream"     51.4   19 201 3950 "male"    0 2009
                329 "Chinstrap" "Dream"     45.7 17.3 193 3600 "female"  1 2009
                330 "Chinstrap" "Dream"     50.7 19.7 203 4050 "male"    0 2009
                331 "Chinstrap" "Dream"     42.5 17.3 187 3350 "female"  1 2009
                332 "Chinstrap" "Dream"     52.2 18.8 197 3450 "male"    0 2009
                333 "Chinstrap" "Dream"     45.2 16.6 191 3250 "female"  1 2009
                334 "Chinstrap" "Dream"     49.3 19.9 203 4050 "male"    0 2009
                335 "Chinstrap" "Dream"     50.2 18.8 202 3800 "male"    0 2009
                336 "Chinstrap" "Dream"     45.6 19.4 194 3525 "female"  1 2009
                337 "Chinstrap" "Dream"     51.9 19.5 206 3950 "male"    0 2009
                338 "Chinstrap" "Dream"     46.8 16.5 189 3650 "female"  1 2009
                339 "Chinstrap" "Dream"     45.7   17 195 3650 "female"  1 2009
                340 "Chinstrap" "Dream"     55.8 19.8 207 4000 "male"    0 2009
                341 "Chinstrap" "Dream"     43.5 18.1 202 3400 "female"  1 2009
                342 "Chinstrap" "Dream"     49.6 18.2 193 3775 "male"    0 2009
                343 "Chinstrap" "Dream"     50.8   19 210 4100 "male"    0 2009
                344 "Chinstrap" "Dream"     50.2 18.7 198 3775 "female"  1 2009
                end
                label values female female
                label def female 0 "male", modify
                label def female 1 "female", modify
                label def female .a "NA", modify
                label var id "Skip to content"
                label var bill_length "bill length (mm)"
                label var bill_depth "bill depth (mm)"
                label var flipper_length "flipper length (mm)"
                label var body_mass "body mass (g)"
                The major point here is that a good graph should depend on the data, the aims of your analysis, and the people who are going to read the graph. That broad maxim not only allows but also impels some flexibility according to taste and circumstance.

                Nevertheless it seems to me that a good graph for univariate distributions should show not just summaries but also details, as many details could be of interest or importance, and you shouldn't ignore details as trivial until you know what they are.

                To show some flexibility, I repeated the code in #4, this time using normal distributions as the reference.

                Assume that the data just given are in palmer_penguins.dta. (Hence omit the first command if you have run the data example.)

                As before, this needs pctilesets from SSC and qplot from the Stata Journal, but the same result is within the reach of official code too.
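                For anyone without pctilesets, the summaries could be built with official egen functions instead. This is only a minimal sketch, assuming the variable names bill_length and island from the data example. The new variable names p25, p50, p75, min and max are chosen to match those used in the addplot() call in the code that follows; whether egen and pctilesets agree on every percentile in every dataset is an assumption I have not checked.

```stata
use palmer_penguins, clear

* quartiles, minimum and maximum of bill length within each island
* (egen puts the island-level summary on every observation of that island)
egen p25 = pctile(bill_length), p(25) by(island)
egen p50 = median(bill_length), by(island)
egen p75 = pctile(bill_length), p(75) by(island)
egen min = min(bill_length), by(island)
egen max = max(bill_length), by(island)
```

                With these variables in memory, the pctilesets, clonevar and merge steps would not be needed.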

                To spell it out, the longer horizontal lines show means; the box plots show median, quartiles and extremes; and all that should be explained in any fuller paper or presentation.

                Code:
                use palmer_penguins, clear

                * quartiles, minimum and maximum by island, saved to summary.dta
                pctilesets bill_length, over(island) pctile(25 50 75) min max saving(summary, replace)

                * clone island as origgvar, the key used to merge the saved summaries back
                clonevar origgvar = island
                merge m:1 origgvar using summary

                * horizontal position of each box plot, just right of the quantile trace
                gen where = 3.2

                egen mean = mean(bill_length), by(island)

                * two end points per island for the horizontal line showing the mean
                bysort island : gen x = cond(_n == 1, -2.9, cond(_n == _N, 2.9, .))

                set scheme stcolor

                qplot bill_length, trscale(invnormal(@)) by(island, note("") row(1) legend(off)) xla(-3/3) ///
                    addplot(rbar p25 p50 where, barw(0.4) lcol(black) fcolor(none) ///
                    || rbar p50 p75 where, barw(0.4) lcolor(black) fcolor(none) ///
                    || rspike p75 max where, lcol(black) ///
                    || rspike p25 min where, lcol(black) ///
                    || line mean x, lcol(stc2)) ///
                    ytitle(Bill length (mm)) xtitle(Standard normal deviate) yla(35(5)60)
                [Image: qplot_box.png]
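                On the official-code point, the horizontal axis can be reproduced by hand. The sketch below assumes the plotting position (i - 0.5)/n within each island, which I believe is the qplot default but which should be checked against its help file. The variable names rank, n and z are mine, and ties are simply ranked in sort order.

```stata
use palmer_penguins, clear

* rank the non-missing bill lengths within each island (missings sort last)
bysort island (bill_length) : gen rank = _n if !missing(bill_length)
by island : egen n = count(bill_length)

* plotting position (i - 0.5)/n, then the matching standard normal deviate
gen z = invnormal((rank - 0.5) / n)

scatter bill_length z, by(island, note("") row(1)) ///
    ytitle("Bill length (mm)") xtitle("Standard normal deviate")
```

                A roughly straight trace within a panel is then consistent with a normal distribution for that island.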



                I prefer this plot to any in #5, but I would say that, wouldn't I?

                Compare (in a very different context) https://en.wikipedia.org/wiki/Well_h...uldn%27t_he%3F



                • #9
                  The Palmer penguins dataset in the previous post should be corrected so that Adélie replaces Adelie.

                  A suitable academic reference is https://journals.plos.org/plosone/ar...l.pone.0090081
