  • Box plot help

    Hello everyone,

    As practice, I am trying to reproduce a graph from a book by Edward Tufte, in which he discusses a stripped-down version of the box plot.

    I wrote some code, but I cannot figure out why the whisker lengths keep coming out wrong. Can someone help me out here?

    Code:
        sysuse auto, clear
        
        * Name of variable to use for box plot:
        local variable price
        
        * Display boxplot by which group?
        local group foreign
        
        
    
    * Plot type 1    
    
    capture separate `variable', by(`group')
        
        levelsof `group', local(lvl)
        foreach level of local lvl {
            sort `variable'
            
            quietly summ `group'
            local max = `r(max)'
            local min = `r(min)'
            local scale = `r(max)' - `r(min)'
            local offset : display abs(`scale'*0.02)
        
            quietly summ `variable' if `group' == `level', detail
            local level = `level' + 1
            local xlab "`xlab' `level' `" "`:lab (`group') `=`level'-1''" "'"
    
            local med_p_`level' = `r(p50)'
            local p75_`level' = `r(p75)'
            local p25_`level' = `r(p25)'
            local iqr_`level' = `p75_`level'' - `p25_`level''
            display "Median = `med_p_`level''"
            display "P75 = `p75_`level''"
            display "P25 = `p25_`level''"
            display "IQR = `iqr_`level''"
            display "Low = `=`p25_`level''-(1.5*`iqr_`level'')'"
            display "Max = `=`p75_`level''+(1.5*`iqr_`level'')'"
            display "Varname = `variable'`=`level'-1'"
            
            egen llw_`level' = min(max(`variable'`=`level'-1', `=`p25_`level''-(1.5*`iqr_`level'')'))
            egen uuw_`level' = max(min(`variable'`=`level'-1', `=`p75_`level''+(1.5*`iqr_`level'')'))
            
            quietly summ uuw_`level'
            local max_`level' = `r(mean)'
            quietly summ llw_`level'
            local min_`level' = `r(mean)'        
            
            
            local     lines `lines' ///
                    (scatteri `p75_`level'' `level' `max_`level'' `level', recast(line) lpattern(solid) lcolor(black) lwidth(1)) || ///
                    (scatteri `p25_`level'' `level' `min_`level'' `level', recast(line) lpattern(solid) lcolor(black) lwidth(1)) || ///
                    (scatteri `p75_`level'' `=`level' + `offset'' `p25_`level'' `=`level' + `offset'', recast(line) lpattern(solid) lcolor(black) lwidth(1)) || ///
                    (scatteri `med_p_`level'' `=`level' + `offset'', ms(square) mcolor(background)) ||
    
    }
        
        *drop llw* uuw*
        
        twoway `lines', ///
                ytitle("`: variable label `variable''") ///
                ylabel(2000(2000)10000) xtitle("") ///
                xlabel(`xlab', nogrid) ///
                xscale(range(`=`min' + 0.5' `=`max' + 1.5')) ///
                scheme(white_tableau) ///
                title("{bf}Tufte Styled Box Plot", pos(11) margin(b+3) size(*.7)) ///
                subtitle("`: variable label `variable'' grouped by `: variable label `group''", pos(11) margin(b+6 t=-3) size(*.6)) ///
                legend(off)
        
        
        
        * Tufte style box plot version 2
        graph box mpg, box(1, color(white%0)) medtype(marker) medmarker(mcolor(black) mlwidth(0)) cwhiskers alsize(0) intensity(0) over(foreign) lintensity(1) lines(lpattern(solid) lwidth(medium) lcolor(black)) nooutside ylabel(, nogrid) scheme(white_tableau)
    The whiskers in plot type 1 differ from those in version 2. Could someone kindly check what I am doing wrong in this code? I would be grateful.


  • #2
    Perhaps because your first plot uses `variable' which has been defined as price, while your second plot uses mpg?



    • #3
      https://www.statalist.org/forums/for...-without-boxes shows Tufte's original ideas being taken up differently.



      • #4
        William Lisowski, apologies, that's a typo; thank you for pointing it out. Even with price, the whiskers do not come out right.



        • #5
          Nick Cox, that was an interesting post; thank you for sharing your insights. I liked the dot plot take, which definitely adds more to the plot. As for this plot, I am doing it mainly for my own learning, to understand the tips and tricks of visualization.



          • #6
            What confuses me is that when I use the method posted on the UCLA website, I get the following graph with the following code:

            Code:
            sysuse auto, clear
            
            sort foreign price
            
            * Use egen to generate the median, quartiles, interquartile range (IQR), and mean. 
            by foreign: egen med = median(price)
            by foreign: egen lqt = pctile(price), p(25)
            by foreign: egen uqt = pctile(price), p(75)
            by foreign: egen iqr = iqr(price)
            by foreign: egen mean = mean(price)
            
            by foreign: gen uqt_up = uqt+1.5*iqr
            by foreign: gen uqt_down = lqt-1.5*iqr
            
            
            * Find the highest value (egen max) that is less than uqt + 1.5 iqr (the min() function)
            * this is used to form the upper "whisker" of the boxplot.
            by foreign: egen us = max(min(price, uqt_up))
            
            * Find the lowest value (egen min) that is more than lqt - 1.5 iqr (the max() function)
            * this is used to form the lower "whisker" of the boxplot.
            by foreign: egen ls = min(max(price, uqt_down))
            
            
            twoway rbar lqt med foreign, pstyle(p1) barw(.5) || ///
                   rbar med uqt foreign, pstyle(p1) barw(.5) || ///
                   rspike lqt ls foreign, pstyle(p1) || ///
                   rspike uqt us foreign, pstyle(p1) || ///
                   rcap ls ls foreign, msize(*6) pstyle(p1) || ///
                   rcap us us foreign, msize(*6) pstyle(p1) || ///
                   , xlabel( 0 "Domestic" 1 "Foreign" ) ytitle(Price) legend(off) ///
                   ylabel(2000(2000)12000, grid) scheme(s1color)
            [Image: Screenshot 2022-06-25 152957.png]

            However, Stata's default box plot command gives me the following graph with the following code:

            Code:
            graph box price, over(foreign) nooutside
            [Image: Screenshot 2022-06-25 153200.png]


            How can I replicate this exact graph from Stata's box plot?
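One likely reason for the mismatch, offered as a sketch rather than a definitive diagnosis: `max(min(price, uqt_up))` replaces any price above the fence with the fence itself, so in a group with outside values the whisker ends at the fence rather than at the largest observation inside it. `graph box` draws its whiskers to the adjacent values, that is, the most extreme observations still within the fences. The check below puts the two side by side for the upper whisker; `upper_fence`, `clamped`, and `adjacent` are names invented for this illustration.

```stata
sysuse auto, clear

* Upper fence per group: 75th percentile + 1.5 * IQR
bysort foreign: egen uqt = pctile(price), p(75)
bysort foreign: egen iqr = iqr(price)
generate upper_fence = uqt + 1.5*iqr

* Clamped version: prices above the fence are replaced by the fence itself
bysort foreign: egen clamped = max(min(price, upper_fence))

* Adjacent value: largest observed price that does not exceed the fence
bysort foreign: egen adjacent = max(cond(price <= upper_fence, price, .))

* The two columns differ in any group that has prices above the fence
tabstat upper_fence clamped adjacent, by(foreign) statistics(mean)
```

Where `clamped` equals `upper_fence` but `adjacent` is smaller, the hand-built whisker overshoots the one drawn by `graph box`.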



            • #7
              I guess that if I replace values above the upper fence (upper quartile + 1.5 IQR) with missing, and do the same for values below the lower fence, the graph can be replicated.

              Code:
              sysuse auto, clear
              
              sort foreign price
              
              * Use egen to generate the median, quartiles, interquartile range (IQR), and mean. 
              by foreign: egen med = median(price)
              by foreign: egen lqt = pctile(price), p(25)
              by foreign: egen uqt = pctile(price), p(75)
              by foreign: egen iqr = iqr(price)
              by foreign: egen mean = mean(price)
              
              by foreign: gen uqt_up = uqt+1.5*iqr
              by foreign: gen uqt_down = lqt-1.5*iqr
              
              generate uqt_upp = price if price <= uqt_up
              generate uqt_downn = price if price >= uqt_down
              
              * Find the highest value (egen max) that does not exceed uqt + 1.5 iqr;
              * values above that fence were set to missing in uqt_upp.
              * this is used to form the upper "whisker" of the boxplot.
              by foreign: egen us = max(uqt_upp)
              
              * Find the lowest value (egen min) that is not below lqt - 1.5 iqr;
              * values below that fence were set to missing in uqt_downn.
              * this is used to form the lower "whisker" of the boxplot.
              by foreign: egen ls = min(uqt_downn)
              
              
              twoway rbar lqt med foreign, pstyle(p1) barw(.5) || ///
                     rbar med uqt foreign, pstyle(p1) barw(.5) || ///
                     rspike lqt ls foreign, pstyle(p1) || ///
                     rspike uqt us foreign, pstyle(p1) || ///
                     rcap ls ls foreign, msize(*6) pstyle(p1) || ///
                     rcap us us foreign, msize(*6) pstyle(p1) || ///
                     , xlabel( 0 "Domestic" 1 "Foreign" ) ytitle(Price) legend(off) ///
                     ylabel(2000(2000)12000, grid) scheme(s1color)
              [Image: Screenshot 2022-06-25 154920.png]
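The same idea can be applied to the loop in the first post without creating any variables. This is a sketch, assuming the goal is to match the adjacent values that `graph box` uses for its whiskers: compute the fences from `summarize, detail`, then summarize again restricted to observations inside the fences. The macro names `lo_fence` and `hi_fence` are made up for this example. In the first post, `price0` and `price1` are missing outside their own group, and because `max()` and `min()` ignore missing arguments, the `egen` expressions there evaluate to the fence itself for those observations, which is likely why those whiskers always end at the fences.

```stata
sysuse auto, clear

levelsof foreign, local(lvl)
foreach level of local lvl {
    * Quartiles for this group only
    quietly summarize price if foreign == `level', detail
    local lo_fence = r(p25) - 1.5*(r(p75) - r(p25))
    local hi_fence = r(p75) + 1.5*(r(p75) - r(p25))

    * Adjacent values: smallest and largest prices lying within the fences
    quietly summarize price if foreign == `level' & inrange(price, `lo_fence', `hi_fence')
    display "foreign = `level': whiskers run from " r(min) " to " r(max)
}
```

The `r(min)` and `r(max)` from the second `summarize` could then replace `min_`level'' and `max_`level'' in the `scatteri` calls.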


