  • Looking for a bar graph feature to summarize distribution using a second set of bins or groups.

    Hi all

    I have a categorical variable that is centered around zero and is somewhat normal-looking (bell shaped). I want to summarize the distribution and a bar graph seems natural. However, it would also be nice to also show the % of the sample that falls within X bars/units away from the center. Similar to how with an actual normal graph, we know for example that 95% is within +-1.963. One can think of this as a second set of (wider) bins, or I suppose another set of groups.

    The code to produce an example is below. I used the auto data to create a new variable, headroom0, centered around zero. I then used Microsoft Paint to create the picture attached below. The feature I am looking for is the addition of the "bins"/brackets I have drawn in Paint (at the bottom), or another feature that achieves the same thing visually, on the same graph. Any ideas?

    The bottom line for my analysis is the overall statistic for each sub-sample (the 87.84% in the example below) that is within an acceptable range. I find that a statistic alone can be a bit abstract, so it helps the narrative of a piece to tie it to a visual.

    Code:
    sysuse auto,clear
    su headroom
    gen headroom0=round(headroom-`r(mean)',0.5)
    graph bar, over(headroom0) title(distribution of headroom0) ytitle(proportion of sample) b1title(value of headroom)
    [Image: example bar for statalist.jpg]


    Update: I can't for the life of me figure out how to get this picture to upload at a reasonable size.

    Regards,
    Bruce
    Last edited by Bruce McDougall; 22 Sep 2022, 03:38.
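
    A minimal sketch, assuming the same auto data and the headroom0 variable built above: the headline share can be computed directly by counting the observations within a chosen distance of the center. The local name share and the choice of ±1 as the "acceptable range" are illustrative assumptions, not part of the original post.

    ```stata
    sysuse auto, clear
    quietly summarize headroom
    generate headroom0 = round(headroom - r(mean), 0.5)

    // Count the observations within +/-1 unit of the center
    quietly count if inrange(headroom0, -1, 1)

    // Express the count as a percent of the whole sample
    local share = 100 * r(N) / _N
    display %5.2f `share' "%"
    ```

    With the ±1 range this should reproduce the 87.84% quoted in the post; changing the bounds in inrange() gives the share for a narrower or wider bracket.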

  • #2
    Code:
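    // Start from a clean set of frames and rebuild the example variable
    frame reset
    sysuse auto,clear
    su headroom
    gen headroom0=round(headroom-`r(mean)',0.5)
    
    // Work on a copy: one row per value of headroom0, with its percent of the sample
    frame copy default tograph
    frame change tograph
    contract headroom0, percent(perc)
    
    // Percent within +/-0.5 of the center, formatted as a text label
    sum perc if inrange(headroom0, -.5, .5)
    local r1 : display %9.1f r(sum)
    local r1 = strltrim("`r1'") + "%"
    
    // Percent within +/-1 of the center, formatted as a text label
    sum perc if inrange(headroom0, -1, 1)
    local r2 : display %9.1f r(sum)
    local r2 = strltrim("`r2'") + "%"
    
    // Bars, two bracket lines below the axis (pci), and their labels
    // (scatteri with invisible markers, label placed at 12 o'clock)
    twoway bar perc headroom0, barw(.45)              || ///
           pci -1.5 -0.725 -1.5 0.725, lcolor(black)  || ///
           pci -3.0 -1.225 -3.0 1.225, lcolor(black)  || ///
           scatteri -1.5 0 (12) "`r1'"                   ///
                    -3.0 0 (12) "`r2'",                  ///
           msymbol(i) mlabcolor(black) legend(off)       ///
           ylab(0(5)20, format(%9.0f) angle(0))          ///
           ytitle("Percent") xtitle("Headroom")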
    [Image: Graph.png]
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      Hi Maarten

      Thanks for your response, I am using your solution!

      That said, I am finding it not 100% perfect, as you sometimes need to manually tweak the lines and their associated text.

      For example, if the scale of the y variable increases, the graph will automatically increase the scale of the y axis, and then you need to increase the negative spacing below y=zero for the text to not overlap with the bars.

      I am experimenting with ways to generalize it, for example by using a fixed % of the y-axis, but it's proving quite tricky. See the idea below (this code is not designed to run):

      Code:
      su perc // the plotted y variable after -contract-; summarizing headroom0 would give the x range instead
      local max = r(max)
      local scale = `max'/10
      // and then below in the twoway
      pci -`scale' -0.725 -`scale' 0.725
      Manually doing it for each case is an option as I have about 8 different sub-samples to run the exercise on, but for future instances this might be quite painful!

      Thanks again,
      Bruce
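
      One way to avoid retuning the offsets for each sub-sample, sketched here under assumptions rather than tested across cases: place the brackets at fixed fractions of the tallest bar instead of at fixed y values. The locals ymax, y1, y2, step and top, the 10% and 20% fractions, and the crude axis-step heuristic are all illustrative choices that do not appear in the posts above; the rest follows the code in #2.

      ```stata
      frame reset
      sysuse auto, clear
      quietly summarize headroom
      generate headroom0 = round(headroom - r(mean), 0.5)

      frame copy default tograph
      frame change tograph
      contract headroom0, percent(perc)

      // Bracket labels, built the same way as in #2
      quietly summarize perc if inrange(headroom0, -.5, .5)
      local r1 : display %9.1f r(sum)
      local r1 = strltrim("`r1'") + "%"
      quietly summarize perc if inrange(headroom0, -1, 1)
      local r2 : display %9.1f r(sum)
      local r2 = strltrim("`r2'") + "%"

      // Bracket heights as fixed fractions of the tallest bar
      // (10% and 20% are arbitrary starting values, not tuned)
      quietly summarize perc
      local ymax = r(max)
      local y1 = -0.10 * `ymax'
      local y2 = -0.20 * `ymax'

      // Crude heuristic for axis labels: about four steps, starting at zero
      local step = max(1, round(`ymax'/4))
      local top  = `step' * ceil(`ymax'/`step')

      twoway bar perc headroom0, barw(.45)                  || ///
             pci `y1' -0.725 `y1' 0.725, lcolor(black)      || ///
             pci `y2' -1.225 `y2' 1.225, lcolor(black)      || ///
             scatteri `y1' 0 (12) "`r1'"                       ///
                      `y2' 0 (12) "`r2'",                      ///
             msymbol(i) mlabcolor(black) legend(off)           ///
             ylabel(0(`step')`top', format(%9.0f) angle(0))    ///
             ytitle("Percent") xtitle("Headroom")
      ```

      Because text size is set relative to the graph rather than in y units, a fixed fraction of the bar height should keep roughly the same visual gap when perc is on a different scale. The fractions may still need a one-off adjustment if the graph size or label size changes.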
