  • Format Bar labels in Histogram

    Dear Statalist Community,

    I'm currently trying to set up a histogram using Stata's histogram command, and I'm stuck on what is probably a very basic problem, for which I couldn't find a proper solution in old forum topics or via Google. The histogram shows the number of mergers and acquisitions (M&A) deals announced in the period 1993-1998 on a yearly basis.

    Attached is the graph I produced, with the bar values resulting from this command:

    Code:
    histogram year_MandA, discrete frequency addlabel xlabel(1992(1)1999) ylabel(5000 "5000" 10000 "10000" 15000 "15000")
    As you can easily see, the bars for 1993, 1994 and 1995 have values smaller than 10000 and are therefore labeled "correctly", while the bars for 1996-1998 have values larger than 10000 and are labeled in exponential format: "1.1e+04", "1.2e+04" and "1.4e+04".
    Is it possible to change the bar labels for 1996-1998 manually, as we can in a standard bar plot?

    I already tried an alternative via the graph bar command, using the variable DealNumber, which is a unique identifier for each merger or acquisition:

    Code:
    graph bar (count) DealNumber, over(year_MandA) blabel(bar)
    However, running this code gives me a type mismatch error.

    Any help, solutions and remarks are highly appreciated.

    Thank you in advance for your help.

    Best regards
    Philipp

  • #2
    A data example would have helped. Here are some more or less equivalent data and a work-around.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(freq year)
     6893 1993
     8195 1994
     9730 1995
    11230 1996
    12450 1997
    14670 1998
    end
    
    twoway bar freq year, base(0)  bfcolor(ltblue) || ///
    scatter freq  year, ms(none) mlabel(freq) mlabpos(12) legend(off) yla(0(5000)15000) xla(, noticks) xtitle("") scheme(s1color)
    You can get a count variable with say

    Code:
    egen freq = count(year_MandA), by(year_MandA)



    • #3
      Thanks a lot for your response, Nick. Yes, a data example would certainly have helped, and I will definitely keep that in mind next time. I just thought there might be a simple option for changing the bar labels in the histogram command that I was not aware of.

      Nevertheless, your approach gives me exactly what I was after. Thank you very much for providing it. I was not aware of generating a count variable, which solves the whole problem. With the count variable I can also just use the standard graph bar command, and it gives me a graph visualizing exactly what I want.

      Code:
      egen NumberMandA = count(year_MandA), by(year_MandA)
      
      graph bar (first) NumberMandA, over(year_MandA) blabel(bar, format(%9.1g)) ytitle(Number of Megers & Acuqisitions per Year) ylabel(, labsize(small)) title(Number of Megers & Acquisitions from 1993 - 1998)


      • #4
        Good point about graph bar.

        Whenever a graph is for contiguous time intervals I tend to join those who want bars to touch.

        I didn't address the point in my own code, but it's arguable that y axis labels are just clutter if all the numbers are shown at the top of the bars.

        P.S. megers is a trivial typo in your graph and axis titles.



        • #5
          Thanks Nick for the advice and your help. The typo is corrected now, too. ;-)



          • #6
            Hi folks, what are the chances I could get an assist on doing something similar with a histogram instead of a bar chart? I would like to reduce the bar labels to just one decimal place. I've tried variations on the code suggested here (https://www.statalist.org/forums/for...equency-option), but unsuccessfully.

            Here is the code I've used:

            Code:
            hist items ///
            disc percent ///
            title("Distribution of Items") ///
            yla(, format(%12,0gc) ang(h)) ///
            xsize(11) ysize(7) ///
            addlabel addlabopts(mlabposition(12)) ///
            name(ItemsHistorgram, replace)
            Here is cropped output:


            [Image: Screen Shot 2018-12-12 at 10.05.33 AM.png]




            • #7
              That's not legal code (no comma before options, but we aren't easily fooled) and you don't give a data example. But here is one work-around:


              Code:
              sysuse auto, clear
              
              * what to show and where to put 
              bysort rep78 : gen Percent = _N
              count if rep78 < .
              replace Percent = 100 * Percent/r(N)
              gen show = string(Percent, "%2.1f") 
              
              twoway bar Percent rep78, bstyle(histogram) barw(0.8) ysc(r(0 45)) yla(0(5)45, ang(h))  ///
              || scatter Percent rep78 , ms(none) mla(show) mlabpos(12) mlabc(black) legend(off)
              [Image: histobar.png]



              • #8
                Consider also some variation on the following with tabplot (Stata Journal):

                Code:
                tabplot  rep78, percent showval



                • #9
                  Thanks for the responses. Sorry about the missing comma in line one. Was not my intent to fool anyone. Thanks for catching that.

                  I really do appreciate the two solutions. However, both run slowly in my use case (2.2 million observations).

                  I'm wondering whether the following code (which uses auto.dta and is not missing the comma) could be modified to apply a format to those bar labels?

                  Code:
                  sysuse auto, clear
                  
                  hist rep78, ///
                  disc percent ///
                  title("Distribution of Items") ///
                  yla(, format(%12,0gc) ang(h)) ///
                  xsize(11) ysize(7) ///
                  addlabel addlabopts(mlabposition(12)) ///
                  name(ItemsHistorgram, replace)
                  Again, apologies for my missing comma.



                  • #10
                    My comment on not being fooled was just itself fooling around. A bigger deal is that I didn't skim through earlier posts but do now notice that #2 already gave the main idea, finding the frequencies and then showing them directly with twoway bar and scatter.

                    The format suboption in hist applies only to the axis labels. There doesn't appear to be a hook to change the format of the added labels on top of the bars. I looked, and would have suggested it had I found one.

                    2.2 M observations? Didn't know that earlier, but, indeed, you are asking graph to draw the same bars again and again and again. Trivial with a small dataset but otherwise you might get quicker results with something like

                    Code:
                    sysuse auto, clear
                    
                    * what to show and where to put
                    bysort rep78 : gen Percent = _N
                    count if rep78 < .
                    replace Percent = 100 * Percent/r(N)
                    gen show = string(Percent, "%2.1f")
                    egen tag = tag(rep78)  
                    
                    twoway bar Percent rep78 if tag, bstyle(histogram) barw(0.8) ysc(r(0 45)) yla(0(5)45, ang(h))  ///
                    || scatter Percent rep78 if tag, ms(none) mla(show) mlabpos(12) mlabc(black) legend(off)
                    or even

                    Code:
                    preserve
                    keep if tag
                    keep Percent rep78 show
                    twoway bar Percent rep78, bstyle(histogram) barw(0.8) ysc(r(0 45)) yla(0(5)45, ang(h))  ///
                    || scatter Percent rep78, ms(none) mla(show) mlabpos(12) mlabc(black) legend(off)
                    restore
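
The same reduce-first, plot-second idea can also be sketched with contract, which collapses the data to one observation per distinct value in a single step. This is only a sketch under assumptions: it uses auto.dta as above, the names Percent and show are carried over from the earlier code, and the nomiss option is used so that percentages refer to nonmissing values of rep78, in the same spirit as the count if rep78 < . step.

```stata
sysuse auto, clear

preserve
* one observation per distinct nonmissing value of rep78,
* with its percentage of the nonmissing observations in Percent
contract rep78, nomiss percent(Percent)

* text labels with one decimal place
gen show = string(Percent, "%2.1f")

twoway bar Percent rep78, bstyle(histogram) barw(0.8) ysc(r(0 45)) yla(0(5)45, ang(h)) ///
|| scatter Percent rep78, ms(none) mla(show) mlabpos(12) mlabc(black) legend(off)
restore
```

With millions of observations, graph then only has to draw one bar per distinct value, and the original data come back after restore.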

                    Last edited by Nick Cox; 12 Dec 2018, 11:16.



                    • #11
                      I've looked back on this thread more than once, and more than once I've ended up with visualizations that put periods where commas should be. The reason is how the format option is specified. The earlier post inadvertently uses a comma inside the format (%12,0gc), which requests European-style output with periods as thousands separators and a comma as the decimal mark (e.g. 5.000.000,00). The updated code below uses a period in the format (%12.0gc), which gives commas as thousands separators (5,000,000.00).

                      Code:
                      sysuse auto, clear
                      
                      hist rep78, ///
                      disc percent ///
                      title("Distribution of Items") ///
                      yla(, format(%12.0gc) ang(h)) ///
                      xsize(11) ysize(7) ///
                      addlabel addlabopts(mlabposition(12)) ///
                      name(ItemsHistorgram, replace)
                      I'm posting this update to help others avoid maddening worry about why the display format is not as anticipated.
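
The difference between the two format styles can be checked at the command line with display before drawing any graph. A minimal sketch; the value 5000000 and the %15.2fc variants are just illustrations matching the examples in the text above:

```stata
* period in the format: period as decimal mark, commas as thousands separators
display %15.2fc 5000000
display %12.0gc 5000000

* comma in the format: comma as decimal mark, periods as thousands separators
display %15,2fc 5000000
display %12,0gc 5000000
```

If the second pair shows the style you did not want, the comma inside the format is the likely culprit.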



                      Originally posted by Adam Ross Nelson
                      Thanks for the responses....
                      Code:
                      sysuse auto, clear
                      
                      hist rep78, ///
                      disc percent ///
                      title("Distribution of Items") ///
                      yla(, format(%12,0gc) ang(h)) ///
                      xsize(11) ysize(7) ///
                      addlabel addlabopts(mlabposition(12)) ///
                      name(ItemsHistorgram, replace)
                      ...
