  • Histogram with large proportion of 0's

    Hello! Thanks in advance for any help!

    I have a problem constructing a histogram. I have a distribution with a very large proportion of 0's (75% - 80%).
    I run a normal histogram and get:
    Code:
    hist tstind, percent
    [Image: GRAPH1.png]

    I want to change the histogram so that the y-axis runs from 0 to 10 without changing the percentages. If I restrict the sample with "if tstind>0", the percentages are recomputed over the remaining observations and will be off.
    Is there a way in Stata to limit the y-axis without changing the proportions?

    Thanks in advance for trying to help!

    Leo

  • #2
    Not directly. You would need to generate the frequency distribution with twoway__histogram_gen and then take responsibility for your own graph using twoway bar.

    See also
    http://www.stata-journal.com/sjpdf.html?articlenum=gr0014


    • #3
      Code:
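      sysuse auto, clear
      // gen() takes the bar-height variable first, then the bin-midpoint variable
      twoway__histogram_gen price, bin(8) start(3291) gen(h x)
      // plot heights against midpoints, keeping only bins centered below 10000
      twoway bar h x if x < 10000, barw(1500)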
      See also http://www.stata-journal.com/sjpdf.h...iclenum=gr0014
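
      A minimal sketch of how this could be applied to the original question (y-axis limited to 0 to 10 percent while the percentages still refer to the full sample). It assumes the nlsw88 example data, a 5-hour bin width, and that the percent option of twoway__histogram_gen is available in your Stata version; the variable name hcap is made up for illustration.
      Code:
      sysuse nlsw88, clear
      // bin heights as percentages of all nonmissing observations
      twoway__histogram_gen hours, start(0) width(5) percent gen(h x)
      // cap the plotted height at 10; h itself keeps the true percentage
      gen hcap = min(h, 10) if !missing(h)
      twoway bar hcap x, barw(5) yla(0(2)10) ytitle("Percent") note("Bars above 10 percent are truncated")
      Bars taller than the cap are cut off at 10, so a note to that effect seems advisable; plotting with "if h <= 10" instead would drop those bins entirely.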


      • #4
        Thanks so much for the responses!


        • #5
          Another way to "tuck in" spikes is to use a square root scale for the frequencies or proportions, see for example paragraph 6 of http://www.edwardtufte.com/tufte/tukey.

          Code:
          // open some example data
          sysuse nlsw88, clear
          
          // create the variables for the histogram
          twoway__histogram_gen hours, start(0) width(2) gen(h x) frac
          
          // transform the fractions
          replace h = sqrt(h)
          
          // label the y-axis
          // see: http://www.stata-journal.com/sjpdf.html?articlenum=gr0032
          forvalues i = 0/5 {
              local lab `lab' `=sqrt(`i'/10)' "`=`i'/10'"
          }
          
          // create the graph
          twoway bar h x, barw(2) ylab(`lab') ytitle("fraction (root scale)")
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------


          • #6
            I'll steal Maarten's example and do it another way. We're exploiting two facts:

            1. spikeplot has inbuilt support for a square root scale.

            2. By default it shows spikes and at first sight is most suited for discrete variables, but we can easily subvert that with recast().

            Code:
            // open some example data
            sysuse nlsw88, clear
            
            clonevar hours2 = hours
            // 2 hour bin width, start at 0 
            replace hours2 = 2 * floor(hours/2) + 1
            
            // create the graph
            spikeplot hours2, recast(bar) barw(2) root
            spikeplot hours2, recast(bar) barw(2) root yla(0 10 "100" 20 "400" 30 "900") ytitle(Frequency (square root scale))
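
            The replace line above maps each value to the midpoint of its 2-hour bin. A sketch of the same idea for an arbitrary bin width, assuming bins that start at 0; the local macro w and the variable name hours_mid are illustrative, not part of the original example.
            Code:
            sysuse nlsw88, clear
            // bin width; bins are [0, w), [w, 2w), ...
            local w 5
            // each value is replaced by the midpoint of the bin it falls in
            gen hours_mid = `w' * floor(hours/`w') + `w'/2
            spikeplot hours_mid, recast(bar) barw(`w') root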
            [Image: rootogram.png]


            • #7
              Thanks so much Nick, Maarten, and Scott! This has been a big help.

              Combining all three suggestions, but mainly Nick's method, my new code and graph look like this:


              Code:
              spikeplot tstind2, recast(bar) barw(2) root yla(0 10 "100" 20 "400" 30 "900" 44.7 "2000" 60 "3600" 70.71 "5000") ytitle("Frequency") xtitle("Tuberculin skin test induration size, millimeters")
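
              The tick positions in yla() are the square roots of the labelled frequencies (for example, sqrt(2000) is about 44.7). A minimal sketch that builds such a label list in a loop, following the pattern in #5. It uses the nlsw88 example from #6 rather than the tstind2 data, and the local macro name ylab and the chosen frequencies are illustrative assumptions.
              Code:
              sysuse nlsw88, clear
              clonevar hours2 = hours
              replace hours2 = 2 * floor(hours/2) + 1
              // tick position = square root of the frequency shown as its label
              local ylab
              foreach f of numlist 0 50 200 500 1000 {
                  local ylab `ylab' `=sqrt(`f')' "`f'"
              }
              spikeplot hours2, recast(bar) barw(2) root yla(`ylab') ytitle("Frequency (square root scale)")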

              [Image: Frequency Distribution -- TST Induration.png]


              Pretty neat and so much better looking than before. Can I just call the y-axis "Frequency" instead of "Frequency (square root scale)" since I am squaring the y-axis scale numbers?

              Thanks again! Big help.

              Leo



              • #8
                I would add "(square root scale)", as without it people will start wondering why tick marks that are the same distance apart don't show the same increment.
                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------


                • #9
                  hi,

                  thank you for this very informative thread.

                  in my case, the variable I want to plot is the % of immigrants in localities across a country. not only do I have quite a few localities with 0% immigrants, but I also have quite a few with very small percentages, such as 0.001%, 0.1%, and 0.05%.

                  would you still recommend a square root approach in this case?


                  many thanks,
                  A.





                  • #10
                    Square roots of frequencies are likely to be a competitor in #9. It is also possible that a transformation of the outcome would help.

                    Showing us the results of

                    Code:
                    quantile yourvar 
                    
                    summarize yourvar, detail
                    would help, where you should naturally use your variable name not yourvar.


                    • #11
                      While axis breaks are not to be universally recommended I feel they sometimes offer a reasonable tradeoff between visualization of the "extreme" information and compression of the non-extreme information, keeping information (in this case the probabilities) in its natural rather than transformed scale. (I also know that some people who have contributed to this thread do not enthusiastically endorse this idea.)

                      [Image: mepser1.png]

                      [Image: mepser2.png]

                      Last edited by John Mullahy; 28 Jun 2023, 06:19.
