  • Using twoway__histogram_gen to calculate height of highest histogram bin

    Hi everyone,

    Longtime fan of Statalist but this is my first post (having read the advice first of course!) - good to be here. I am trying to calculate the height of the highest bin of a histogram. The objective is to simulate a distribution of test scores. The reason I want to know the height of the highest bin is in order to then draw vertical lines of that height along the x-axis at particular thresholds, mirroring the reporting structure of the test I am simulating. Of course I could quickly get there by trial-and-error, but ultimately intend to iterate the simulation many times with different test parameters.

    I saw that twoway__histogram_gen might be a reasonable approach to try (https://www.statalist.org/forums/for...rom-histograms). The method does seem to provide a somewhat plausible answer, but it does not match the output of the "original" histogram. It is clear that the resultant histograms are not the same, and as such the height of the highest bin is different. This is confusing to me because I think I have set all the same parameters (bin start and width) for both graphs, and the underlying data are the same. I also tried essentially recreating a bivariate table using contract, which gets me to the same result as twoway__histogram_gen, but is still different from the "original" histogram. I must be missing something basic about how these functions work or how I am setting them up, so any help would be appreciated - thank you!

    code:
    clear
    set scheme s1mono
    set obs 2500 //set number of test scores to generate
    set seed 12345 //set RNG seed
    scalar mu=492 //set mean
    scalar sd=75 //set standard deviation
    gen test_score = round(rnormal(mu,sd)) //generate normal distribution of integer scores with above parameters

    //desired output histogram: horizontal lines included to show different results of histogram and twoway__histogram_gen
    graph twoway histogram test_score, start(12.5) bin(25) freq yline(328) yline(396)

    //calculate height of highest bin to eventually draw vertical lines of matching maximum height
    //first try using twoway__histogram_gen
    twoway__histogram_gen test_score, start(12.5) width(25) freq gen(h x, replace)
    twoway bar h x, barwidth(25) yline(328) yline(396)

    //also try by contracting the data
    gen bin_num=floor(test_score/25+12.5)
    contract bin_num test_score
    bysort bin_num : gen bin_num_height = sum(_freq)
    egen max_bin_height = max(bin_num_height)
    tab max_bin_height //328, same as output of twoway__histogram_gen

    output of "original" twoway histogram (height of highest bin = 396):
    [Image: Screen Shot 2025-04-24 at 7.04.49 PM.png]

    output of twoway bar based on twoway__histogram_gen (height of highest bin = 328):
    [Image: Screen Shot 2025-04-24 at 7.05.00 PM.png]


  • #2
    Hi Giorgio Caturegli, a quick check: you used width(25) rather than bin(25) in your twoway__histogram_gen command, and that results in the difference. Try this:
    Code:
    clear
    set scheme s1mono
    set obs 2500 //set number of test scores to generate
    set seed 12345 //set RNG seed
    scalar mu=492 //set mean
    scalar sd=75 //set standard deviation
    gen test_score = round(rnormal(mu,sd)) //generate normal distribution of integer scores with above parameters
    
    //desired output histogram: horizontal lines included to show different results of histogram and twoway__histogram_gen
    graph twoway histogram test_score, start(12.5) bin(25) freq yline(328) yline(396) name(g1, replace)
    
    //calculate height of highest bin to eventually draw vertical lines of matching maximum height
    //first try using twoway__histogram_gen
    twoway__histogram_gen test_score, start(12.5) bin(25) freq gen(h x, replace)
    twoway bar h x, barwidth(25) yline(328) yline(396) name(g2, replace)
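
    To see why bin(25) and width(25) give different heights, it may help to compare the bin widths that result from each option. The following is only a sketch: the variable names h_bin, x_bin, h_wid and x_wid are made up for illustration, and I am assuming that twoway__histogram_gen leaves the width it used in r() (check the return list output for a result such as r(width)). The manual calculation at the end assumes that, with start() given, the implied width is (max - start)/bins.

```stata
// Sketch: compare the bins produced by bin(25) and width(25)
clear
set obs 2500
set seed 12345
gen test_score = round(rnormal(492, 75))

// bin(25) asks for 25 bins; the width is then derived from the data range
twoway__histogram_gen test_score, start(12.5) bin(25) freq gen(h_bin x_bin, replace)
return list

// width(25) asks for bins that are 25 units wide
twoway__histogram_gen test_score, start(12.5) width(25) freq gen(h_wid x_wid, replace)
return list

// Wider bins collect more observations, so the tallest bar is higher
summarize h_bin h_wid

// Assumed formula for the width implied by bin(25): (max - start)/25
summarize test_score, meanonly
display (r(max) - 12.5)/25
```

    If the implied width comes out larger than 25, that would account for the taller bars in the bin(25) histogram.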



    • #3
      Sorry, in reviewing what I posted I realized it was just a typo! I was supplying the "original" histogram with the number of bins, bin(25), rather than the width of each bin, width(25). Having fixed this, the two match up nicely. twoway__histogram_gen is a nice way to do this, in case anybody else has this bizarre desire. Thanks!

      graph twoway histogram test_score, start(12.5) width(25) freq yline(328) yline(396)

      twoway__histogram_gen test_score, start(12.5) width(25) freq gen(h x, replace)
      twoway bar h x, barwidth(25) yline(328) yline(396)
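
      For the original goal (vertical lines as tall as the highest bin), one possible sketch is below. The thresholds 400 and 500 are hypothetical placeholders, not values from the test being simulated, and the local macro name hmax is just for illustration. It uses summarize to pick up the maximum of h and twoway pci to draw the line segments; run it from a do-file so that the /// line continuation works.

```stata
// Sketch: draw threshold lines as tall as the highest histogram bin
clear
set obs 2500
set seed 12345
gen test_score = round(rnormal(492, 75))

// Bin heights (h) and bin midpoints (x), using the same start and width as the plot
twoway__histogram_gen test_score, start(12.5) width(25) freq gen(h x, replace)

// Height of the highest bin
summarize h, meanonly
local hmax = r(max)
display "Height of highest bin: `hmax'"

// Overlay vertical segments from 0 up to hmax at each (hypothetical) threshold
twoway (histogram test_score, start(12.5) width(25) freq) ///
       (pci 0 400 `hmax' 400  0 500 `hmax' 500, lcolor(black)), legend(off)
```

      Because the height is stored in a local macro rather than typed in by hand, the same lines should carry over when the simulation is repeated with different parameters.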



      • #4
        Below is what I get, using graph combine:
        [Image: Graph.png]
