Hi everyone,
Longtime fan of Statalist but this is my first post (having read the advice first, of course!), so it is good to be here. I am simulating a distribution of test scores and am trying to calculate the height of the highest bin of the resulting histogram. I want that height so that I can draw vertical lines of the same height at particular thresholds along the x-axis, mirroring the reporting structure of the test I am simulating. I could get there quickly by trial and error, but I ultimately intend to iterate the simulation many times with different test parameters.
I saw that twoway__histogram_gen might be a reasonable approach to try (https://www.statalist.org/forums/for...rom-histograms). The method does seem to provide a somewhat plausible answer, but it does not match the output of the "original" histogram. It is clear that the resultant histograms are not the same, and as such the height of the highest bin is different. This is confusing to me because I think I have set all the same parameters (bin start and width) for both graphs, and the underlying data are the same. I also tried essentially recreating a bivariate table using contract, which gets me to the same result as twoway__histogram_gen, but is still different from the "original" histogram. I must be missing something basic about how these functions work or how I am setting them up, so any help would be appreciated - thank you!
code:
clear
set scheme s1mono
set obs 2500 //set number of test scores to generate
set seed 12345 //set RNG seed
scalar mu=492 //set mean
scalar sd=75 //set standard deviation
gen test_score = round(rnormal(mu,sd)) //generate normal distribution of integer scores with above parameters
//desired output histogram: horizontal lines included to show different results of histogram and twoway__histogram_gen
graph twoway histogram test_score, start(12.5) bin(25) freq yline(328) yline(396)
//calculate height of highest bin to eventually draw vertical lines of matching maximum height
//first try using twoway__histogram_gen
twoway__histogram_gen test_score, start(12.5) width(25) freq gen(h x, replace)
twoway bar h x, barwidth(25) yline(328) yline(396)
//also try by contracting the data
gen bin_num=floor(test_score/25+12.5)
contract bin_num test_score
bysort bin_num : gen bin_num_height = sum(_freq)
egen max_bin_height = max(bin_num_height)
tab max_bin_height //328, same as output of twoway__histogram_gen
output of "original" twoway histogram (height of highest bin = 396):
output of twoway bar based on twoway__histogram_gen (height of highest bin = 328):
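One possibility worth checking, offered as a guess and not a confirmed diagnosis: in twoway histogram, bin(#) sets the number of bins, while width(#) sets the bin width. If so, the first graph asks for 25 bins and the twoway__histogram_gen call asks for bins of width 25, which would make the two sets of bins different. The sketch below assumes that reading. It uses width(25) in both places, reads the tallest bin from r(max), and draws vertical lines of that height with twoway pci. The thresholds 400 and 550 and the local name hmax are hypothetical and only there for illustration.

```stata
clear
set obs 2500
set seed 12345
gen test_score = round(rnormal(492, 75))

* bin heights for bins of width 25 starting at 12.5 (same call as in the post)
twoway__histogram_gen test_score, start(12.5) width(25) freq gen(h x, replace)

* height of the tallest bin
summarize h, meanonly
local hmax = r(max)
display "tallest bin: `hmax'"

* histogram with width(25) instead of bin(25), plus vertical lines
* of height hmax at two hypothetical thresholds (400 and 550)
twoway (histogram test_score, start(12.5) width(25) freq) ///
       (pci 0 400 `hmax' 400  0 550 `hmax' 550), legend(off)
```

If the guess is right, the tallest bar in this graph should sit at the height stored in hmax, and that value can be recomputed on each iteration of the simulation without any trial and error.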
