Hi all,
There's a non-trivial chance that I'm just missing something obvious, but I'm seeing odd binning behavior in some histograms I'm making: two adjacent bins seem to get joined, so the count in the resulting bar is the sum of both bins, while the adjacent bin looks empty. It happens when I plot the frequency distribution of some test scores for two groups (one above a relevant cutoff, the other below it) with twoway, using the bar color to highlight where the groups change. I can reproduce the problem with toy data, so it's probably easiest to show what I'm seeing and ask whether anyone knows why it happens and/or how to fix it:
Code:
clear all
set more off
set seed 48103
set obs 40000
gen fake_scores = round(rnormal(850, 145), 10)
gen group = fake_scores>1100 & !missing(fake_scores)
twoway (hist fake_scores if group==0, width(10) fcolor(midblue) lcolor(gs5) freq) ///
(hist fake_scores if group==1, width(10) fcolor(yellow) lcolor(gs5) freq), scheme(plottig)
If you look at the tabulation in the relevant range, it's clear that the bar right near the cutoff should not hold more than 500 observations, so the plot must be adding that bin and the next one up:
fake_scores |      Freq.     Percent        Cum.
------------+-----------------------------------
       1040 |        453        1.13       91.17
       1050 |        436        1.09       92.26
       1060 |        370        0.93       93.18
       1070 |        325        0.81       94.00
       1080 |        319        0.80       94.79
       1090 |        287        0.72       95.51
       1100 |        253        0.63       96.14
       1110 |        211        0.53       96.67
       1120 |        188        0.47       97.14
       1130 |        167        0.42       97.56
       1140 |        139        0.35       97.91
       1150 |        127        0.32       98.22
       1160 |        122        0.30       98.53
       1170 |         95        0.24       98.77
       1180 |         60        0.15       98.92
       1190 |         80        0.20       99.12
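For reference, here is a minimal sketch of how the suspicion could be checked against the toy data. The tabulation lists 287 observations at 1090 and 253 at 1100, so a merged bar would show 287 + 253 = 540. The second half redraws the plot with the `discrete` option of `twoway histogram`, on the assumption (a guess, not a confirmed diagnosis) that the merge happens because the rounded scores sit exactly on the bin edges in continuous mode. `scheme(plottig)` is left out only so the sketch runs without that user-written scheme installed.

```stata
* Rebuild the toy data so the sketch is self-contained
clear all
set more off
set seed 48103
set obs 40000
gen fake_scores = round(rnormal(850, 145), 10)
gen group = fake_scores>1100 & !missing(fake_scores)

* Counts for the two score values that appear to be merged in the plot
count if fake_scores == 1090
count if fake_scores == 1100

* Assumption: treating the rounded scores as discrete gives each value its own bar
twoway (hist fake_scores if group==0, discrete width(10) fcolor(midblue) lcolor(gs5) freq) ///
       (hist fake_scores if group==1, discrete width(10) fcolor(yellow) lcolor(gs5) freq)
```

If the bars at 1090 and 1100 then match the two counts separately, that would point to the bin edges rather than the data as the source of the problem.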
Any suggestions welcome!
