The following is a little exercise I use in an intro biostats class. It is borrowed from the website of Burt Gerstman, author of Basic Biostatistics.
When I first did the problem, I was using Stata 13, and the box plot showed the two highest scores (152, 155) as potential outliers. But an eagle-eyed student in this year's class pointed out that with Stata 15 she was seeing only one outlier (155). I am currently using 15.1 but still have 13.1 installed, so I tried it with both: version 13.1 shows two outliers, and version 15.1 shows one. Furthermore, when I use version control in 15.1 (e.g., version 13: graph box BW_pct_ideal), I still see only one outlier.
I have looked at help whatsnew, and searched for <boxplot> and <outlier>, but thus far have not found anything that indicates a change in the rules for identifying outliers. Does anyone here have any thoughts on what might be causing this discrepancy?
Thanks,
Bruce
Code:
* 3.11 The median is more robust than the mean. Body weights (n = 10)
* expressed as "percentage of ideal" for 10 individuals are
* {99, 101, 107, 114, 116, 119, 121, 125, 152, 155}.

clear
input BW_pct_ideal
99
101
107
114
116
119
121
125
152
155
end

* Calculate the mean & median.
tabstat BW_pct_ideal, stat(n mean median)

* Make a boxplot of the data and identify the two outliers in the dataset.
graph box BW_pct_ideal

* With the two outliers excluded, recalculate the mean and median. What
* effect did removing the outliers have on the mean and median?
tabstat BW_pct_ideal if BW_pct_ideal < 152, stat(n mean median)

* When we used all of the data, mean > median (120.9 > 117.5).
* But when we excluded the two outliers, mean < median (112.75 < 115).
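One way to probe the discrepancy is to compute the box-plot fences by hand. The following is only a minimal sketch: it assumes the usual 1.5 x IQR rule and that graph box uses the same quartiles that summarize, detail reports, neither of which I have confirmed for either version. The scalar names (q1, q3, iqr, upper_fence) are mine, chosen for illustration. With these ten values the quartiles should come out as 107 and 125, so the upper fence is 125 + 1.5*18 = 152, which is exactly the second-highest observation. If that is right, whether 152 is flagged would depend on whether the comparison is strict (>) or not (>=), or on how the fence is stored.

```stata
* Sketch: hand-computed 1.5 x IQR upper fence (assumed rule, not confirmed
* to be what graph box does internally in either version).
clear
input BW_pct_ideal
99
101
107
114
116
119
121
125
152
155
end

* Quartiles as reported by summarize, detail
quietly summarize BW_pct_ideal, detail
scalar q1  = r(p25)
scalar q3  = r(p75)
scalar iqr = scalar(q3) - scalar(q1)
scalar upper_fence = scalar(q3) + 1.5*scalar(iqr)

display "Q1 = " scalar(q1) "  Q3 = " scalar(q3) "  IQR = " scalar(iqr)
display "Upper fence = " scalar(upper_fence)

* Observations strictly above the fence
count if BW_pct_ideal > scalar(upper_fence) & !missing(BW_pct_ideal)

* Observations on or above the fence
count if BW_pct_ideal >= scalar(upper_fence) & !missing(BW_pct_ideal)
```

If the two counts differ (one versus two), the observation at 152 sits right on the boundary, and a small change in how the tie is handled would be enough to produce the difference between the two graphs.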