Looking for an example dataset to introduce Stata to some students, I found the highschool.dta (or multistage.dta) dataset among the examples available online at Stata Press (http://www.stata-press.com/data/r14/svy.html).
Observations are supposed to be individuals, with information on their gender, weight and height (among others).
Code:
use http://www.stata-press.com/data/r14/multistage, clear
* Or more simply : webuse multistage.dta
svyset county [pw=sampwgt], strata(state) fpc(ncounties) || school, fpc(nschools)
su weight height, de
As you can see, the individuals' weights look quite normal, but the heights are clearly impossible: the mean is around 430 inches (for those who, like me, are more familiar with the metric system, that is about 11 meters tall).
I first thought that the data had been transformed on purpose, so as not to reveal confidential data (but why use such values?).
I then thought it was the wrong unit, but I found no other plausible height unit that fits these values. Nor is it a misplaced decimal point (43 inches would be too small), a squared value, etc.
However, I found some material (here, p.13) where, using the same dataset, they get a mean height of around 67 inches, not 430.
I have checked the r14, r13, r12, r11, r10 and r9 datasets, and always got the same abnormal values of height.
In fact, I get the same values as in the Survey Data Reference Manual (http://www.stata.com/manuals14/svy.pdf), p.13, where only the variable weight is described, but height is used as a regressor of weight, and one can see an abnormal result:
Code:
. svy: regress weight height
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =        50        Number of obs     =       4071
Number of PSUs     =       100        Population size   =    8000000
                                      Design df         =         50
                                      F(   1,     50)   =     593.99
                                      Prob > F          =     0.0000
                                      R-squared         =     0.2787

------------------------------------------------------------------------------
             |             Linearized
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |   .7163115   .0293908    24.37   0.000     .6572784    .7753447
       _cons |  -149.6183   12.57265   -11.90   0.000    -174.8712   -124.3654
------------------------------------------------------------------------------
Here the height is supposed to be in inches and the weight in lbs. The regression predicts a negative weight for all observations with a height below about 210 inches (5.3 meters).
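As a rough check of the 210-inch threshold for negative predicted weights, here is a minimal sketch. It assumes that svy: regress weight height was the last estimation command run, so that _b[height] and _b[_cons] hold the coefficients shown in the output, and that height really is meant to be in inches:

```stata
* Assumes -svy: regress weight height- was the last estimation command
* Height at which the fitted line crosses zero predicted weight
display -_b[_cons] / _b[height]

* The same height converted from inches to meters (1 inch = 0.0254 m)
display (-_b[_cons] / _b[height]) * 0.0254
```

With the coefficients reported above this is 149.6183 / .7163115, i.e. roughly 209 inches, or about 5.3 meters.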
I was just wondering whether this was done on purpose or not. If it was, how come the first reference I found had "normal" values?
If it was not, perhaps someone could update the example files.
Thanks,
Charlie