I came up with a simple example and discovered that Stata does not do what the -summarize- manual says. I would kindly ask an expert on weights (in particular on the difference between frequency and analytic weights) to give an opinion on what should happen here.
The first problem is that the Methods and Formulas section of the -summarize- manual provides formulas for only one type of weight, although -summarize- accepts three types: analytic, frequency, and importance. Given that the manual provides a single set of weighted formulas without differentiating between the three types, I expected all three to give the same result. But they do not (see the -summarize- output for each weight type).
The mean is 1.4 in all three cases, which is .6*1 + .4*2, so this part is clear. The maximum is 2 because the 0 weight on 3 removes that observation, and the number of observations and the sum of weights also seem clear.
However, why are the standard deviations and variances different when the manual displays only one formula for all weights?
And why do the importance weights coincide with the frequency weights, and not with the analytic weights?
The simple example dataset is
Code:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(weight myvar)
6 1
4 2
0 3
end
Code:
. summ myvar [aw=weight]

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
       myvar |       2          10         1.4   .6928203          1          2

. return list

scalars:
                  r(N) =  2
              r(sum_w) =  10
               r(mean) =  1.4
                r(Var) =  .48
                 r(sd) =  .6928203230275509
                r(min) =  1
                r(max) =  2
                r(sum) =  14

. summ myvar [iw=weight]

    Variable |     Obs      Weight        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------------------
       myvar |       2          10         1.4   .5163978          1          2

. return list

scalars:
                  r(N) =  2
              r(sum_w) =  10
               r(mean) =  1.4
                r(Var) =  .2666666666666667
                 r(sd) =  .5163977794943222
                r(min) =  1
                r(max) =  2
                r(sum) =  14

. summ myvar [fw=weight]

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       myvar |         10         1.4    .5163978          1          2

. return list

scalars:
                  r(N) =  10
              r(sum_w) =  10
               r(mean) =  1.4
                r(Var) =  .2666666666666667
                 r(sd) =  .5163977794943222
                r(min) =  1
                r(max) =  2
                r(sum) =  14
Code:
. dis .6*1 + .4*2
1.4
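To make the puzzle concrete, here is a small hand calculation in Python (not Stata) that reproduces the two different variances reported above. It rests on two assumptions of mine that are not stated in the manual formula I am quoting: that frequency weights divide the weighted sum of squares by (sum of weights - 1), and that analytic weights are first rescaled to sum to the number of observations with nonzero weight and then divided by (that number - 1). The function names are made up for this illustration.

```python
import math

# Rows of the example with nonzero weight (the row myvar == 3 has weight 0).
values = [1.0, 2.0]
weights = [6.0, 4.0]


def weighted_mean(x, w):
    """Weighted mean, identical under all three weight types."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)


def var_fweight(x, w):
    """ASSUMPTION: weights are replication counts; divisor is sum(w) - 1."""
    m = weighted_mean(x, w)
    ss = sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w))
    return ss / (sum(w) - 1)


def var_aweight(x, w):
    """ASSUMPTION: weights are rescaled to sum to n rows; divisor is n - 1."""
    n = len(x)
    scale = n / sum(w)
    m = weighted_mean(x, w)
    ss = sum(wi * scale * (xi - m) ** 2 for xi, wi in zip(x, w))
    return ss / (n - 1)


mean = weighted_mean(values, weights)
v_fw = var_fweight(values, weights)
v_aw = var_aweight(values, weights)

print("mean          :", mean)
print("fweight Var/sd:", v_fw, math.sqrt(v_fw))
print("aweight Var/sd:", v_aw, math.sqrt(v_aw))
```

Under these assumptions the frequency-weight variance comes out at about .2667 and the analytic-weight variance at .48, which match the r(Var) values returned for [fw=weight] (and [iw=weight]) and for [aw=weight] respectively. So the difference looks like a difference in the divisor and in the scaling of the weights, but that is my reading of the numbers, not something the manual section spells out.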