Weights in pctile

Mauricio Caceres

Join Date: Sep 2015
Posts: 130

20 Jul 2018, 11:53

I was looking to understand how Stata computes weighted percentiles. I tried to follow the formula in the documentation (here: https://www.stata.com/manuals13/dpctile.pdf) but I get different results:

Code:

. set seed 1729

. clear

. set obs 100000
number of observations (_N) was 0, now 100,000

. gen w = 100 * runiform()

. gen long x = _n

. sort x w

. gen wsum = sum(w)

. gen P = wsum[_N] * 1 / 100

. assert abs(wsum - P) > 0.1

.
. gen gt = (wsum > P)

. qui sum x if gt

. disp `r(min)'
997

.
. _pctile x [aw = w], p(1)

. return list

scalars:
                 r(r1) =  996.5

I'm not entirely sure why it gives that answer. It seems incorrect:

Code:

. format %21.5f wsum P

. disp P
49850.559

. qui count if wsum < P

. list in `=r(N) - 1' / `=r(N) + 1'

     +-------------------------------------------------+
     |        w     x          wsum             P   gt |
     |-------------------------------------------------|
995. |  22.8744   995   49817.42578   49850.55859    0 |
996. | 32.89221   996   49850.32031   49850.55859    0 |
997. | 25.59445   997   49875.91406   49850.55859    1 |
     +-------------------------------------------------+

This shows that the first x such that the cumulative weighted sum is is larger than P is 997; however, Stata seems to think that this requires averaging with the prior observation, which should only happen if 49850.32031 is equal to 49850.55859.

I thought this might be a numerical precision issue, but I get the same result using doubles (the cum sum is 49850.31956 and P is 49850.56176). It is interesting to note that collapse gives what I understand to be the right answer:

Code:

. collapse (p1) x [aw = w]

. disp x
997

Any insights?

Tags: None

Announcement

Weights in pctile