This may be widely known, but in case not I thought I would share...

Stata has several commands that compute percentiles:

**centile**

**sum, d**

**_pctile**

**egen pctile**

and perhaps others.

It turns out that these do not always yield the same results, except at the median (50th percentile). For example, this code:

Code:

    preserve
    cap drop _all
    set obs 20
    set seed 23
    tempvar y
    gen `y'=exp(rnormal(0,1))
    qui centile `y', c(10 25 50 75 90)
    di r(c_1) _n r(c_2) _n r(c_3) _n r(c_4) _n r(c_5)
    qui sum `y',d
    di r(p10) _n r(p25) _n r(p50) _n r(p75) _n r(p90)
    qui _pctile `y', p(10 25 50 75 90)
    di r(r1) _n r(r2) _n r(r3) _n r(r4) _n r(r5)
    drop _all
    restore

gives these results:

Code:

    . preserve
    . cap drop _all
    . set obs 20
    number of observations (_N) was 0, now 20
    . set seed 23
    . tempvar y
    . gen `y'=exp(rnormal(0,1))
    . qui centile `y', c(10 25 50 75 90)
    . di r(c_1) _n r(c_2) _n r(c_3) _n r(c_4) _n r(c_5)
    .29993572
    .38304436
    1.6890243
    2.8531529
    5.1466236
    . qui sum `y',d
    . di r(p10) _n r(p25) _n r(p50) _n r(p75) _n r(p90)
    .31345257
    .40814352
    1.6890243
    2.7669318
    5.0989532
    . qui _pctile `y', p(10 25 50 75 90)
    . di r(r1) _n r(r2) _n r(r3) _n r(r4) _n r(r5)
    .31345257
    .40814352
    1.6890243
    2.7669318
    5.0989532
    . drop _all
    . restore
    .
    end of do-file
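The do-file above does not exercise **egen pctile**, which is also on the list. The following is a rough sketch (not part of the original post) that puts the quartiles and the IQR from each command side by side. It also runs **_pctile** with its `altdef` option, which, as far as I understand the manual, switches to an alternative interpolation formula. The scalar and variable names are purely illustrative, and the simulated values may differ from those above depending on the Stata version's random-number generator.

```stata
* Sketch only. Scalar and variable names below are illustrative.
* Note: -clear- drops the data in memory; run this in a fresh session.
clear
set obs 20
set seed 23
generate y = exp(rnormal(0,1))

* centile (default formula)
quietly centile y, centile(25 75)
scalar cen25 = r(c_1)
scalar cen75 = r(c_2)

* summarize, detail
quietly summarize y, detail
scalar sum25 = r(p25)
scalar sum75 = r(p75)

* _pctile with its default definition
quietly _pctile y, percentiles(25 75)
scalar pct25 = r(r1)
scalar pct75 = r(r2)

* _pctile with the alternative definition (assumed to match centile)
quietly _pctile y, percentiles(25 75) altdef
scalar alt25 = r(r1)
scalar alt75 = r(r2)

* egen pctile returns a constant variable, one call per percentile
egen double egen25 = pctile(y), p(25)
egen double egen75 = pctile(y), p(75)

* 25th percentile, 75th percentile, and IQR under each command
display "centile         " scalar(cen25) "  " scalar(cen75) "  " (scalar(cen75)-scalar(cen25))
display "summarize, d    " scalar(sum25) "  " scalar(sum75) "  " (scalar(sum75)-scalar(sum25))
display "_pctile         " scalar(pct25) "  " scalar(pct75) "  " (scalar(pct75)-scalar(pct25))
display "_pctile, altdef " scalar(alt25) "  " scalar(alt75) "  " (scalar(alt75)-scalar(alt25))
display "egen pctile     " egen25[1] "  " egen75[1] "  " (egen75[1]-egen25[1])
```

If the `altdef` row reproduces the **centile** row, that would suggest the gap is purely a matter of which definition is requested, not of which command is used.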

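There is nothing surprising about these discrepancies if one reads carefully the respective "Methods and Formulas" sections in each command's documentation, as **centile** uses a different formula than do the others.

Yet the differences may be nontrivial in some contexts (e.g. computation of IQRs: in the output above the IQR is about 2.47 using **centile** but about 2.36 using the other commands), so it is perhaps worth considering which of the competing formulae squares most closely with how the researcher conceives of percentiles.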
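For reference, here is how I read the two definitions in the unweighted case. This is my own summary of the "Methods and Formulas" sections, so please check it against the manuals for your Stata version. The notation is an assumption for illustration: x_(1) ≤ ... ≤ x_(n) are the sorted values and p is the requested percentile.

```latex
% centile (default): linear interpolation at position (n+1)p/100
\[
R = \frac{(n+1)\,p}{100}, \quad r = \lfloor R \rfloor, \quad f = R - r, \qquad
C_p = x_{(r)} + f\,\bigl(x_{(r+1)} - x_{(r)}\bigr),
\]
\[
\text{with } x_{(0)} \equiv x_{(1)} \text{ and } x_{(n+1)} \equiv x_{(n)}.
\]

% summarize, detail and _pctile (default): invert the empirical CDF,
% averaging where it is flat
\[
P = \frac{n\,p}{100}, \qquad
x_{[p]} =
\begin{cases}
\tfrac{1}{2}\bigl(x_{(P)} + x_{(P+1)}\bigr) & \text{if } P \text{ is an integer}, \\[4pt]
x_{(\lceil P \rceil)} & \text{otherwise}.
\end{cases}
\]

% Worked case with n = 20
\[
p = 25:\quad R = 5.25 \;\Rightarrow\; C_{25} = x_{(5)} + 0.25\,\bigl(x_{(6)} - x_{(5)}\bigr),
\qquad P = 5 \;\Rightarrow\; x_{[25]} = \tfrac{1}{2}\bigl(x_{(5)} + x_{(6)}\bigr)
\]
\[
p = 50:\quad R = 10.5,\; P = 10 \;\Rightarrow\; C_{50} = x_{[50]} = \tfrac{1}{2}\bigl(x_{(10)} + x_{(11)}\bigr)
\]
```

With n = 20 this is consistent with the output above: the 10th and 25th percentiles from **centile** sit below those from the other commands, the 75th and 90th sit above, and the medians coincide.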
