I recently found that Mata's mean function is very slow when called on a long loop, and prohibitively so on Windows. I initially thought this might be due to mean() doing computations in quad precision, but quadsum() doesn't seem to have this issue. Here is an example of what I mean:

On Windows

On Linux

I use Stata MP 15.1 on both systems. You can see that doing the mean "by hand" with either sum() or quadsum() takes significantly less time. The difference in Linux is 3x whereas on WIndows it's close to 20x. Does anyone have an insight on why this is happening?

Code:

mata real scalar function my_mean (real vector x) { return(sum(x) / (length(x) - missing(x))) } real scalar function my_quadmean (real vector x) { return(quadsum(x) / (length(x) - missing(x))) } n = 500000 x = runiform(n, 1) x[selectindex(x :< 0.1)] = J(sum(x :< 0.1), 1, missingof(x)) ixl = (1::n) :- 100 ixu = (1::n) :+ 100 ixl[selectindex(ixl :< 1)] = J(sum(ixl :< 1), 1, 1) ixu[selectindex(ixu :> n)] = J(sum(ixu :> n), 1, n) y1 = J(n, 1, .) y2 = J(n, 1, .) y3 = J(n, 1, .) end timer clear timer on 1 mata for (i = 1; i <= n; i++) { y1[i] = mean(x[|ixl[i] \ ixu[i]|]) } end timer off 1 timer on 2 mata for (i = 1; i <= n; i++) { y2[i] = my_mean(x[|ixl[i] \ ixu[i]|]) } end timer off 2 timer on 3 mata for (i = 1; i <= n; i++) { y3[i] = my_quadmean(x[|ixl[i] \ ixu[i]|]) } end timer off 3 mata assert(all(y1 :== y3))

Code:

. timer list 1: 90.75 / 1 = 90.7530 2: 4.57 / 1 = 4.5670 3: 4.85 / 1 = 4.8490

Code:

. timer list 1: 6.11 / 1 = 6.1070 2: 1.76 / 1 = 1.7610 3: 1.94 / 1 = 1.9430

## Comment