
  • Random number generation in Mata

    In setting up some simulations I was surprised by the relative speed of some computations. For example, consider:
    Code:
    mata
    
    time()
    
    for (j=1;j<=50000;j++) {
     z=rnormal(10000,1,0,1)
    }
    
    time()
    
    for (j=1;j<=50000;j++) {
     z=invnormal(runiform(10000,1))
    }
    
    time()
    
    end
    The results are:
    Code:
    :
    : time()
      08:15:54 25 Apr 2022
    
    :
    : for (j=1;j<=50000;j++) {
    >  z=rnormal(10000,1,0,1)
    > }
    
    :
    : time()
      08:16:17 25 Apr 2022
    
    :
    : for (j=1;j<=50000;j++) {
    >  z=invnormal(runiform(10000,1))
    > }
    
    :
    : time()
      08:16:33 25 Apr 2022
    
    :
    : end
    Does anyone have insights into why rnormal() is about 50% slower than invnormal(runiform()) in this example?
    Last edited by John Mullahy; 25 Apr 2022, 07:29.

  • #2
    When I read the entry for Random-number functions in the Stata Functions Reference Manual PDF documentation, I see

    rnormal(m,s)

    Description: normal(m,s) (Gaussian) random variates, where m is the mean and s is the standard deviation
    The methods for generating normal (Gaussian) random variates are taken from Knuth (1998, 122–128); Marsaglia, MacLaren, and Bray (1964); and Walker (1977).
    which suggests to me that rnormal() does something other than using the invnormal() function to transform the basic uniform[0,1] pseudo-random number to one following a normal distribution.
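
    For illustration only, here is a minimal Mata sketch of the Box-Muller transform, one classic way of producing normal variates without invnormal(). I am not suggesting this is what rnormal() actually does; per the manual it follows the Knuth, Marsaglia-MacLaren-Bray, and Walker methods.
    Code:
    mata
    
    // Box-Muller: map two independent uniform(0,1) columns to two independent N(0,1) columns
    n  = 10000
    u1 = runiform(n,1)
    u2 = runiform(n,1)
    r  = sqrt(-2:*ln(u1))          // radius term
    z1 = r :* cos(2*pi():*u2)      // first N(0,1) column
    z2 = r :* sin(2*pi():*u2)      // second N(0,1) column
    mean(z1), sqrt(variance(z1))   // should be close to 0 and 1
    
    end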

    More on computational methods for calculating the CDF and inverse CDF and generating pseudo-random variates for the normal distribution can be found in Wikipedia at

    https://en.wikipedia.org/wiki/Normal...tional_methods



    • #3
      Thanks, William. Interestingly, the timing differences seen in my Mata example (#1 in this thread) do not appear to be as pronounced in Stata. For instance:
      Code:
      cap drop _all
      set obs 10000000
      tempvar y1 y2
      qui gen `y1'=.
      qui gen `y2'=.
      
      timer clear
      
      timer on 1
      forval j=1/20 {
       qui replace `y1'=rnormal()
      }
      timer off 1
      
      timer on 2
      forval j=1/20 {
       qui replace `y2'=invnormal(runiform())
      }
      timer off 2
      
      timer list
      Results:
      Code:
      . timer list
         1:     16.11 /        1 =      16.1070
         2:     16.00 /        1 =      15.9960



      • #4
        Here are my results, trying as much as possible to compare apples to apples.
        Code:
        mata
        rseed(666)
        
        timer_clear()
        
        timer_on(1)
        
        for (j=1;j<=50000;j++) {
         z=rnormal(10000,1,0,1)
        }
        
        timer_off(1)
        timer_on(2)
        
        for (j=1;j<=50000;j++) {
         z=invnormal(runiform(10000,1))
        }
        
        timer_off(2)
        
        end
        
        set seed 666
        set obs 10000
        generate z = .
        
        timer on 3
        
        forvalues j=1/50000 {
            quietly replace z = rnormal()
        }
        
        timer off 3
        timer on 4
        
        forvalues j=1/50000 {
            quietly replace z = invnormal(runiform())
        }
        
        timer off 4
        
        timer list
        Code:
        . timer list
           1:     21.83 /        1 =      21.8270
           2:     13.99 /        1 =      13.9930
           3:     48.68 /        1 =      48.6820
           4:     44.65 /        1 =      44.6520
        When I reran it with the Stata variable z declared as a double, the results were
        Code:
        . timer list
           1:     20.73 /        1 =      20.7300
           2:     13.58 /        1 =      13.5810
           3:     44.88 /        1 =      44.8770
           4:     40.89 /        1 =      40.8900
        which demonstrates good test-retest reliability on the Mata times, and presumably on the Stata times as well, with the time savings attributable to no longer converting the results from double to float when they are stored.
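
        For those following along, the change is just declaring z as a double when it is created in the Stata block (the Mata code is unaffected):
        Code:
        set seed 666
        set obs 10000
        generate double z = .   // double storage avoids the double-to-float conversion on each replace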



        • #5
          Thanks William. This is really instructive.

          In addition, it appears that some of the extra computational time may be due to converting N(0,1) variates to N(m,s) variates. I don't know how rnormal() does its calculations, but using your setup, here is what is involved in the invnormal(runiform()) approach. I expected the additional arithmetic for transforming N(0,1) to N(m,s) to take some extra time, but I was surprised it was this much.

          Code:
          cap mata mata drop rnorm() rsnorm()
          
          mata
          
          function rnorm(r,c,m,s) return(m:+invnormal(runiform(r,c))*s)
          function rsnorm(r,c) return(invnormal(runiform(r,c)))
          
          rseed(666)
          
          timer_clear()
          
          timer_on(1)
          
          for (j=1;j<=50000;j++) {
           z=rnorm(10000,1,0,1)
          }
          
          timer_off(1)
          timer_on(2)
          
          for (j=1;j<=50000;j++) {
           z=rsnorm(10000,1)
          }
          
          timer_off(2)
          
          end
          
          timer list
          Results:
          Code:
          . timer list
             1:     20.97 /        1 =      20.9660
             2:     15.69 /        1 =      15.6900



          • #6
            Here is another, related example in Stata, using the now-deprecated function -uniform()-. It is not as extreme as the Mata anomaly, but it is still an anomaly. As William said, these are different algorithms, and two things might be happening:
            1) The slower algorithm might be better, that is, it may generate variates that are more consistent with the distribution they are supposed to follow (a rough check of this is sketched after the results below).
            2) Or maybe StataCorp put more thought into the more often used -runiform()- than into the more rarely used -rnormal()-, and optimised and sped up the first relative to the second.

            Code:
            . clear
            
            . timer clear
            
            . set obs 100000000
            Number of observations (_N) was 0, now 100,000,000.
            
            . timeit 1: gen double nu = 5 + 10*invnormal(uniform())
            
            . timeit 2: gen double nru = 5 + 10*invnormal(runiform())
            
            . timeit 3: gen double rnn = 5 + 10*rnormal()
            
            . timeit 4: gen double rn = rnormal(5,10)
            
            . summ
            
                Variable |        Obs        Mean    Std. dev.       Min        Max
            -------------+---------------------------------------------------------
                      nu |100,000,000    4.999518    10.00051  -50.66586   61.10861
                     nru |100,000,000    5.000736    10.00052  -49.53697   61.59014
                     rnn |100,000,000    5.000599    9.999837  -53.16307   60.42452
                      rn |100,000,000    4.999282    10.00091  -52.95512   60.96689
            
            . timer list
               1:      5.84 /        1 =       5.8420
               2:      6.51 /        1 =       6.5060
               3:      6.66 /        1 =       6.6620
               4:      6.34 /        1 =       6.3440
            
            . dis 5.8420/6.6620
            .87691384
            
            . dis 15.69 /20.97
            .74821173
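
            As a rough check on point (1), one could compare each generated variable with its target N(5,10) distribution, for example with one-sample Kolmogorov-Smirnov tests on the variables created above. This is only a sketch (not run here); note that with 100 million observations even trivial departures will show up as statistically significant, so it can only flag gross problems:
            Code:
            * one-sample KS tests against the target N(5,10) CDF
            ksmirnov nu  = normal((nu-5)/10)
            ksmirnov nru = normal((nru-5)/10)
            ksmirnov rnn = normal((rnn-5)/10)
            ksmirnov rn  = normal((rn-5)/10)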



            • #7
              To expand this thread a bit further, I wanted to see how the inverse-CDF approach works with distributions other than the normal. Since the simulations I'm working on involve normal and Student-t distributions, I focused on a t-distribution with 2 d.f.

              I used William Lisowski's setup in #4 (reducing the number of iterations from 50,000 to 500 for reasons that will be apparent when you see the results) as the basis of the following code:
              Code:
              mata
              rseed(666)
              
              timer_clear()
              
              timer_on(1)
              
              df=2
              iters=500
              
              for (j=1;j<=iters;j++) {
               z=rt(10000,1,df)
              }
              
              timer_off(1)
              timer_on(2)
              
              for (j=1;j<=iters;j++) {
               z=invt(df,runiform(10000,1))
              }
              
              timer_off(2)
              
              end
              
              set seed 666
              set obs 10000
              generate z = .
              
              loc df=2
              loc iters=500
              
              timer on 3
              
              forvalues j=1/`iters' {
                  quietly replace z = rt(`df')
              }
              
              timer off 3
              timer on 4
              
              forvalues j=1/`iters' {
                  quietly replace z = invt(`df',runiform())
              }
              
              timer off 4
              
              timer list
              Results:
              Code:
              . timer list
                 1:      0.40 /        1 =       0.3950
                 2:     73.22 /        1 =      73.2180
                 3:      0.73 /        1 =       0.7320
                 4:     75.07 /        1 =      75.0670
              The mechanics of computing CDF inverses evidently differ considerably depending on the particular distribution (at least normal vs. t). This is probably a well-known result among those who understand the computational details, but it had not occurred to me before.
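
              As a side note, if invt() is the bottleneck, a textbook alternative for generating t variates is the ratio construction Z/sqrt(X/df), where Z is standard normal and X is chi-squared with df degrees of freedom. A minimal Mata sketch follows; I do not know whether this matches what rt() uses internally.
              Code:
              mata
              
              // Student-t(df) variates via a standard normal divided by sqrt(chi-squared(df)/df)
              df = 2
              z  = rnormal(10000,1,0,1) :/ sqrt(rchi2(10000,1,df):/df)
              
              end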
