Equivalent of NumPy's multinomial()

Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#1

Equivalent of NumPy's multinomial()

01 Jun 2020, 11:03

Dear All,

I would like to inquire if there is already a Mata code equivalent to producing the multinomial random values as done in NumPy's multinomial() ?

If not, I guess the best way would be to follow this 2008's advice by Nick Cox or explore the [undocumented] rdiscrete() function hinted at in the post by Kai Arzheimer.

Thank you, Sergiy Radyakin
Tags: None

Sergiy Radyakin

Join Date: Apr 2014
Posts: 1867

01 Jun 2020, 11:33

OK, building on top of the rdiscrete() the following seems to be working:

Code:

mata

  real rowvector multinomial(real N, real rowvector P) {
  
      if (N!=floor(N)) {
        printf("{error}Error! First parameter N must be integer, received: %g{break}",N)
        exit(error(3001))
      }
      
      s=sum(P)
      if (round(s,0.000001)!=1.00) {
        printf("{error}Error! Probabilities must sum to 1.0: but they sum to %10.6g{break}",s)
        exit(error(3001))
      }      
      
      result=J(1, cols(P), 0)
      
      for(i=1;i<=N;i++) {
        Z=rdiscrete(1,1,P)
        result[1,Z]=result[1,Z]+1
      }
      
      return(result)  
  }
  
end

mata multinomial(100,(1/6,1/6,1/6,1/6,1/6,1/6))
mata multinomial(100,(1/6,1/6,1/6,1/6,1/6,1/6))
mata multinomial(100,(1/3,1/2,1/6))

Code:

. 
. mata multinomial(100,(1/6,1/6,1/6,1/6,1/6,1/6))
        1    2    3    4    5    6
    +-------------------------------+
  1 |  17   16   20   13   16   18  |
    +-------------------------------+

. mata multinomial(100,(1/6,1/6,1/6,1/6,1/6,1/6))
        1    2    3    4    5    6
    +-------------------------------+
  1 |  15   16   17   20   14   18  |
    +-------------------------------+

. mata multinomial(100,(1/3,1/2,1/6))
        1    2    3
    +----------------+
  1 |  30   49   21  |
    +----------------+

Comment

Mike Lacy

Join Date: Apr 2014

Posts: 2416
#3

01 Jun 2020, 22:34

Sergiy -- -rdiscrete()- is documented, though not heavily, in my v. 15 help. On the Stata side: I had occasion a few years ago to create programs to generate a multinomial variable in Stata, and my recollection was that -rdiscrete()- was not faster and perhaps even slower than Stata code if the goal was simply to create a single multinomial variable for N observations. My recollection is that the Stata command -irecode()- (very obscure) was the fastest way to go.

Code:

local P = 0.2 0.4 0.2 0.2 ... code to cumulate P local F = 0.2, 0.6, 0.8, 1.0 gen x = irecode(runiform(), `F')
1 like
Comment
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#4

02 Jun 2020, 11:36

Dear Mike, thank you very much for mentioning that rdiscrete() is documented (indeed I have rechecked and found it described among the other distributions here) and for the advice on performance. Also thank you for bringing up irecode() - another very useful tool, that I use very rarely just because I keep on forgetting that it is already there.
Comment
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2403
#5

04 Jun 2020, 14:22

A quick simulation of these two methods shows that Mata's -rdiscrete()- is around 1.5x (2x at most) faster than -irecode()- on my machine running MP/4, for 30 simulations of 100,000,000 random observations (6 sec vs 9 sec). So unless one is simulating a very large dataset, it probably makes little practical difference which method is used.
Comment

Announcement

Equivalent of NumPy's multinomial()

Comment

Comment

Comment

Comment