Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Equivalent of NumPy's multinomial()

    Dear All,

    I would like to inquire if there is already a Mata code equivalent to producing the multinomial random values as done in NumPy's multinomial() ?

    If not, I guess the best way would be to follow this 2008's advice by Nick Cox or explore the [undocumented] rdiscrete() function hinted at in the post by Kai Arzheimer.

    Thank you, Sergiy Radyakin


  • #2
    OK, building on top of the rdiscrete() the following seems to be working:

    Code:
    mata
    
      real rowvector multinomial(real N, real rowvector P) {
      
          if (N!=floor(N)) {
            printf("{error}Error! First parameter N must be integer, received: %g{break}",N)
            exit(error(3001))
          }
          
          s=sum(P)
          if (round(s,0.000001)!=1.00) {
            printf("{error}Error! Probabilities must sum to 1.0: but they sum to %10.6g{break}",s)
            exit(error(3001))
          }      
          
          result=J(1, cols(P), 0)
          
          for(i=1;i<=N;i++) {
            Z=rdiscrete(1,1,P)
            result[1,Z]=result[1,Z]+1
          }
          
          return(result)  
      }
      
    end
    
    mata multinomial(100,(1/6,1/6,1/6,1/6,1/6,1/6))
    mata multinomial(100,(1/6,1/6,1/6,1/6,1/6,1/6))
    mata multinomial(100,(1/3,1/2,1/6))

    Code:
    . 
    . mata multinomial(100,(1/6,1/6,1/6,1/6,1/6,1/6))
            1    2    3    4    5    6
        +-------------------------------+
      1 |  17   16   20   13   16   18  |
        +-------------------------------+
    
    . mata multinomial(100,(1/6,1/6,1/6,1/6,1/6,1/6))
            1    2    3    4    5    6
        +-------------------------------+
      1 |  15   16   17   20   14   18  |
        +-------------------------------+
    
    . mata multinomial(100,(1/3,1/2,1/6))
            1    2    3
        +----------------+
      1 |  30   49   21  |
        +----------------+

    Comment


    • #3
      Sergiy -- -rdiscrete()- is documented, though not heavily, in my v. 15 help. On the Stata side: I had occasion a few years ago to create programs to generate a multinomial variable in Stata, and my recollection was that -rdiscrete()- was not faster and perhaps even slower than Stata code if the goal was simply to create a single multinomial variable for N observations. My recollection is that the Stata command -irecode()- (very obscure) was the fastest way to go.
      Code:
      local P = 0.2 0.4 0.2 0.2
      ... code to cumulate P
      local F =    0.2, 0.6, 0.8, 1.0
      gen x = irecode(runiform(), `F')

      Comment


      • #4
        Dear Mike, thank you very much for mentioning that rdiscrete() is documented (indeed I have rechecked and found it described among the other distributions here) and for the advice on performance. Also thank you for bringing up irecode() - another very useful tool, that I use very rarely just because I keep on forgetting that it is already there.

        Comment


        • #5
          A quick simulation of these two methods shows that Mata's -rdiscrete()- is around 1.5x (2x at most) faster than -irecode()- on my machine running MP/4, for 30 simulations of 100,000,000 random observations (6 sec vs 9 sec). So unless one is simulating a very large dataset, it probably makes little practical difference which method is used.

          Comment

          Working...
          X