Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • State of Random Number Generator and programs

    Hi all,
    this is be a bit of a long and complicated post, but I am confused a little bit about a finding I cannot explain. Maybe (hopefully) my error is very basic and obvious (but not to me at the moment....) The post is interesting for Stata 15 and not relevant for Stata 14.

    I am using a program to set the type of the random number generator, seed and seed stream. My expectation was, that this sets the random number generator, stream and seed. However it does not. What I found is, that Stata "returns" to the default random number generator type after the program is finished. This implies that if I draw the same number of random numbers outside and within the program, the numbers differ.

    The following example shows it:
    Code:
    clear all
    clear rngstream
    
    capture program drop seed_checker
    program seed_checker
        syntax anything [,draw ]
            clear rngstream
            set rng mt64s
            set rngstream 1
            set seed `anything'        
    
            mata rng_int =( "`c(rng)'" \ "`c(rng_current)'")         
            
            if "`draw'" == "draw" {
                mata x_sc = rnormal(10,1,0,1)
            }
            mata rngstate_int = "`c(rngstate)'"
    end
    
    * Starting Example
    clear rngstream
    set seed 123
    mata x = rnormal(10,1,0,1)
    mata rng_start = ( "`c(rng)'" \ "`c(rng_current)'")
    mata state_start = "`c(rngstate)'"
    
    * Example 1
    seed_checker 123
    mata x_1 = rnormal(10,1,0,1)
    mata rng_after_1 =( "`c(rng)'" \ "`c(rng_current)'")
    mata rng_int_1 = rng_int
    mata state_int_1 = rngstate_int
    mata state_after_1 = "`c(rngstate)'"
    
    * Example 2
    seed_checker 123, draw
    mata rng_after_2 =( "`c(rng)'" \ "`c(rng_current)'")
    mata state_after_2 = "`c(rngstate)'"
    mata rng_int_2 = rng_int
    mata state_int_2 = rngstate_int
    mata x_2 = x_sc
    
    *by hand
    clear rngstream
    set rng mt64s
    set rngstream 1
    set seed 123   
    mata x_3 = rnormal(10,1,0,1)
    mata rng_3 =( "`c(rng)'" \ "`c(rng_current)'")         
    mata state_after_3 = "`c(rngstate)'"            
                
    **Check random draws
    mata (x , x_1, x_2, x_3)
    
    ** Check nrg and rng_current
    mata (rng_start , rng_after_1 , rng_int_1 , rng_after_2, rng_int_2,rng_3)
    
    **compare states:
    mata states = J(0,5,.)
    foreach st in state_start  state_after_1 state_after_2 state_int_2 state_after_3 {
        mata states = (states  \ (`st':== (state_start ,state_after_1 ,state_after_2 ,state_int_2 ,state_after_3)))
    }
    mata names = ("state_start" , "state_after_1" ,"state_after_2" ,"state_int_2" ,"state_after_3")
    ** RNG States:
    mata (("", names )\ (names' , strofreal(states)))
    The program seed_checker set a type of the random number generator, the stream and a seed. Then it saves those in mata matrices and - if requested - draws 10 random numbers. It results in the following
    Code:
    . **Check random draws
    . mata (x , x_1, x_2, x_3)
                       1              2              3              4
         +-------------------------------------------------------------+
       1 |   .4907676602   -.3268441812    .4907676602    .4907676602  |
       2 |  -1.405081998   -1.213826865   -1.405081998   -1.405081998  |
       3 |  -.2040506142   -.0088207356   -.2040506142   -.2040506142  |
       4 |    2.10972645    1.626667208     2.10972645     2.10972645  |
       5 |   .4460774109   -1.296418109    .4460774109    .4460774109  |
       6 |  -.3231870725    1.052054144   -.3231870725   -.3231870725  |
       7 |   .7594250051    .9918897139    .7594250051    .7594250051  |
       8 |  -1.257190365    .1931405811   -1.257190365   -1.257190365  |
       9 |  -.7144663077   -.0311804972   -.7144663077   -.7144663077  |
      10 |     .11753014   -.2283643257      .11753014      .11753014  |
         +-------------------------------------------------------------+
    x is drawn at the very beginning. Only x_2 is drawn within the program. x_1 is drawn after the program is called and x_3 is drawn doing the same as the program does, but outside of it.

    I do understand that the results in x and x_1 differ, because I set a different random number generator type and stream (mt64s instead of mt64). What I don't understand is, that x and x_2 are the same, but not x_1 and x_2. The difference between x_1 and x_2 is, that x_2 is drawn inside the program, x_1 after the program. Also I don't understand that x and x_2 are the same as I changed from mt64 to mt64s. Then I though I check the type of the random number generator:
    Code:
    ** Check nrg and rng_current
    . mata (rng_start , rng_after_1 , rng_int_1 , rng_after_2, rng_int_2,rng_3)
                 1         2         3         4         5         6
        +-------------------------------------------------------------+
      1 |  default   default     mt64s   default     mt64s     mt64s  |
      2 |     mt64      mt64     mt64s      mt64     mt64s     mt64s  |
        +-------------------------------------------------------------+
    The first column refers to the very beginning, column 2 is from the first example after calling the program and column 3 within the program. Thus it looks like the random number generator within and after the program is different. Why is this the case? The same is true for the 2nd example. The numbers drawn in example 2 are the same as in the starting example. They are the same (and the same as in the "by hand case"), but the type of the generators are different. What is the rational behind this?

    When checking the random number states after drawing, it turns out that only the ones after example 1 and 2 are the same and the internal one from example 2 and the one for example 3 ("by hand") are the same.
    Code:
    . ** RNG States should be the same:
    . mata (("", names )\ (names' , strofreal(states)))
    [symmetric]
                       1               2               3               4               5               6
        +-------------------------------------------------------------------------------------------------+
      1 |                                                                                                 |
      2 |    state_start               1                                                                  |
      3 |  state_after_1               0               1                                                  |
      4 |  state_after_2               0               1               1                                  |
      5 |    state_int_2               0               0               0               1                  |
      6 |  state_after_3               0               0               0               1               1  |
        +-------------------------------------------------------------------------------------------------+
    Once again the rng_state after 1 and 2 are the same, but the random numbers are different? The only bit which makes sense to me is that the states for example 2 (drawn inside the program) and for the "by hand" example are the same.

    To sum up, my questions are:
    • Why does the type of the random number generator cannot be set with a program?
    • Why are the random numbers the same if I switch from mt64 to mt64s, but different if I set the type within a program, despite using the same rng type?
    • Why does this matter for the rng states?

  • #2
    JanDitzen,

    Your analysis of the behavior of our random-number generators (RNGs) is very thorough and accurate. There are two key design choices we made that will hopefully address the confusion and explain your observations.

    First, you are correct in asserting that when Stata returns from a defined program, it restores the RNG to the one before the program is called. This is the intended behavior. As per our documentation (help rng),

    The scope of set rng is the Stata session, do-file, or program in which rng is set.
    The rationale behind this is as follows: if your code calls other independent programs (like user-written ones), we do not want the called program to change the RNG beyond its scope. This may have unintended consequences. For example, say you have code that is intended to use the default RNG (mt64). At some point it calls an old program (or ado-file) that happens to use kiss32 (the default RNG before Stata 14), maybe to reproduce older results. If we didn't make the design choice we did, when control is returned to the original code, you would suddenly be getting kiss32 numbers instead of mt64 numbers.

    Second, when we introduced mt64s, the stream version of the Mersenne Twister RNG, in Stata 15, we allowed for over 30,000 independent streams. However, we chose the first stream (stream 1) to have the same starting point as the default RNG, mt64. As we shall see later, it explains the results you are getting. This behavior is documented (help rngstream), but we shall look into making it clearer in our documentation.

    Given these two features of our RNGs, your observations can be explained:

    1. x and x_1 are expectedly different. However, it is not because they are generated by different RNGs. Both x and x1 are generated by the default RNG i.e. mt64 (and not by mt64s, which is invoked only within seed_checker). If you run

    Code:
    set rng mt64
    set seed 123
    mata y = rnormal(20,1,0,1)
    mata y
    you will observe that the 20 random numbers in y are the 10 numbers in x followed by the 10 numbers in x_1.

    2. x and x_2 are the same because x is generated by mt64 and x_2 is generated by mt64s with stream 1. As mentioned, we chose stream 1 to have the same starting point as mt64 i.e. numbers drawn from mt64s with stream 1 will be the same as those from mt64, given the same seed.

    3. x_1 and x_2 are different because x_1 is the second set of 10 numbers generated by mt64 and x_2 is generated by mt64s (which is the same as the first 10 numbers generated by mt64 because they are from stream 1).

    4. x_3 is generated by mt64s with rngstream set to 1, and that's why the numbers are the same as x and x_2.

    5. The output you get on checking c(rng) and c(rng_current) is as expected: outside the scope of check_seed, the current RNG is mt64. Within the scope of check_seed, it is mt64s. When you set the RNG to mt64s outside the scope of check_seed, the current RNG is mt64s.

    6. Regarding the states, state_after_1 and state_after_2 are both mt64 states computed outside the scope of seed_checker (default RNG is mt64). In your code, no numbers are drawn from mt64 between

    Code:
    mata state_after_1 = "`c(rngstate)'"
    and

    Code:
    mata state_after_2 = "`c(rngstate)'"
    So the mt64 state remains unchanged. That's why state_after_1 and state_after_2 are the same.

    I hope this is helpful. As mentioned, we will look into changing our documentation to make the design clearer.

    -- Kreshna

    Comment


    • #3
      Kreshna Gopal, thank you very much for your detailed answer. Indeed your explanation makes totally sense and I misinterpreted the quoted reference of the documentation.

      I have three follow up questions and unfortunately it becomes complicated again. If this discussion is not of public interest, I would be more than happy to continue it somewhere else or read up on it.

      First, following your line of thinking, would it not be consistent to restore the seed as well? I once used a user written command, which internally set a seed and this lead to some very interesting Monte Carlo results (See Note 2 below). Why is the random number generator restored, but not the seed? Furthermore, it seems the stream can be set in a program. Is this then equivalent to setting a different seed?

      Secondly, is there any scope of adding an option set rng [, permanent] or set rng [, session], which allows to set the rng type for the running session (i.e. until Stata is closed) or permanently. Alternatively programs can be defined as being allowed to change the rng type (I know this might be against the definition of programs in Stata though). The background of my question is the following: I am working on an update for multishell (https://www.statalist.org/forums/forum/general-stata-discussion/general/1456028-new-package-on-ssc-multishell). In a nutshell, multishell parallelises loops, starting a shell for each variation of the loop. For this I need carefully set the seeds (and rng types) to make sure sequences of random numbers are not repeated. Testing it I observed the behaviour mentioned above of Stata 15, the behaviour in Stata 14 is different because only the seed is set. To implement different seeds, I am using a program to set the seed and in the case of Stata 15 the stream. Thus it would be handy if I could specify the program such that it can change the seeds for the remaining time of the Stata session.

      Finally, I am still confused about the consequences of rngstream (and this has actually to do with my point 1 and the application from point 2) and the differentiation between mt64 and mt64s, respectively kiss32. As pointed out above, I can set the stream number in a program and Stata does not restores the stream number. This has an effect on the seed (see Note 1 below), but Stata restores the rng type back to mt64 (or kiss32). However the stream number is different. What confuses me is, that later on in the documentation of set rngstream it reads:
      The mt64s generator is a stream version of Stata's default generator,....
      My understanding is, that when I invoke set stream 2 (for example), then my rng changes to mt64s. I cannot change the rng using a program, but I can change the seed and the stream number?
      The documentation for set rngstream reads:
      set rngstream also sets the random-number generator to mt64s.
      Does this not imply, when using set rngstream, the rng is automatically mt64s? When I change the stream within a program, which rng type (and state) is Stata using outside the program?

      The following code shows this without the numbers being drawn and with the kiss32 rng:
      Code:
      clear all
      clear rngstream
      
      capture program drop set_rngstream_only
      program set_rngstream_only
          syntax anything [,draw ]
             set rngstream `anything'
             mata state = "`c(rngstate)')"
             mata rng = ("`c(rngstream)'"\"`c(rng_current)'"\"`c(rng)'")
      end
      
      
      capture program drop set_seed_only
      program set_rngstream_only
          syntax anything [,draw ]
             set seed `anything'
             mata state = "`c(rngstate)')"
             mata rng = ("`c(rngstream)'"\"`c(rng_current)'"\"`c(rng)'")
      end
      
      set rng kiss32
      set seed 123
      
      mata state_1 = "`c(rngstate)')"
      mata rng_1 = ("`c(rngstream)'"\"`c(rng_current)'"\"`c(rng)'")
      
      set_rngstream_only 2
      mata state_2 = "`c(rngstate)')"
      mata rng_2 = ("`c(rngstream)'"\"`c(rng_current)'"\"`c(rng)'")
      
      mata state_1 :== state
      mata state_2 :== state
      mata state_1 :==state_2
      mata (rng_1,rng,rng_2)
      
      set_seed_only 456
      mata state_3 = "`c(rngstate)')"
      mata rng_3 = ("`c(rngstream)'"\"`c(rng_current)'"\"`c(rng)'")
      mata state_3:==state
      mata state_2:==state
      mata (rng_2,rng,rng_3)
      Output is:
      Code:
      mata state_1 :== state
        0
      
      . mata state_2 :== state
        0
      
      
      . mata state_1 :==state_2
        1
      .
      
      . mata (rng_1,rng,rng_2)
                  1        2        3
          +----------------------------+
        1 |       1        2        2  |
        2 |  kiss32    mt64s   kiss32  |
        3 |  kiss32    mt64s   kiss32  |
          +----------------------------+
      I understand that state_1 and state_2 are the same and different from state. Also the switching of the rngs makes sense. But from which rng type would my random numbers now origin when drawn outside of the program? mt64, mt64s or kiss32? Initally I would say kiss32, but I set the random number stream to 2 and this would mean from mt64s?

      Also my interpretation of the documentation is, that using set rngstream changes the rng. While the former is not restored after a program closes, the latter is. But both depend on each other. Is this not inconsistent?

      Once again, thanks for your detailed answer and I hope this question does not miss out some essential parts.


      Note 1:
      The documentation about rngstream reads:
      A stream random-number generator partitions a sequence of random numbers into nonoverlapping subsequences known as streams.
      and this is nicely shown in Figure 3 in the set rngstream pdf documentation. To put it differently (or very naively), set rngstream divides the current seed stream into substreams, this is done by setting a different seed for each stream.

      Note 2:
      The output to see that the seed can be set in a program is:
      Code:
      . mata state_3:==state
        1
      . mata state_2:==state
        0
      . mata (rng_2,rng,rng_3)
                  1        2        3
          +----------------------------+
        1 |       2        2        2  |
        2 |  kiss32   kiss32   kiss32  |
        3 |  kiss32   kiss32   kiss32  |
          +----------------------------+
      Thus, the seed can be set in a program and remains different afterwards.

      Comment


      • #4
        Very interesting discussion. Please don't take it private!

        One minor comment: I am always shocked when people set a seed within a program. However, I can't imagine it's ever intentional (if anyone has ever come across a reasonable justification I would be interested). More likely that it's left over from when someone was developing the program. So do write to authors of these programs – it's not a hard thing to fix.

        Tim

        Comment


        • #5
          JanDitzen, thank you for the feedback.

          Regarding

          First, following your line of thinking, would it not be consistent to restore the seed as well? I once used a user written command, which internally set a seed and this lead to some very interesting Monte Carlo results (See Note 2 below). Why is the random number generator restored, but not the seed?

          Please bear in mind that each of Stata's RNGs (kiss32, mt64, mt64s) has its own seed. set seed sets the seed for the current RNG only. If the current RNG is, say mt64, and the called program uses a different RNG (say kiss32), then on exiting, mt64 is restored and its seed/state are unchanged and effectively restored. However, if the called program calls the same RNG (mt64), the original state/seed is not restored on exiting (as you rightly pointed out). This is deliberate. Resetting the seed would imply that the same numbers may be generated by the called program and after the program is called. That means the numbers after exiting the program are not random anymore. That may not be the intention and results can be very misleading.

          If the intention is to restore to a previous point for a given RNG, we can use c(rngstate). In fact, the main purpose of keeping the RNG state to be able to go back to a point where we left off in the RNG sequence. For example:

          Code:
          program seed_checker
                  set rng kiss32
                  mata x_rc = rnormal(5,1,0,1)
          end
          
          set rng mt64
          set seed 123
          local state = c(rngstate) /* preserve the state */
          mata x_rc = rnormal(5,1,0,1)
          mata x_rc
          
          seed_checker
          
          set rngstate `state'      /* restore the state  */
          mata x_rc = rnormal(5,1,0,1)
          mata x_rc
          Tim Morris makes a valid argument against setting the seed repeatedly. In fact, help seed has a detailed section titled "Do not set the seed too often".

          Regarding

          Furthermore, it seems the stream can be set in a program. Is this then equivalent to setting a different seed?
          Yes, both seed and rngstream can be set in a program, and the setting applies beyond the scope of the program (unlike rng).

          Setting a different stream is not equivalent to setting a different seed. True, both will produce different random numbers, but if you need truly independent sequences, say for Monte Carlo simulation, it is best to use different streams. Generating sequences of random numbers with different seeds does not guarantee independence. In fact, this motivated our implementation of mt64s -- our approach (partitioning the Mersenne Twister sequence) guarantees the independence of the streams.


          Secondly, is there any scope of adding an option set rng [, permanent] or set rng [, session], which allows to set the rng type for the running session (i.e. until Stata is closed) or permanently.
          We will look into adding the permanent and session options you mentioned. But there are ramifications that we have to be careful about. Overriding any set rng may have unintended consequences. It may be useful to set rng in profile.do.

          Thank you for the multishell package. mt64s was introduced with this kind of parallelization in mind i.e. provide truly independent streams.

          To use mt64s safely and effectively, as our documentation states,

          We strongly recommend that you set the stream and the seed once in each Stata session and draw numbers only from this stream.
          This may be useful: we provide the rngstream# parameter when running Stata in batch mode.

          When you have different RNGs, different seeds, different streams, and the ability to restore states, caution is needed to make sure you are actually drawing truly random numbers from where you are expecting. We would recommend that multishell uses mt64s, sets the seed once, and use a single stream for each parallel simulation. If you are using Stata 14 (and earlier), a seed() option will be useful, although not an ideal solution for generating streams.


          My understanding is, that when I invoke set stream 2 (for example), then my rng changes to mt64s. I cannot change the rng using a program, but I can change the seed and the stream number?
          Yes indeed, you cannot change the RNG "permanently" using a program (or ado-file) but you can change the seed and the stream. Not restoring the seed is justified (as discussed above). There is, arguably, no compelling reason to restore the stream as well.

          Does this not imply, when using set rngstream, the rng is automatically mt64s? When I change the stream within a program, which rng type (and state) is Stata using outside the program?
          Yes, when you set rngstream, the current RNG is automatically set to mt64s. If you set the stream within a program, then outside the program, the original RNG is restored. It will generate numbers based on the current state of that RNG (including the current stream if the current RNG is mt64s).

          There seems to be typo in your code. set_rngtream_only is defined twice. The second one is set_seed_only, I presume. This is the output I get by changing the second definition of set_rngtream_only to set_seed_only in your code (in a fresh Stata session).

          Code:
          . mata state_1 :== state
            0
          
          . mata state_2 :== state
            0
          
          . mata state_1 :==state_2
            1
          
          . mata (rng_1,rng,rng_2)
                      1        2        3
              +----------------------------+
            1 |       1        2        2  |
            2 |  kiss32    mt64s   kiss32  |
            3 |  kiss32    mt64s   kiss32  |
              +----------------------------+
          
          .
          . set_seed_only 456
          
          . mata state_3 = "`c(rngstate)')"
          
          . mata rng_3 = ("`c(rngstream)'"\"`c(rng_current)'"\"`c(rng)'")
          
          . mata state_3:==state
            1
          
          . mata state_2:==state
            0
          
          . mata (rng_2,rng,rng_3)
                      1        2        3
              +----------------------------+
            1 |       2        2        2  |
            2 |  kiss32   kiss32   kiss32  |
            3 |  kiss32   kiss32   kiss32  |
              +----------------------------+

          Some comments:

          1. Yes, state_1 and state_2 are the same (kiss32) states and state in the body of set_rngstream_only is an mt64s state, because yes, set rngstream automatically sets the RNG to mt64s.

          2. After calling,

          Code:
          set_rngstream_only 2
          rng is restored back to kiss32. So even if the stream has been changed in set_rngstream_only, we are generating kiss32 numbers. You always draw from c(rng_current).

          Also my interpretation of the documentation is, that using set rngstream changes the rng. While the former is not restored after a program closes, the latter is. But both depend on each other. Is this not inconsistent?
          Yes, after

          Code:
          set_rngstream_only 2
          the RNG is restored to kiss32 and the stream is 2. It is not inconsistent. You are drawing from c(rng_current) i.e. kiss32. If you do set rng mt64s, then you will go back to stream 2, based on the seed set for mt64s.

          In general, the three different RNGs are independent, with independent seeds, states, and sequences. set rngstream applies to mt64s only. set seed, c(rngstate), and all RNG functions apply to the current RNG. If you move back and forth among the RNGs, you pick up where you left off in their sequences. The entry point to their sequences can be reset with set seed.

          I hope this clears your doubts. I would emphasize on an important point: mt64s was designed so that you set the mt64s seed once and then use one new stream for every simulation.

          Regards,

          -- Kreshna




          Comment


          • #6
            Dear Kreshna,
            thank you very much for taking the time to provide these very detailed answers. What kept confusing me was that r(rngstream) is 1 (or takes on other values) independent of the random number generator.

            In any case, these explanations helped a lot understanding some of the details of Stata's random number generator. It was very helpful for the next version of multishell.

            Regards,
            Jan

            Comment

            Working...
            X