  • What is the relationship between a seed and a state of the random number generator?

    I want to generate N, say N=10, random numbers, then turn off Stata, start a new instance of Stata, generate another set of N random numbers, and so on. I repeat the process R times, say R=2.

    I want my R sets of N random numbers to be independent/uncorrelated. I want them to be as unrelated to each other as pseudo-random number sequences can be.

    How should I be setting the seed in each of my R sequences to achieve this task?

    I understand that if I set the seed to the same number, say 1, at the start of each of the R sequences, I will get R replicas of the same set of N pseudo random numbers.

    What I do not understand is what happens if I set the seed to 1, generate 10 random numbers, then set the seed to 5, and generate a second set of 10 random numbers. Would 5 of the random numbers in the two sequences overlap? That does not appear to be the case:

    Code:
    . clear
    
    . set obs 13
    number of observations (_N) was 0, now 13
    
    . set seed 10
    
    . gen u = runiform()
    
    . list
    
         +----------+
         |        u |
         |----------|
      1. | .6012831 |
      2. | .9137043 |
      3. | .2667319 |
      4. | .6073658 |
      5. | .0360737 |
         |----------|
      6. |   .75318 |
      7. | .4591727 |
      8. | .9056994 |
      9. | .3134366 |
     10. | .3755541 |
         |----------|
     11. | .1387824 |
     12. | .9243277 |
     13. | .3676381 |
         +----------+
    
    . clear
    
    . set obs 3
    number of observations (_N) was 0, now 3
    
    . set seed 20
    
    . gen u = runiform()
    
    . list
    
         +----------+
         |        u |
         |----------|
      1. | .7156579 |
      2. | .9087018 |
      3. | .7972254 |
         +----------+
    So I can see that the 3 numbers that should overlap are not the same. But are they as independent/uncorrelated as pseudo-random numbers can be?




  • #2
    I read this post here: https://blog.stata.com/2016/03/10/ho...bers-in-stata/
    which is supposed to discuss the issue, but I did not understand it.

    What I gather from the post is that the seed and the state of the random number generator are different things. What I cannot understand is how they are related to each other, and how I can answer my question in #1.



    • #3
      Bill Gould has given more detailed explanations on random number generators, random number seeds, and how to use the latter, e.g., here. You might also be interested in setrngseed (SSC). Perhaps this helps?



      • #4
        Perhaps the discussion in the Random Number Functions section of the Stata Functions Reference Manual PDF, specifically the Technical Note about how to restart a RNG from its current spot, will clarify the difference between the seed and the state.

        In particular you should conclude that two seeds that seem similar (e.g. 1 and 5) produce sequences that are not likely to be similar unless, perhaps, both sequences are used to generate a significant portion of the full cycle of the pseudo-random number generator in use (which for the mt64 RNG is 2^19937 − 1 ≈ 4.315 × 10^6001, so be prepared to wait a while).
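
        To make the seed/state distinction concrete, here is a minimal sketch (my own illustration, not taken from the manual) using c(rngstate) and set rngstate. The scalar names a1, a2, b1, b2 are arbitrary. Setting the same seed restarts the sequence from its beginning, while restoring a saved state resumes the sequence from wherever it was when the state was captured.

        Code:
        set seed 1
        scalar a1 = runiform()          // first draw after seeding
        local state = c(rngstate)       // position in the sequence after one draw
        scalar a2 = runiform()          // second draw
        
        set seed 1                      // same seed: the sequence starts over
        scalar b1 = runiform()
        assert b1 == a1
        
        set rngstate `state'            // same state: resume right after the first draw
        scalar b2 = runiform()
        assert b2 == a2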



        • #5
          Joro Kolev, the best approach to ensure that you generate R independent sets of N random numbers is to use Stata's stream random number generator (RNG). help set rngstream has all the details. We introduced the stream RNG in Stata 15 to address the issue you raised: guarantee that random numbers drawn are truly independent in different runs, which is very useful in tasks like Monte Carlo simulations, especially when done in parallel. Here is code to generate independent random numbers on different machines (maybe in parallel), or in different Stata sessions on the same machine, or sequentially in the same Stata session.

          Code:
          // Choose mt64s, the stream RNG (Stata has 3 RNGs)
          set rng mt64s
          
          // Use a different stream for each run
          set rngstream 10
          
          // Use the same seed for all runs
          set seed 123
           
          generate r = runiform()
          We recommend using the same seed for all runs and a different stream for each run. Stata allows up to 32,767 independent streams of random numbers. Note that stream 1 is the same as the default RNG of Stata, the 64-bit Mersenne Twister (mt64). mt64s is the stream version of the 64-bit Mersenne Twister, in which the long sequence of mt64 is partitioned into non-overlapping streams.
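
          Applied to the setup in #1 (R=2 runs of N=10 numbers), the recipe above might look like the sketch below. The loop only stands in for separate Stata sessions; in practice each session would run the body once with its own stream number. The file names run1 and run2 are hypothetical.

          Code:
          // Requires Stata 15 or newer
          set rng mt64s
          
          forvalues run = 1/2 {
              clear
              set obs 10
              set rngstream `run'     // a different stream for each run
              set seed 123            // the same seed for every run
              generate u = runiform()
              save run`run', replace  // hypothetical file name, one file per run
          }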

          If you are using a version of Stata older than 15, here are two alternative solutions -- I will use them to try to explain the relationship between seed and state, and how RNGs in Stata work in general.

          The first solution is to use a different seed for each run, as Joro Kolev suggested. But he very rightly pointed out that this approach provides no guarantee of independence of the random numbers drawn. They may indeed overlap. In Stata 14, we added the 64-bit Mersenne Twister (mt64) RNG and made it the default. One of the motivations was, as William Lisowski alluded to, its extremely long sequence, which makes overlaps very unlikely (but still a possibility). Also, as William Lisowski mentioned, seeds close to each other should land in different spots in the sequence (at least for a well-designed RNG). Nonetheless, setting different seeds is not recommended if you want to guarantee uncorrelated, independent sets of random numbers.

          The second solution is to set the seed only once, and preserve the state of the RNG at the end of each run -- and then restore the state for the next run and continue drawing (independent) random numbers from where we left off in the same sequence. This solution is applicable for sequential runs only. Note that pseudo RNGs, like mt64, typically work this way: they draw from a cyclical sequence of numbers, generated by an iterative formula, that does not repeat within a cycle. The seed determines the starting point in the sequence. The state contains information about where you are in the sequence (the state is, in Stata, a string of characters about 5,000 long for mt64 and mt64s). You can preserve the state of an RNG with

          Code:
          local state = c(rngstate)
          and restore it with

          Code:
          set rngstate `state'
          The state of the RNG can also be saved in a dataset, if you have many runs. States can be preserved and restored for all three Stata RNGs (mt64, the current default, mt64s, and the older kiss32).
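
          As a rough sketch of the dataset approach, the state can be stored in a strL variable (it is too long for a fixed-length str variable) and read back at the start of the next run. The file name rngstate_run and the variable name saved_state are hypothetical, and I am assuming that both sessions use the same RNG (set rng). Note that this clears the data in memory, so save your own data first.

          Code:
          // End of run k: store the current RNG state in a one-observation dataset
          clear
          set obs 1
          generate strL saved_state = c(rngstate)
          save rngstate_run, replace
          
          // Start of run k+1 (possibly in a new Stata session): resume the sequence
          use rngstate_run, clear
          local state = saved_state[1]
          set rngstate `state'
          clear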

          For more information, do help set seed, help set rng, and help set rngstream.

          I hope this is helpful.

          -- Kreshna
          Last edited by sladmin; 23 Dec 2020, 11:40. Reason: typo/clarity

