  • Unexpected behavior (bug?) in -set seed- with version control

    Dear Statalisters,
    I came across some odd behavior of the random-number generator under version control. (I observed the following using Stata 17 on a Windows machine.)

    Code:
    . version 17
    
    . version 13: set seed 27072023
    . di runiform()
    .38472837
    
    . version 13: set seed 27072023
    . di runiform()
    .27057198
    Even though I set the identical seed under the same version both times, runiform() returns different values. Stata effectively ignores the seed. The same happens when I swap the version statements:
    Code:
    . version 13
    
    . version 17: set seed 27072023
    . di runiform()
    .39999195
    
    . version 17: set seed 27072023
    . di runiform()
    .08663614
    The problem does not arise as long as all version statements refer either to versions before or to versions after the introduction of the new random-number generator in Stata 14:
    Code:
    . version 12
    
    . version 13: set seed 27072023
    . di runiform()
    .39999195
    
    . version 13: set seed 27072023
    . di runiform()
    .39999195
    Code:
    . version 16
    
    . version 14: set seed 27072023
    . di runiform()
    .25052039
    
    . version 14: set seed 27072023
    . di runiform()
    .25052039
    Is there an explanation for this behavior?
    https://www.kripfganz.de/stata/

  • #2
    The default PRNG algorithm was changed to the Mersenne Twister in version 14.


    • #3
      I know that, but this does not explain why Stata ignores the seed in the above examples.


      • #4
        As I understand it, a version prefix applies only to the single command line it precedes.

        Code:
        version 17
        version 13: set seed 27072023
        version 13: di runiform()
        
        version 13: set seed 27072023
        version 13: di runiform()
        Res.:

        Code:
        . version 17
        
        . version 13: set seed 27072023
        
        . version 13: di runiform()
        .39999195
        
        .
        . version 13: set seed 27072023
        
        . version 13: di runiform()
        .39999195


        • #5
          Sorry Sebastian, I was too hasty in my response and thought it was just an issue with PRNG algorithms.

          I don't think this is a bug, but it's a subtlety of trying to access a different PRNG. You would also need to place all commands that access random-number generation under version control.

          For example:

          Code:
          . version 17
          
          . version 13: set seed 27072023
          . version 13: di runiform()
          .39999195
          
          . version 13: set seed 27072023
          . version 13: di runiform()
          .39999195
          Code:
          . version 13
          
          . version 17: set seed 27072023
          . version 17: di runiform()
          .25052039
          
          . version 17: set seed 27072023
          . version 17: di runiform()
          .25052039
          Quoting from -help whatsnew13to14- (emphasis mine)

          15. New random-number generators (RNGs)

          Existing function runiform() now uses the 64-bit Mersenne Twister. runiform() produces uniformly distributed random numbers, and the functions providing random numbers for other distributions use
          runiform() in producing their results. Thus, all of Stata's RNGs are now based on the Mersenne Twister, too. Stata previously used KISS32 and still does under version control.

          [...]

          Choosing which RNG to use

          You are running Stata with version 14 set. You want values from rlogistic() but based on KISS32 rather than the version-14 default of Mersenne Twister. You could type

          . version 13: ... rlogistic() ...

          or you can just use new function rlogistic_kiss32() without resetting the version:

          . ... rlogistic_kiss32() ...

          That is, every RNG fcn() comes in three flavors: fcn(), fcn_mt64(), and fcn_kiss32().

          Functions fcn_mt64() and fcn_kiss32() are now considered the true names of the RNGs, but still, you will usually type fcn().

          That is because of another new feature:

          . set rng kiss32

          set rng kiss32 says that when you type fcn(), you mean fcn_kiss32(). You can set rng to kiss32, mt64, or default. That is how the meaning of fcn() is set. default means the default for the
          version. In version 14, the default is mt64. In version 13 and before, it is kiss32.

          Programmers: Ado-file code written under previous versions of Stata now use modern RNGs! You do not have to modify your ado-files. That is because how version is set for the RNGs has been
          modified. Users typing version at the command line or in a do-file set RNG's version, too. Ado-files setting version, however, do not change RNG's version! In ado-files, the RNG's version can
          be set by setting the user version if you wanted to set it, but you do not. See [P] version.
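
          A minimal sketch of the set rng route described in the help text above, assuming rng is otherwise left at default. I have not run it here, so no output is shown; the point is only that the seed and the draw refer to the same generator:

          ```stata
          version 17
          set rng kiss32          // runiform() now means runiform_kiss32()
          set seed 27072023       // seeds the generator currently in effect
          display runiform()
          set seed 27072023
          display runiform()      // same generator, same seed as the draw above
          set rng default         // back to the version default afterwards
          ```

          Restoring set rng default at the end keeps the change from leaking into later commands in the same session.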


          • #6
            Interesting. This is at odds with what is written on version control in the Stata Manual:
            Version control within a RNG is specified at the time the set seed command is given, not at the time the random-number generation function such as rnormal() is used.


            • #7
              Hmm, you're right. These two sources contradict each other (especially the remainder of -help version-). I'd report it to Stata Technical Services.


              • #8
                Originally posted by Sebastian Kripfganz:
                "Interesting. This is at odds with what is written on version control in the Stata Manual:"
                The manual's statement does not contradict the following:

                Code:
                version 13
                set seed 27072023
                di runiform()
                Res.:

                Code:
                . version 13
                
                . set seed 27072023
                
                . di runiform()
                .39999195
                The implication is that if you set the version at the time of setting the seed, all subsequent commands are under version control. The issue I see in #1 is the use of the version prefix. In effect, the prefixed command line runs independently of all other commands in the sequence. Consider:

                Code:
                version 17
                display "`=substr("`c(rngstate)'", -10, 10)'"
                version 13: set seed 27072023
                display "`=substr("`c(rngstate)'", -10, 10)'"
                di runiform()
                display "`=substr("`c(rngstate)'", -10, 10)'"
                Res.:

                Code:
                . version 17
                
                . display "`=substr("`c(rngstate)'", -10, 10)'"
                0000042ecf
                
                . version 13: set seed 27072023
                
                . display "`=substr("`c(rngstate)'", -10, 10)'"
                0000042ecf
                
                . di runiform()
                .86893327
                
                . display "`=substr("`c(rngstate)'", -10, 10)'"
                0000052ed0
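
                A related check, as an untested sketch: besides c(rngstate), the c() values c(rng) and c(rng_current) should report the set rng setting and the generator in effect. Using them as expressions rather than as macros means they are evaluated when the command runs, so the last line is evaluated under the prefix. I show no output because I have not run it.

                ```stata
                version 17
                display c(rng)                      // setting of -set rng-
                display c(rng_current)              // generator in effect at the command line
                version 13: set seed 27072023
                display c(rng_current)              // this line runs under version 17 again
                version 13: display c(rng_current)  // this line runs under the version 13 prefix
                ```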


                • #9
                  I disagree. The Stata Manual entry for the version command explicitly provides an example of using the version prefix with set seed:
                  . version 11.1: set seed 123456789
                  . any_command ...


                  In this case, any command uses the older version of rnormal() because the seed was set under version 11.1


                  • #10
                    The example is incorrect, I agree. What matters ultimately is the default RNG version. I guess the example should read:

                    . version 11.1
                    . set seed 123456789
                    . any_command ...

                    In this case, any command uses the older version of rnormal() because the specified version is 11.1
                    Last edited by Andrew Musau; 27 Jul 2023, 11:12.


                    • #11
                      According to Stata's technical support team, the observed behavior described in my opening post is not a bug.

                      In my opinion, irrespective of which version I use to set the seed, the two calls to runiform() should yield identical results because I used the same seed.

                      Also, why do I get identical results if instead of repeating the same command lines twice, I write a loop?
                      Code:
                      . version 17
                      
                      . forvalues i = 1 / 2 {
                        2.         version 13: set seed 27072023
                        3.         di runiform()
                        4. }
                      .25052039
                      .25052039
                      That should yield the same result as in the opening post, shouldn't it?

                      I will leave it at that, although I still cannot make sense of it.


                      • #12
                        FWIW, here is how I understand the issue:

                        The sequence

                        Code:
                        version 17
                        version 13: set seed 27072023
                        di runiform()
                        in a do-file or interactively, sets the seed for the kiss32 generator to 27072023 but (given rng is set to default) still obtains the random number from the (implied) version 17 default runiform_mt64(). If you code

                        Code:
                        version 17
                        version 13: set seed 27072023
                        di runiform_kiss32()
                        you will obtain reproducible results.


                        The referenced example in [P] version refers to programs or ado-files:

                        Code:
                        program my_runiform
                            
                            version 17
                            
                            version 13: set seed 27072023
                            
                            di runiform()
                            
                        end
                        
                        my_runiform
                        my_runiform
                        The code above will yield reproducible results. It is the same result that you get from the loop in #11, and I believe this is because the loop is internally considered a program.


                        btw the example here might be further complicated by changes specific to runiform(); from

                        Code:
                        help version
                        If you set version to less than 14.0

                        [...]

                        2. If the current RNG is kiss32 and the kiss32 seed has been set to version 13.1 or earlier, then runiform() generates a random variate on the
                        interval [0,1). As of Stata 14, runiform() generates random variates on the interval (0,1) for all RNGs.
                        (emphasis mine)
                        Last edited by daniel klein; 01 Aug 2023, 07:49.


                        • #13
                          Thank you, daniel klein
                          I think I understand the logic now.

                          set seed determines the RNG variant based on what is called the "user version". The user version is the version specified in a do-file or interactively, or the current Stata version if no version statement is provided. If the seed is set under user version 13, this only affects the kiss32 RNG. Because runiform() is called under version 17, it looks for a seed for the mt64 RNG, which has not been set.

                          To achieve a similar effect within a program or a loop, we would need to add the option user to the version statement:
                          Code:
                          . version 17
                          
                          . forvalues i = 1 / 2 {
                            2.         version 13, user: set seed 27072023
                            3.         di runiform()
                            4. }
                          .75054449
                          .94849832
                          This makes some sense now.
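
                          If that reading is right, naming the generator explicitly in the draw should restore reproducibility even with the user option, because the seed and the draw then both refer to kiss32. This is an untested sketch that combines the user option above with runiform_kiss32() from #12, so I show no output:

                          ```stata
                          version 17
                          forvalues i = 1/2 {
                              * seed kiss32 under user version 13
                              version 13, user: set seed 27072023
                              * draw from kiss32 explicitly, whatever runiform() maps to
                              display runiform_kiss32()
                          }
                          ```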
