Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • saving the rngstate of each replication in a monte carlo simulation


    Hello,

    Using Stata 14, I am simulating an r-class program and would like to save the rngstate of each replication of the simulation. Any help would be greatly appreciated!


    Jessica

  • #2
    You don't say whether you want to save the rng state as of the start of each replication or as of the finish. I'll assume you want the start. Then you can just put:

    Code:
    return scalar rng_state = c(rngstate)
    in the beginning of your rclass program's code (before you actually generate any random numbers.

    That will put the state of the RNG into the list of things your program returns, and then you can record it in your ongoing results with the other things your program returns.

    Comment


    • #3
      Thanks for your reply.
      You're right - I forgot to mention that I do want to save the state at the start of each replication. I've previously tried this but received an error:

      For example, this code:

      Code:
      clear all
      set more off
      set seed 123456
      
      capture program drop test
      program define test, rclass
      syntax [, nobs(integer 10)]
      
      preserve
      set obs `nobs'
      return scalar rng_state = c(rngstate)
      g x = rnormal()
      sum x, mean
      return scalar x_avg = r(mean)
      
      restore
      
      end
      
      simulate , reps(3): test, nobs(10)
      type mismatch
      an error occurred when simulate executed test
      r(109);


      When I set trace on, I observe this:
      Code:
          
      ------------------------------------------------------------------------- begin test ---
            - syntax [, nobs(integer 10)]
            - preserve
            - return scalar rng_state = c(rngstate)
      type mismatch
      --------------------------------------------------------------------------- end test ---

      Without saving the RNG state, the code does work to save the mean of the generated variable. When I thought it could be that the RNG state needed to be returned as a string, I tried this but still got the same error msg:
      Code:
      clear all
      set more off
      set seed 123456
      
      capture program drop test
      program define test, rclass
      syntax [, nobs(integer 10)]
      
      preserve
      set obs `nobs'
      return scalar rng_state = "`c(rngstate)'"
      g x = rnormal()
      sum x, mean
      return scalar x_avg = r(mean)
      
      restore
      
      end
      
      simulate reps(3): test, nobs(10)
      type mismatch
      an error occurred when simulate executed test
      r(109);
      Tracing the execution revealed this(I've shortened the RNG state for easier viewing):
      Code:
      ------------------------------------------------------------------------- begin test ---
            - syntax [, nobs(integer 10)]
            - preserve
            - set obs `nobs'
            = set obs 10
            - return scalar rng_state = "`c(rngstate)'"
            = return scalar rng_state = "XAA000000000001e240623a55849ed2854115472f70e95c2c42c05
      >03a011992859d6e363499dc00deca553887850ad4deb4e0c5fc96dd3ff0d771756541b81a814b9913aeb3bad
      >e700ac49ffd6dc755a971838ce6630078571467f0e366002938e97b7608ad798218d43e064f4e9629087e06f
      >217dc19c30034e111de2ad2a4d533892c0eb121ac49802e9b07cac46a69eb147579c5bb073361ca98"
      type mismatch
      --------------------------------------------------------------------------- end test ---
      Although the output from this trace is not exactly the same as the output from the previous trace, both versions of the code result in a r(109) type mismatch error.

      In the program, I tried first saving the scalar as a string and then saving the result:
      Code:
      clear all
      set more off
      macro drop _all
      scalar drop _all
      set seed 123456
      
      capture program drop test
      program define test, rclass
      syntax [, nobs(integer 10)]
      
      
      preserve
      
      set obs `nobs'
      scalar rng = "`c(rngstate)'"
      return scalar rng = r(rng)
      
      g x = rnormal()
      sum x, mean
      return scalar x_avg = r(mean)
      
      restore
      
      end
       
      test, nobs(10)
      This is the output (the value of the scalar rng has been shortened) :
      Code:
      . return list
      
      scalars:
                    r(x_avg) =  .2557781100273132
                      r(rng) =  .
      
      . scalar list _all
             rng = XAA000000000001e240623a55849ed2854115472f70e95c2c42c0503a011992859d6e363499dc0
      0deca553887850ad4deb4e0c5fc96dd3ff0d771756541b81a814b9913aeb3bade700ac49ffd6dc755a971838ce6
      630078571467f0e366002938e97b7608ad798218d43e064f4e9629087e06f217dc19c30034e111de2ad2a4d5338
      92c0eb121ac49802e9b07cac46a69eb147579c5bb073361ca98fa09263627207e7f0bdac04e3e3b1c09fb100e61
      b647834b306
      So the saved scalar r(rng) returned from the program is still missing even though the scalar rng is defined as the RNG state.

      I also tried this:
      Code:
      clear all
      set more off
      macro drop _all
      scalar drop _all
      set seed 123456
      
      capture program drop test
      program define test, rclass
      syntax [, nobs(integer 10)]
      
      
      preserve
      
      set obs `nobs'
      return local rng_state "`c(rngstate)'"
      scalar rng = "`c(rngstate)'"
      return scalar rng = r(rng)
      
      g x = rnormal()
      sum x, mean
      return scalar x_avg = r(mean)
      
      restore
      
      end
       
      
      test, nobs(10)
      return list
      scalar list _all
      simulate, reps(3): test, nobs(10)
      list
      And it gave me this output:
      Code:
      . return list
      
      scalars:
                    r(x_avg) =  .2557781100273132
                      r(rng) =  .
      
      macros:
                r(rng_state) : "XAA000000000001e240623a55849ed2854115472f70e95c2c42c0503a011992859d6e363499dc00deca553887850ad4deb4e0c5fc96dd3ff0d771756541b81a814b9913aeb3bade.."
      
      . scalar list _all
             rng = XAA000000000001e240623a55849ed2854115472f70e95c2c42c0503a011992859d6e363499dc00deca553887850ad4deb4e0c5fc96dd3ff0d771756541b81a814b9913aeb3bade700ac49ffd6dc755a971838ce6630078571467f0e366002938e97b7608ad798218d43e064f4e9629087e06f217dc19c30034e111de2ad2a4d533892c0eb121ac49802e9b07cac46a69eb147579c5bb073361ca98fa09263627207e7f0bdac04e3e3b1c09fb100e61b647834b306c2bed439173dd...
      
      . simulate, reps(3): test, nobs(10)
      
            command:  test, nobs(10)
              x_avg:  r(x_avg)
                rng:  r(rng)
      
      Simulations (3)
      ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
      .xx
      
      . list  
      
           +-----------------+
           |     x_avg   rng |
           |-----------------|
        1. |  .2557781     . |
        2. | -.2267316     . |
        3. |  .1837081     . |
           +-----------------+

      Now, we see that two scalars and one macro are returned, however the part of the code
      Code:
      return scalar rng = r(rng)
      still results in a missing value for the scalar r(rng) which results in missing values for the variable rng in a simulation. I was hoping that the macro r(rng_state) would be saved at each replication, but it wasn't.
      Last edited by Jessica Lum; 04 Jan 2018, 11:56.

      Comment


      • #4
        Well, you did some really nice investigation of the problem. The rng state is, indeed, a string. And -simulate- doesn't play with string expressions, only numeric expressions. My first thought is that you have to hack your own version of -simulate- by just looping over repetitions of test and storing things in a postfile. Postfile does accommodate strings. BUT, the c(rngstate) is a very long string, too long to fit into any str# variable, and -postfile- does not allow strL's. So it gets even harder. We have to, in effect, hack our own version of -postfile- as well. So, something like this:
        Code:
        clear all
        set more off
        macro drop _all
        scalar drop _all
        set seed 123456
        
        capture program drop test
        program define test, rclass
        syntax [, nobs(integer 10)]
        
            preserve
        
            set obs `nobs'
            return local rng `c(rngstate)'
        
            g x = rnormal() 
            sum x, mean
            return scalar x_avg = r(mean)
        
            restore
        
        end
         
        
        drop _all
        tempfile results
        save `results', emptyok
        
        local nreps 3
        forvalues i = 1/`nreps' {
            test
            return list
            local rng `r(rng)'
            local x_avg `r(x_avg)'
            drop _all
            set obs 1
            gen x_avg = `x_avg'
            gen strL rng = `"`rng'"'
            append using `results'
            save `"`results'"', replace
        }
        
        use `results', clear

        Comment


        • #5
          Thanks for your prompt reply.

          So I would not be using -simulate- and instead would be simulating results from looping my program for any value I set the local "nreps" to? I tried setting local nreps to 1000 to simulate 1000 replications to test it out and it gives me an error:
          Code:
          observation number out of range
              Observation number must be between 11 and 281,474,976,710,654.  (Observation numbers are typed without commas.)
          r(198);
          Would I be correct in assuming that any simple program would give me the same RNG state for each replication as long as the seed set initially were the same as the seed I set for my simulation of the same number of replications?

          In Stata's -help seed- help file, they state:
          When we do simulations at StataCorp, we record c(rngstate) for each replication. Just like everybody else, we record results from replications as observations in datasets; we just happen to have an extra variable in the dataset, namely, a string variable named state. That string is filled in observation by observation from the then-current values of c(rngstate), which is a function and so can be used in any context that a function can be used in Stata.
          I believe you're correct when stating that -simulate- doesn't save strings, but I'm still puzzled as to why they mentioned it in the help file.

          Comment


          • #6
            So, as for the error message, sorry, my error. You need -drop _all- or -clear- before -set obs 10- in program -test-.

            Would I be correct in assuming that any simple program would give me the same RNG state for each replication as long as the seed set initially were the same as the seed I set for my simulation of the same number of replications?
            From -help set rngstate-:

            If you record the seed you set, pseudorandom results such as results from a simulation or imputed values from mi impute can be reproduced later. Whatever you do after setting the seed, if you set the seed to the same value and repeat what you did, you will obtain the same results.
            That said, the randomization that Stata applies to incompletely identified sorts is under the control of a different random number generator, and to the extent that the unidentified part of the sort order of your data would affect what you are doing, you might not get the same results reproducibly. Also, "repeat what you did" includes not just the same code, but the same starting data.

            I'm not sure why you are asking this question. Bearing in mind the cautions about using the same data, and what might happen with under-specified sorts, there are two reasons for keeping track of the state of the RNG in simulations. One is just wanting to be able to reproduce all of the replications of the simulation. That requires only noting the original random number seed. Everything else proceeds reproducibly from there, and you don't need the RNG state at each replication. But there can be a second reason for doing this. Sometimes people run simulations with a very large number of replications, and then identify from the results particular replications that have interesting properties, which they then want to reproduce without having to run the entire simulation. That requires keeping track of the RNG state at the time of each replication, as we have been working on in this thread. It's up to you whether you need to do this or not.

            Comment


            • #7
              Sorry, I may have been vague when I said that I was puzzled. I didn't mean that I was unsure about the purpose of saving the RNG state. What I meant was that the help file seems to imply that saving the RNG state could be saved and recorded as any other scalar would be saved from replications of an rclass program as a "string variable named state" and that I'm confused about how they're actually saving the state as a string when they run their simulations.

              Comment


              • #8
                That's a good question. I hope somebody from StataCorp will respond and explain it.

                Comment


                • #9
                  Yes, I hope they'll clarify this. Thanks again for all your help!

                  Comment


                  • #10
                    Since the state for Stata's RNG is a string, it can be returned as a stored result in r() with
                    Code:
                    return local state = c(rngstate)
                    We cannot use
                    Code:
                    return scalar state = c(rngstate)
                    because the latter is meant to store numeric values.

                    If you previously saved the RNG state (as local or scalar), you can return it as a stored result in r() (from either the local or scalar variable) as follows:
                    Code:
                    program return_state, rclass
                      
                       set seed 12345
                       local  l = c(rngstate)
                       scalar s = c(rngstate)
                       
                       return local statel = "`l'"
                       return local states = s
                     
                    end
                    
                    return_state
                    return list
                    Please bear in mind that in Stata 14 we introduced the 64-bit Mersenne Twister RNG (MT64), and the state for MT64 is a long string of over 5000 characters. So that many bytes will be consumed every time the state is saved. Finally, in Stata 15, a stream RNG based on 64-bit Mersenne Twister (MT64S) was introduced to enable simulations with independent streams of random numbers.

                    I hope this is helpful.

                    -- Kreshna


                    Comment


                    • #11
                      How would I save these macros from each replication of a simulation so that I may have a variable at the end of the simulation representing each RNG state at each replication?

                      Comment


                      • #12
                        Kreshna Gopal (StataCorp) Yes, both Ms. Lum and I have recognized, if belatedly, that the c(rngstate) is a string, and, in current Stata, it has to be a strL because it is too long to be a str# variable. The question now is whether there is a simple way to store those strL's in a result data set being built up inside a loop over replications in a simulation. -simulate- does not accept strL's (or even str#'s) in its expression list. And -postfile- accepts str#'s, but not strL's. So it appears that one has to write fairly low level code here to iteratively build the results data set. Is there some easier way to do this that we are overlooking?

                        Comment


                        • #13
                          Clyde Schechter is right in that string results in every replication of simulate cannot be saved in the same way as numerical results are (with return scalar in rclass programs). Our documentation does not mean to imply that RNG states can be saved that way. Some coding will be needed.

                          Besides the solutions provided by Clyde Schechter, here is another one that uses postfile with simulate. The long string that is c(rngstate) can be split into 3 parts that postfile can handle (the length of the string for MT64 states is 5011 and postfile can handle up to str2045). The parts can be concatenated to reconstitute the RNG states at the end. Note that in our previous RNG (KISS32) (before Stata 14), the states were about 40 characters long. So using postfile to save the RNG states with simulate was simpler.

                          I hope I'm being helpful this time.

                          -- Kreshna

                          Code:
                          clear all
                          set more off
                          macro drop _all
                          scalar drop _all
                          set seed 123456
                          
                          capture program drop test
                          program define test, rclass
                          syntax [, nobs(integer 10)]
                             preserve
                             set obs `nobs'
                          
                             scalar state1 = substr(c(rngstate), 1,    2000)
                             scalar state2 = substr(c(rngstate), 2001, 4000)
                             scalar state3 = substr(c(rngstate), 4001, .)
                          
                             g x = rnormal()
                             sum x, mean
                             return scalar x_avg = r(mean)
                             restore
                          
                             post buffer (r(mean)) (state1) (state2) (state3)
                          end
                          
                          postfile buffer double x_avg str2000(state1 state2 state3) using results, replace
                          simulate, reps(3): test, nobs(10)
                          postclose buffer
                          
                          use results, clear
                          gen strL state = state1+state2+state3
                          drop state1 state2 state3

                          Comment


                          • #14
                            Excellent, thank you, Kreshna.

                            Comment


                            • #15
                              Thank you, Kreshna.

                              Comment

                              Working...
                              X