saving the rngstate of each replication in a monte carlo simulation

Jessica Lum

Join Date: Dec 2015

Posts: 33
#1

saving the rngstate of each replication in a monte carlo simulation

03 Jan 2018, 12:44

Hello,

Using Stata 14, I am simulating an r-class program and would like to save the rngstate of each replication of the simulation. Any help would be greatly appreciated!

Jessica
Tags: monte carlo simulation, random number generator, rngstate, saved results, simulation
Clyde Schechter

Join Date: Apr 2014

Posts: 30153
#2

03 Jan 2018, 17:11

You don't say whether you want to save the rng state as of the start of each replication or as of the finish. I'll assume you want the start. Then you can just put:

Code:

return scalar rng_state = c(rngstate)

in the beginning of your rclass program's code (before you actually generate any random numbers.

That will put the state of the RNG into the list of things your program returns, and then you can record it in your ongoing results with the other things your program returns.
Comment

Jessica Lum

Join Date: Dec 2015
Posts: 33

04 Jan 2018, 11:44

Thanks for your reply.
You're right - I forgot to mention that I do want to save the state at the start of each replication. I've previously tried this but received an error:

For example, this code:

Code:

clear all
set more off
set seed 123456

capture program drop test
program define test, rclass
syntax [, nobs(integer 10)]

preserve
set obs `nobs'
return scalar rng_state = c(rngstate)
g x = rnormal()
sum x, mean
return scalar x_avg = r(mean)

restore

end

simulate , reps(3): test, nobs(10)
type mismatch
an error occurred when simulate executed test
r(109);

When I set trace on, I observe this:

Code:

    
------------------------------------------------------------------------- begin test ---
      - syntax [, nobs(integer 10)]
      - preserve
      - return scalar rng_state = c(rngstate)
type mismatch
--------------------------------------------------------------------------- end test ---

Without saving the RNG state, the code does work to save the mean of the generated variable. When I thought it could be that the RNG state needed to be returned as a string, I tried this but still got the same error msg:

Code:

clear all
set more off
set seed 123456

capture program drop test
program define test, rclass
syntax [, nobs(integer 10)]

preserve
set obs `nobs'
return scalar rng_state = "`c(rngstate)'"
g x = rnormal()
sum x, mean
return scalar x_avg = r(mean)

restore

end

simulate reps(3): test, nobs(10)
type mismatch
an error occurred when simulate executed test
r(109);

Tracing the execution revealed this(I've shortened the RNG state for easier viewing):

Code:

------------------------------------------------------------------------- begin test ---
      - syntax [, nobs(integer 10)]
      - preserve
      - set obs `nobs'
      = set obs 10
      - return scalar rng_state = "`c(rngstate)'"
      = return scalar rng_state = "XAA000000000001e240623a55849ed2854115472f70e95c2c42c05
>03a011992859d6e363499dc00deca553887850ad4deb4e0c5fc96dd3ff0d771756541b81a814b9913aeb3bad
>e700ac49ffd6dc755a971838ce6630078571467f0e366002938e97b7608ad798218d43e064f4e9629087e06f
>217dc19c30034e111de2ad2a4d533892c0eb121ac49802e9b07cac46a69eb147579c5bb073361ca98"
type mismatch
--------------------------------------------------------------------------- end test ---

Although the output from this trace is not exactly the same as the output from the previous trace, both versions of the code result in a r(109) type mismatch error.

In the program, I tried first saving the scalar as a string and then saving the result:

Code:

clear all
set more off
macro drop _all
scalar drop _all
set seed 123456

capture program drop test
program define test, rclass
syntax [, nobs(integer 10)]


preserve

set obs `nobs'
scalar rng = "`c(rngstate)'"
return scalar rng = r(rng)

g x = rnormal()
sum x, mean
return scalar x_avg = r(mean)

restore

end
 
test, nobs(10)

This is the output (the value of the scalar rng has been shortened) :

Code:

. return list

scalars:
              r(x_avg) =  .2557781100273132
                r(rng) =  .

. scalar list _all
       rng = XAA000000000001e240623a55849ed2854115472f70e95c2c42c0503a011992859d6e363499dc0
0deca553887850ad4deb4e0c5fc96dd3ff0d771756541b81a814b9913aeb3bade700ac49ffd6dc755a971838ce6
630078571467f0e366002938e97b7608ad798218d43e064f4e9629087e06f217dc19c30034e111de2ad2a4d5338
92c0eb121ac49802e9b07cac46a69eb147579c5bb073361ca98fa09263627207e7f0bdac04e3e3b1c09fb100e61
b647834b306

So the saved scalar r(rng) returned from the program is still missing even though the scalar rng is defined as the RNG state.

I also tried this:

Code:

clear all
set more off
macro drop _all
scalar drop _all
set seed 123456

capture program drop test
program define test, rclass
syntax [, nobs(integer 10)]


preserve

set obs `nobs'
return local rng_state "`c(rngstate)'"
scalar rng = "`c(rngstate)'"
return scalar rng = r(rng)

g x = rnormal()
sum x, mean
return scalar x_avg = r(mean)

restore

end
 

test, nobs(10)
return list
scalar list _all
simulate, reps(3): test, nobs(10)
list

And it gave me this output:

Code:

. return list

scalars:
              r(x_avg) =  .2557781100273132
                r(rng) =  .

macros:
          r(rng_state) : "XAA000000000001e240623a55849ed2854115472f70e95c2c42c0503a011992859d6e363499dc00deca553887850ad4deb4e0c5fc96dd3ff0d771756541b81a814b9913aeb3bade.."

. scalar list _all
       rng = XAA000000000001e240623a55849ed2854115472f70e95c2c42c0503a011992859d6e363499dc00deca553887850ad4deb4e0c5fc96dd3ff0d771756541b81a814b9913aeb3bade700ac49ffd6dc755a971838ce6630078571467f0e366002938e97b7608ad798218d43e064f4e9629087e06f217dc19c30034e111de2ad2a4d533892c0eb121ac49802e9b07cac46a69eb147579c5bb073361ca98fa09263627207e7f0bdac04e3e3b1c09fb100e61b647834b306c2bed439173dd...

. simulate, reps(3): test, nobs(10)

      command:  test, nobs(10)
        x_avg:  r(x_avg)
          rng:  r(rng)

Simulations (3)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.xx

. list  

     +-----------------+
     |     x_avg   rng |
     |-----------------|
  1. |  .2557781     . |
  2. | -.2267316     . |
  3. |  .1837081     . |
     +-----------------+

Now, we see that two scalars and one macro are returned, however the part of the code

Code:

return scalar rng = r(rng)

still results in a missing value for the scalar r(rng) which results in missing values for the variable rng in a simulation. I was hoping that the macro r(rng_state) would be saved at each replication, but it wasn't.

Last edited by Jessica Lum; 04 Jan 2018, 11:56.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30153
#4

04 Jan 2018, 12:38

Well, you did some really nice investigation of the problem. The rng state is, indeed, a string. And -simulate- doesn't play with string expressions, only numeric expressions. My first thought is that you have to hack your own version of -simulate- by just looping over repetitions of test and storing things in a postfile. Postfile does accommodate strings. BUT, the c(rngstate) is a very long string, too long to fit into any str# variable, and -postfile- does not allow strL's. So it gets even harder. We have to, in effect, hack our own version of -postfile- as well. So, something like this:

Code:

clear all set more off macro drop _all scalar drop _all set seed 123456 capture program drop test program define test, rclass syntax [, nobs(integer 10)] preserve set obs `nobs' return local rng `c(rngstate)' g x = rnormal() sum x, mean return scalar x_avg = r(mean) restore end drop _all tempfile results save `results', emptyok local nreps 3 forvalues i = 1/`nreps' { test return list local rng `r(rng)' local x_avg `r(x_avg)' drop _all set obs 1 gen x_avg = `x_avg' gen strL rng = `"`rng'"' append using `results' save `"`results'"', replace } use `results', clear
Comment
Jessica Lum

Join Date: Dec 2015

Posts: 33
#5

04 Jan 2018, 13:35

Thanks for your prompt reply.

So I would not be using -simulate- and instead would be simulating results from looping my program for any value I set the local "nreps" to? I tried setting local nreps to 1000 to simulate 1000 replications to test it out and it gives me an error:

Code:

observation number out of range Observation number must be between 11 and 281,474,976,710,654. (Observation numbers are typed without commas.) r(198);

Would I be correct in assuming that any simple program would give me the same RNG state for each replication as long as the seed set initially were the same as the seed I set for my simulation of the same number of replications?

In Stata's -help seed- help file, they state:

When we do simulations at StataCorp, we record c(rngstate) for each replication. Just like everybody else, we record results from replications as observations in datasets; we just happen to have an extra variable in the dataset, namely, a string variable named state. That string is filled in observation by observation from the then-current values of c(rngstate), which is a function and so can be used in any context that a function can be used in Stata.

I believe you're correct when stating that -simulate- doesn't save strings, but I'm still puzzled as to why they mentioned it in the help file.
1 like
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30153
#6

04 Jan 2018, 13:53

So, as for the error message, sorry, my error. You need -drop _all- or -clear- before -set obs 10- in program -test-.

Would I be correct in assuming that any simple program would give me the same RNG state for each replication as long as the seed set initially were the same as the seed I set for my simulation of the same number of replications?

From -help set rngstate-:

If you record the seed you set, pseudorandom results such as results from a simulation or imputed values from mi impute can be reproduced later. Whatever you do after setting the seed, if you set the seed to the same value and repeat what you did, you will obtain the same results.

That said, the randomization that Stata applies to incompletely identified sorts is under the control of a different random number generator, and to the extent that the unidentified part of the sort order of your data would affect what you are doing, you might not get the same results reproducibly. Also, "repeat what you did" includes not just the same code, but the same starting data.

I'm not sure why you are asking this question. Bearing in mind the cautions about using the same data, and what might happen with under-specified sorts, there are two reasons for keeping track of the state of the RNG in simulations. One is just wanting to be able to reproduce all of the replications of the simulation. That requires only noting the original random number seed. Everything else proceeds reproducibly from there, and you don't need the RNG state at each replication. But there can be a second reason for doing this. Sometimes people run simulations with a very large number of replications, and then identify from the results particular replications that have interesting properties, which they then want to reproduce without having to run the entire simulation. That requires keeping track of the RNG state at the time of each replication, as we have been working on in this thread. It's up to you whether you need to do this or not.
Comment
Jessica Lum

Join Date: Dec 2015

Posts: 33
#7

04 Jan 2018, 14:06

Sorry, I may have been vague when I said that I was puzzled. I didn't mean that I was unsure about the purpose of saving the RNG state. What I meant was that the help file seems to imply that saving the RNG state could be saved and recorded as any other scalar would be saved from replications of an rclass program as a "string variable named state" and that I'm confused about how they're actually saving the state as a string when they run their simulations.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30153
#8

04 Jan 2018, 14:09

That's a good question. I hope somebody from StataCorp will respond and explain it.
Comment
Jessica Lum

Join Date: Dec 2015

Posts: 33
#9

04 Jan 2018, 14:16

Yes, I hope they'll clarify this. Thanks again for all your help!
Comment
Kreshna Gopal (StataCorp)

StataCorp Employee

Join Date: Apr 2014

Posts: 43
#10

05 Jan 2018, 10:08

Since the state for Stata's RNG is a string, it can be returned as a stored result in r() with

Code:

return local state = c(rngstate)

We cannot use

Code:

return scalar state = c(rngstate)

because the latter is meant to store numeric values.

If you previously saved the RNG state (as local or scalar), you can return it as a stored result in r() (from either the local or scalar variable) as follows:

Code:

program return_state, rclass set seed 12345 local l = c(rngstate) scalar s = c(rngstate) return local statel = "`l'" return local states = s end return_state return list

Please bear in mind that in Stata 14 we introduced the 64-bit Mersenne Twister RNG (MT64), and the state for MT64 is a long string of over 5000 characters. So that many bytes will be consumed every time the state is saved. Finally, in Stata 15, a stream RNG based on 64-bit Mersenne Twister (MT64S) was introduced to enable simulations with independent streams of random numbers.

I hope this is helpful.

-- Kreshna
Comment
Jessica Lum

Join Date: Dec 2015

Posts: 33
#11

05 Jan 2018, 10:50

How would I save these macros from each replication of a simulation so that I may have a variable at the end of the simulation representing each RNG state at each replication?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30153
#12

05 Jan 2018, 11:20

Kreshna Gopal (StataCorp) Yes, both Ms. Lum and I have recognized, if belatedly, that the c(rngstate) is a string, and, in current Stata, it has to be a strL because it is too long to be a str# variable. The question now is whether there is a simple way to store those strL's in a result data set being built up inside a loop over replications in a simulation. -simulate- does not accept strL's (or even str#'s) in its expression list. And -postfile- accepts str#'s, but not strL's. So it appears that one has to write fairly low level code here to iteratively build the results data set. Is there some easier way to do this that we are overlooking?
Comment
Kreshna Gopal (StataCorp)

StataCorp Employee

Join Date: Apr 2014

Posts: 43
#13

07 Jan 2018, 20:47

Clyde Schechter is right in that string results in every replication of simulate cannot be saved in the same way as numerical results are (with return scalar in rclass programs). Our documentation does not mean to imply that RNG states can be saved that way. Some coding will be needed.

Besides the solutions provided by Clyde Schechter, here is another one that uses postfile with simulate. The long string that is c(rngstate) can be split into 3 parts that postfile can handle (the length of the string for MT64 states is 5011 and postfile can handle up to str2045). The parts can be concatenated to reconstitute the RNG states at the end. Note that in our previous RNG (KISS32) (before Stata 14), the states were about 40 characters long. So using postfile to save the RNG states with simulate was simpler.

I hope I'm being helpful this time.

-- Kreshna

Code:

clear all set more off macro drop _all scalar drop _all set seed 123456 capture program drop test program define test, rclass syntax [, nobs(integer 10)] preserve set obs `nobs' scalar state1 = substr(c(rngstate), 1, 2000) scalar state2 = substr(c(rngstate), 2001, 4000) scalar state3 = substr(c(rngstate), 4001, .) g x = rnormal() sum x, mean return scalar x_avg = r(mean) restore post buffer (r(mean)) (state1) (state2) (state3) end postfile buffer double x_avg str2000(state1 state2 state3) using results, replace simulate, reps(3): test, nobs(10) postclose buffer use results, clear gen strL state = state1+state2+state3 drop state1 state2 state3
2 likes
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30153
#14

07 Jan 2018, 20:48

Excellent, thank you, Kreshna.
Comment
Jessica Lum

Join Date: Dec 2015

Posts: 33
#15

08 Jan 2018, 07:54

Thank you, Kreshna.
Comment

Announcement