Seed value and sequence between 13 and 14

Chris Ferrall

Join Date: Sep 2015

Posts: 9
#1


21 Sep 2015, 19:31

Starting from the same seed it appears that a different sequence of random numbers is produced in Stata 14 than in Stata 13?
Can someone confirm this? If so, is there any way to restore equivalent seeds across versions (easily)?
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

21 Sep 2015, 19:49

I believe that help set seed run from Stata 14, or the set seed entry in the Stata Base Reference Manual PDF for Stata 14, addresses this issue.
Chris Ferrall

Join Date: Sep 2015

Posts: 9
#3

22 Sep 2015, 08:45

Originally posted by William Lisowski

I believe that help set seed run from Stata 14, or the set seed entry in the Stata Base Reference Manual PDF for Stata 14, addresses this issue.

So to get the answer I have to pay for an upgrade to Stata 14? Could someone just tell me the answer?

I find it ironic that Stata spends pages and pages telling users that if you set the seed you'll get the same results, but doesn't say anything publicly about this forward incompatibility, even though people running old code will get different results.
Kreshna Gopal (StataCorp)

StataCorp Employee

Join Date: Apr 2014

Posts: 43
#4

22 Sep 2015, 09:55

Starting from the same seed, the sequence of random numbers in Stata 14 is indeed different from that in Stata 13. This is because a new default random-number generator was introduced in Stata 14. It is the 64-bit Mersenne Twister. The default random-number generator in Stata 13 (and earlier versions) is the 32-bit KISS generator, which is still optionally available in Stata 14. Mersenne Twister has a longer period, better equidistribution properties, and higher resolution compared to the KISS generator.

In Stata 14, help set seed and help version provide information on how to reproduce results from old code. For example, you can add version 13 at the top of your do-file. This will not only ensure that the syntax is interpreted according to Stata 13 rules, but also make set seed and Stata's random-number functions use the KISS generator.
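As a minimal sketch of that advice, a do-file intended to give the same draws in Stata 13 and Stata 14 might begin as follows. The seed value and variable name are illustrative assumptions, not taken from any particular do-file, and the behavior noted in the comments is as described in this post rather than independently tested.

```stata
* Illustrative do-file; the seed and variable name are arbitrary.
version 13              // in a do-file, this also selects the pre-14 default RNG (KISS32)
clear
set seed 12345
set obs 5
generate double u = runiform()
list
```

Running the same file in Stata 13 and Stata 14 and comparing the listed values is a quick way to check that the draws agree.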

If you do not have Stata 14, all documentation of Stata 14 is available online at
http://www.stata.com/features/documentation
In particular, documentation on random-number generators and version control can be respectively found here:
http://www.stata.com/manuals14/rsetseed.pdf
http://www.stata.com/manuals14/pversion.pdf

I hope this helps.

-- Kreshna
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

22 Sep 2015, 10:15

Chris Ferrall

Starting from the same seed it appears that a different sequence of random numbers is produced in Stata 14 than in Stata 13?

Since you didn't tell us what version of Stata you were running, nor provide sample code and/or output (see the Statalist FAQ linked to from the top of the page, especially sections 9-12 on how to best pose your questions) I assumed, apparently in error, based on what you wrote in your initial post, that you had discovered the problem by running the same code on both versions of Stata, and thus could easily access the Stata 14 help and documentation.

You might also review section 5 of the Statalist FAQ.
Chris Ferrall

Join Date: Sep 2015

Posts: 9
#6

22 Sep 2015, 10:19

Thanks for answering the question. Two follow ups.

1. This change from 13 to 14 does not appear anywhere when I search for things like "set seed" or "random number generator" using Stata's search engine (either inside Stata 13 or in the forum or FAQs). Seems to me you should promote changes like this in some way so a search will answer the question.

2. Can you explain to me why the attached program does not produce the same random samples for the same seed in Stata 13 and 14? I set the version to 12.0 because some students are using Stata 12, some are using 14, and I have version 13. The macro $myseed is an integer computed from a student-specific code. So I thought I could produce the same subsample as each student, but I found out yesterday that I can't. And it appears that the version command, which I did include for exactly this reason, did not do what I thought it would do and does not appear to do what you say it does.

Postscript: I have read the version 14 manual about this. Apparently I am dumb, because as I read it setting the version to 12 before setting the seed should result in the old rng being used, but that does not seem to happen.
Attached Files

use452.ado (465 Bytes, 1 view)

Last edited by Chris Ferrall; 22 Sep 2015, 10:37. Reason: I read the documentation and am still confused.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#7

22 Sep 2015, 10:47

Have you read the Stata 14 documentation for the version command? It's a bit ambiguous, but it differentiates between what works interactively, in do-files, and in ado-files, and I don't think that what you chose will give you what you want. I don't have Stata 14, so I can't test.

Added in edit: Actually, when I finally read all the way to the bottom of the Stata 14 documentation for the version command, it is very explicit and clear that in ado files, something like

Code:

version 12.0, user

is required to cause random number generation to be done as it was under version 12.0.

Last edited by William Lisowski; 22 Sep 2015, 11:05.
ben earnhart

Join Date: May 2014

Posts: 1027
#8

22 Sep 2015, 10:59

hmm... I just tried "set rng kiss32" and ran the same seed for some random digits in Stata 14 as I set in Stata 13. Got the same series of numbers. So at least for me, it works. I can see how it would be confusing, and maybe there should have been some warning about the change in behavior, but it works across versions for me.

Hmm... even more. I closed out of Stata, re-opened it, and re-ran it with "version 13." Got the same results as when I'd done "set rng kiss32."

Last edited by ben earnhart; 22 Sep 2015, 11:05.
Chris Ferrall

Join Date: Sep 2015

Posts: 9
#9

22 Sep 2015, 11:55

Added in edit: Actually, when I finally read all the way to the bottom of the Stata 14 documentation for the version command, it is very explicit and clear that in ado files, something like

Code:

version 12.0, user

is required to cause random number generation to be done as it was under version 12.0.

You are right. But notice that the user option is NEW to Stata 14 and is only required for backward compatibility in ado files, but not do files! Gee, I wonder how I missed that the first time.

So Stata has not only changed random number generators (which is fine), but they changed how version works so that ado files written before Stata 14 will not produce the same results in 14 as earlier. Of course, if I had magically predicted that ", user" option would be required in the future (even though it does not show up at all in earlier documentation) my code would have produced identical results when run by someone in 14. But alas, Stata decided "version 12.0" would not be enough and broke my code.

I suspect others will be surprised that their bootstraps or simulations are not exactly the same when they upgrade to Stata 14, even if they had set the version. If they want the same results they will have to add ", user" to ado files.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#10

22 Sep 2015, 12:22

You have my sympathy for having to support Stata 14 users without having access to Stata 14 yourself.

Those who are surprised when they upgrade to Stata 14 will at least have easy access to the Stata 14 help and documentation to understand what has happened.
Chris Ferrall

Join Date: Sep 2015

Posts: 9
#11

22 Sep 2015, 12:39

Originally posted by William Lisowski

You have my sympathy for having to support Stata 14 users without having access to Stata 14 yourself.

Those who are surprised when they upgrade to Stata 14 will at least have easy access to the Stata 14 help and documentation to understand what has happened.

Well, I planned an interactive session of Stata with 40 undergraduates and tested all my code and its use of the internet inside Stata, unfortunately only using Stata 13 for Windows ... because I trusted, from my 25+ years of using Stata, that it would be stable and reliable. Turns out there was a bug in Stata for Mac that I could not anticipate, and my answer key for each student was wrong because I could not anticipate that setting the seed would not produce identical results in 14. So in terms of a smooth-running session to inspire confidence in me and Stata, it was a disaster.
Robert Picard

Join Date: Mar 2014

Posts: 1536
#12

22 Sep 2015, 15:28

I was also surprised by the change in version control. I guess ultimately, StataCorp decided to strike a balance between the need not to break old user-written programs and yet make the new RNG available to them. If you write a do-file that is expected to be run on version 12, 13, and 14, it should include a version 12 at the beginning. If I understand the new paradigm correctly, your program should work as expected if the do-file includes a version 12 statement, even when run in Stata 14. For example, the following generates the same series in all 4 calls in Stata 13. In Stata 14, the first 2 series differ because the new RNG is used. The ones following the version 12 call are identical to those generated in Stata 13.

Code:

cap program drop draw
program draw
    version 12
    args myseed
    set seed `myseed'
    drop _all
    set obs 5
    gen x = runiform()
    list
end

draw 2341234
draw 2341234

version 12
draw 2341234
draw 2341234

I also noticed in your ado file that you use the following construct to draw a random sample from a dataset:

Code:

use if runiform() < `subsample'/100 using ${dropbox}/`name', clear

I know that this is much faster than using the built-in sample command but it can't select an exact observation count. I wrote randomtag (from SSC) because I wanted a fast way to randomly select a set number of observations to list for my listsome program (also on SSC). randomtag is very fast and guaranteed to generate the exact same sample that the built-in sample command would. There's some extra overhead in having to load the complete dataset in memory and then dropping obs but it may suit your needs anyway. Here's an example that compares both approaches:

Code:

* create a large dataset
clear
set obs 50000000
gen id = _n
gen double x1 = runiform()
gen double x2 = runiform()
tempfile f
save "`f'"

* choose a 1% sample using both methods
timer clear
timer on 1
forvalues i = 1/3 {
    use if runiform() < .01 using "`f'"
    count
}
timer off 1

timer on 2
forvalues i = 1/3 {
    use "`f'", clear
    local n = int(_N / 100)
    randomtag , count(`n') gen(t)
    keep if t
    count
}
timer off 2
timer list

* show that -randomtag- picks exactly the same sample as the built-in command
timer clear
timer on 1
set seed 41234123
use "`f'", clear
sample 1000, count
sort id
timer off 1
tempfile s
save "`s'"

timer on 2
set seed 41234123
use "`f'", clear
randomtag , count(1000) gen(t)
keep if t
drop t
timer off 2
timer list

cf _all using "`s'", all
Chris Ferrall

Join Date: Sep 2015

Posts: 9
#13

23 Sep 2015, 11:54

Originally posted by Robert Picard

your program should work as expected if the do-file includes a version 12 statement, even when run in Stata 14. For example, the following generates the same series in all 4 calls in Stata 13. In Stata 14, the first 2 series differ because the new RNG is used. The ones following the version 12 call are identical to those generated in Stata 13.

Yes, but they ALSO made it so that ado files would NOT produce the same results, because plain old version 12.0 is not enough in ado files but is in do files. How does that make sense?
Instead, to work in 14 I have to add the , user option to ado files. Fortunately adding that option does not break version in earlier versions.
I understand how to make it work for all versions now, but there was no way to anticipate this before having a student with 14 get a different result.
Robert Picard

Join Date: Mar 2014

Posts: 1536
#14

23 Sep 2015, 12:31

There may be other changes under the hood that would make results differ for Stata 14 users so it is not sufficient that your ado set the version. It is the responsibility of Stata 14 users to start their do-file with

Code:

version 12

if it is important to you that their results match those generated under older versions of Stata. If they do that, there is absolutely no reason to change your ado file. It's not possible to predict what will change in a new version of Stata but it's very predictable that a do-file's results will change if it does not start with a proper version command.
Kreshna Gopal (StataCorp)

StataCorp Employee

Join Date: Apr 2014

Posts: 43
#15

23 Sep 2015, 17:09

Robert Picard is spot on about our endeavour to strike a balance in Stata 14: adopt a better random-number generator (RNG) and at the same time enable old, RNG-dependent programs to produce the same results as before.

We understand the changes we made can be confusing. Chris Ferrall is probably right in pointing out that these changes should appear more prominently in our documentation. We hope this post sheds some light on our design choices.

As mentioned, 64-bit Mersenne Twister (MT64) replaced 32-bit KISS (KISS32) as the default RNG in Stata 14. MT64 is widely acknowledged as one of the best RNGs available, especially in terms of equidistribution and period length.

Now, consider a Stata command like mi. Should mi, which was originally written in Stata 11 (the ado program for mi has version 11.0 at the top), use the default RNG in Stata 14 (MT64), or should it use the default RNG in Stata 11 (KISS32)? The same issue arises with any user-written command written before Stata 14. We, of course, wanted all existing Stata commands (like mi, bootstrap and many others) to use the new generator. The same applies to user-written programs.

However, to enable existing Stata commands and user-written programs to use the new RNG, version control in Stata had to be adapted. Also, existing code would inevitably produce different results with the new RNG -- but we provided several new features to easily revert back to the old RNG, if necessary.

Let us first explain the change in version control in Stata 14, and then discuss the methods we introduced to choose the RNG.

Version control in Stata 14

It is important to note that version control is intended for backward compatibility, not forward compatibility. That is, version control, and the documentation for it, is intended such that users of a newer version of Stata (such as Stata 14) can run their code using the same syntax they used in earlier versions of Stata (such as 13).

Changing Stata's default RNG produced a unique situation. We wanted ado programs (user-written or in official commands like mi) written under version < 14 to use Stata 14's RNG. (Note: by ado programs we mean programs in ado-files as well as programs defined by program define in do-files).

The version command was therefore modified so that it does not apply to the RNG in ado programs. Instead, version applies to the RNG only when set interactively or when used in a do-file, which is how we believe most users set up their reproducible analyses: they write a do-file, and we recommend they put version # at the top of that do-file, recording the version of the Stata they are using. Then, when they use a future version of Stata with that same do-file, version # controls both syntax differences and the RNG used by that future version of Stata. Think of version, interactively or in a do-file, as setting two versions -- the "syntax" version, and the "user" version. The "user" version is the version of Stata the users were using when they called Stata interactively or wrote their do-file.

The "syntax" version is respected by all of Stata -- interactive Stata, do-files, ado-files, and internal commands. The "user" version is respected by the RNG right now, and possibly by other functions in the future.

When version is used in an ado-file, only the "syntax" version is set, not the "user" version.

We also introduced the user option in version that does the opposite. version 12, user in Stata 14 does not set the "syntax" version and only sets the "user" version. Thus, version 12, user will run everything under Stata 14, except the RNG: for RNG functions and commands (like runiform() and set seed), the default RNG of version 12 is used.
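To make the two versions concrete, here is a minimal sketch of an ado program that sets both. The file name mydraw.ado, the program name, and its single argument are hypothetical, and the comments restate the behavior described above for Stata 14 rather than anything tested here.

```stata
* Hypothetical ado-file mydraw.ado; the program name and argument are illustrative.
program define mydraw
    version 12          // "syntax" version: interpret this code by Stata 12 rules
    version 12, user    // "user" version: RNG commands and functions use version 12's default RNG
    args myseed
    set seed `myseed'
    drop _all
    set obs 5
    generate double x = runiform()
    list
end
```

Under these assumptions, typing mydraw 2341234 interactively in Stata 14 would draw from the older generator even when no version command has been issued in the session.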

How to choose the RNG in Stata 14?

You may want to specify the RNG to reproduce some results, or for other purposes. There are four ways to choose the RNG in Stata 14 onwards. The best approach will depend on needs and context.

1. We believe most users should not explicitly set the RNG and instead should allow version control to choose which RNG will be used. If you are working with a do-file from Stata 13 or earlier, or if you are typing commands interactively that you used with Stata 13 or earlier, and wish Stata 14 to use Stata 13's default RNG, simply type version 13.1 (or an earlier number) interactively or place this command at the top of your do-file.

Coming back to the mi example, how might a user call it? They might call it interactively. They might call it from a do-file. They might call it from another ado-file. To call it from an ado-file, however, at some level the user had to execute either an interactive command or a do-file.

We decided that in Stata 14, the default behavior in ado programs should be to use the default RNG unless the user had changed the version interactively or in a do-file to something less than 14. Otherwise, any user-written or official command with, for example, version 12 at the top would be using KISS32 rather than the newer, better, MT64.

2. We can also choose the RNG with the set command. set rng kiss32 or set rng mt64 sets the current RNG. That setting will stay in effect for the duration of the Stata session if it is used interactively, or it will stay in effect until the do-file or ado-file in which it is used exits.

3. For finer control, users can even use functions (and settings) specifically targeted to a given RNG. For example, set seed_kiss32 ..., runiform_kiss32(), rnormal_mt64(), and so on. In fact, all RNG functions, seeds and states have a default (like seed), and a variant for each generator (seed_kiss32 and seed_mt64). Currently we have mt64 and kiss32, but we may have more in the future.

4. Use the user option in version (discussed above). This is the rarest usage.
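Approaches 2 and 3 can be sketched side by side as below. This assumes Stata 14 or later; the seed value and variable names are arbitrary, and the command and function names are the ones given in the list above.

```stata
* Approach 2: select the generator for the session, then seed it (seed is arbitrary).
clear
set obs 5
set rng kiss32
set seed 12345
generate double a = runiform()

* Approach 3: switch back to the default generator, then seed and draw
* from KISS32 explicitly through its generator-specific variants.
set rng mt64
set seed_kiss32 12345
generate double b = runiform_kiss32()

list a b
```

Listing the two variables next to each other lets you check for yourself whether both routes reach the same KISS32 stream.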

For reproducible analyses, we think the best practice is to put the commands needed for a given analysis into a do-file, which in turn might call other do-files as well as user-written and official ado-files and built-in commands. The top-level do-file should contain a version statement at the top. This version statement controls how syntax is evaluated and also controls which RNG is used by everything called by that top-level do-file. Users should not have to add , user to the version statement in ado-files unless they call those ado-files interactively, outside the context of a controlling do-file where the version should be set.

-- Kreshna