Setting random seed not enough?

Brendan Halpin

Join Date: Mar 2014
Posts: 152

Setting random seed not enough?

07 Jan 2018, 05:03

I'm writing a program which uses Mata to consider large numbers of random permutations of a matrix, to search for an optimal arrangement. In testing I'm using "set seed" and "mata: rseed()" to get consistent results. From one Stata session to another this works, but if I run the program twice in the same session (with a "clear all" between runs) the random numbers eventually (but not immediately) become inconsistent.

Should I need more than setting the seed plus "clear all" to restore the state of the Stata session to the beginning, at least in terms of PRNGs?

(Stata 14.2, on two separate machines).

Example results (by iteration 3, at least one of the 2000 random permutations differs):

Code:

: pop = ga(dd, 2000)
  Iter 1  : Σ dist: min 977.377; mean: 1067.934; max: 1090.905
  Iter 2  : Σ dist: min 950.5679; mean: 1039.75; max: 1064.551
  Iter 3  : Σ dist: min 950.5679; mean: 1019.234; max: 1039.178
[...]
: pop = ga(dd, 2000)
  Iter 1  : Σ dist: min 977.377; mean: 1067.934; max: 1090.905
  Iter 2  : Σ dist: min 950.5679; mean: 1039.75; max: 1064.551
  Iter 3  : Σ dist: min 950.5679; mean: 1019.247; max: 1039.178
[...]

Code to run the main code twice:

Code:

do test.do
clear all
clear mata
do test.do

Main code (test.do):

Code:

// Clear all and set seeds
clear all
set seed 123456
mata: rseed(123456)

// Generate random data and make a dissimilarity matrix
set obs 50
forvalues x = 1/10 {
  gen x`x'=rnormal()
}
matrix dissim dd = x1-x10, L2squared

mata

real scalar fitness (real matrix distmat, real matrix permdat) {
  width = cols(distmat)
  perm = transposeonly(order(transposeonly(permdat),1))
  tempdm = distmat[perm,perm]

  // Sum minor diagonal (offset 0 gives main diagonal)
  offset = 1
  f = sum(diagonal(tempdm[1..(width-offset),(1+offset)..width]))

  return(f)
}

// Genetic algorithm function: look for a matrix permutation that minimises the fitness function
real matrix function ga (real matrix distmat, real scalar npop) {
  width = rows(distmat)
 
  npop = 2*floor(npop/2) // even
  nparents = floor(npop/2) // change from 50% to intensify selection
  nsurv = nparents
  nnew = floor(npop*0.20)
  ntoconv = ceil(npop*0.1)
  // Use top 50% as core. They procreate to replace 30%, and 20% is new random
  // Progress data is on top 50%

  // Create a random population, each row a permutation
  population = rnormal(npop,width,0,1)
  sigmadist = J(npop,1,.)

  // Main iteration
  iter = 0
  while (iter==0 |(max(sigmadist[1..ntoconv]) != min(sigmadist[1..ntoconv]))) {
    newpop = population
    iter++

    // Calculate fitness (lower is better)
    for (i=1; i<=npop; i++) {
      sigmadist[i] = fitness(distmat, newpop[i,.])
    }
    
    // Order the new medoid sets per fitness (best (lowest) first)
    newpop = population[order(sigmadist,1),][1..nparents,]
    sigmadist = sigmadist[order(sigmadist,1)]
    "Iter "+strofreal(iter,"%-3.0f")+": Σ dist: min "+strofreal(min(sigmadist[1..ntoconv]))+"; mean: "+strofreal(mean(sigmadist[1..ntoconv]))+"; max: "+strofreal(max(sigmadist[1..ntoconv]))
    displayflush()
 
    // Keep top 50%
    population[1..nsurv,] = newpop[1..nsurv,]

    // Crossover section
    // Top 50% used for crossover to create 30%, remaining 20% new
    for (i=1+nsurv; i<=npop-nnew; i++) {
      j = 1+floor((1-runiform(1,1)^0.1)*nparents)
      k = 1+floor((1-runiform(1,1)^0.1)*nparents)
      l = runiformint(1,1,1,width-1)
      population[i,1..l] = newpop[j,1..l]
      population[i,l+1..width] = newpop[k,l+1..width]
    }
    population[(npop-nnew+1)..npop,] = rnormal(nnew, width, 0, 1)

  }
  return(population)
}

end

mata
dd = st_matrix("dd")
pop = ga(dd, 2000)
end

Tags: None

Brendan Halpin

Join Date: Mar 2014

Posts: 152
#2

07 Jan 2018, 05:52

Bloody hell, I've asked exactly this question before (and largely answered it for myself):

https://www.stata.com/statalist/arch.../msg00045.html

It's related to sort, with ties. My code uses Mata's order() function on data with ties. Apparently despite setting seeds and doing "clear all", treatment of ties depends in some way on the state of Stata.

You learn something every day. The trouble is I learnt all this four years ago and forgot it!

(I would like to be able to reset the state of the program 100%, thought, without re-starting.)
Brendan
2 likes
Comment
Brendan Halpin

Join Date: Mar 2014

Posts: 152
#3

07 Jan 2018, 06:33

And this, from Vince Wiggins, gives the canonical answer:

https://www.stata.com/statalist/arch.../msg00101.html

Code:

: stata("set sortseed 12345")
1 like
Comment

Announcement

Setting random seed not enough?

Comment

Comment