Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Setting random seed not enough?

    I'm writing a program which uses Mata to consider large numbers of random permutations of a matrix, to search for an optimal arrangement. In testing I'm using "set seed" and "mata: rseed()" to get consistent results. From one Stata session to another this works, but if I run the program twice in the same session (with a "clear all" between runs) the random numbers eventually (but not immediately) become inconsistent.

    Should I need more than setting the seed plus "clear all" to restore the state of the Stata session to the beginning, at least in terms of PRNGs?

    (Stata 14.2, on two separate machines).

    Example results (by iteration 3, at least one of the 2000 random permutations differs):
    Code:
    : pop = ga(dd, 2000)
      Iter 1  : Σ dist: min 977.377; mean: 1067.934; max: 1090.905
      Iter 2  : Σ dist: min 950.5679; mean: 1039.75; max: 1064.551
      Iter 3  : Σ dist: min 950.5679; mean: 1019.234; max: 1039.178
    [...]
    : pop = ga(dd, 2000)
      Iter 1  : Σ dist: min 977.377; mean: 1067.934; max: 1090.905
      Iter 2  : Σ dist: min 950.5679; mean: 1039.75; max: 1064.551
      Iter 3  : Σ dist: min 950.5679; mean: 1019.247; max: 1039.178
    [...]
    Code to run the main code twice:
    Code:
    do test.do
    clear all
    clear mata
    do test.do
    Main code (test.do):
    Code:
    // Clear all and set seeds
    clear all
    set seed 123456
    mata: rseed(123456)
    
    // Generate random data and make a dissimilarity matrix
    set obs 50
    forvalues x = 1/10 {
      gen x`x'=rnormal()
    }
    matrix dissim dd = x1-x10, L2squared
    
    mata
    
    real scalar fitness (real matrix distmat, real matrix permdat) {
      width = cols(distmat)
      perm = transposeonly(order(transposeonly(permdat),1))
      tempdm = distmat[perm,perm]
    
      // Sum minor diagonal (offset 0 gives main diagonal)
      offset = 1
      f = sum(diagonal(tempdm[1..(width-offset),(1+offset)..width]))
    
      return(f)
    }
    
    // Genetic algorithm function: look for a matrix permutation that minimises the fitness function
    real matrix function ga (real matrix distmat, real scalar npop) {
      width = rows(distmat)
     
      npop = 2*floor(npop/2) // even
      nparents = floor(npop/2) // change from 50% to intensify selection
      nsurv = nparents
      nnew = floor(npop*0.20)
      ntoconv = ceil(npop*0.1)
      // Use top 50% as core. They procreate to replace 30%, and 20% is new random
      // Progress data is on top 50%
    
      // Create a random population, each row a permutation
      population = rnormal(npop,width,0,1)
      sigmadist = J(npop,1,.)
    
      // Main iteration
      iter = 0
      while (iter==0 |(max(sigmadist[1..ntoconv]) != min(sigmadist[1..ntoconv]))) {
        newpop = population
        iter++
    
        // Calculate fitness (lower is better)
        for (i=1; i<=npop; i++) {
          sigmadist[i] = fitness(distmat, newpop[i,.])
        }
        
        // Order the new medoid sets per fitness (best (lowest) first)
        newpop = population[order(sigmadist,1),][1..nparents,]
        sigmadist = sigmadist[order(sigmadist,1)]
        "Iter "+strofreal(iter,"%-3.0f")+": Σ dist: min "+strofreal(min(sigmadist[1..ntoconv]))+"; mean: "+strofreal(mean(sigmadist[1..ntoconv]))+"; max: "+strofreal(max(sigmadist[1..ntoconv]))
        displayflush()
     
        // Keep top 50%
        population[1..nsurv,] = newpop[1..nsurv,]
    
        // Crossover section
        // Top 50% used for crossover to create 30%, remaining 20% new
        for (i=1+nsurv; i<=npop-nnew; i++) {
          j = 1+floor((1-runiform(1,1)^0.1)*nparents)
          k = 1+floor((1-runiform(1,1)^0.1)*nparents)
          l = runiformint(1,1,1,width-1)
          population[i,1..l] = newpop[j,1..l]
          population[i,l+1..width] = newpop[k,l+1..width]
        }
        population[(npop-nnew+1)..npop,] = rnormal(nnew, width, 0, 1)
    
      }
      return(population)
    }
    
    end
    
    mata
    dd = st_matrix("dd")
    pop = ga(dd, 2000)
    end

  • #2
    Bloody hell, I've asked exactly this question before (and largely answered it for myself):

    https://www.stata.com/statalist/arch.../msg00045.html

    It's related to sort, with ties. My code uses Mata's order() function on data with ties. Apparently despite setting seeds and doing "clear all", treatment of ties depends in some way on the state of Stata.

    You learn something every day. The trouble is I learnt all this four years ago and forgot it!

    (I would like to be able to reset the state of the program 100%, thought, without re-starting.)
    Brendan

    Comment


    • #3
      And this, from Vince Wiggins, gives the canonical answer:

      https://www.stata.com/statalist/arch.../msg00101.html
      Code:
      : stata("set sortseed 12345")

      Comment

      Working...
      X