Repeated seed selection (problem recreating results with same seed)

Jonathan Karver

Join Date: Nov 2014

Posts: 11
#1


11 Jun 2015, 13:00

Forgive me if this has been discussed elsewhere on Statalist, but I cannot find anything that specifically addresses the problem I am having. I have read through the manual on seed setting, and am still not convinced that my problem can be resolved with its tips (I might be wrong, of course).

Similar to re-randomization programs, I wrote code that randomly assigns a group number to every observation in my sample, and I am trying to minimize the number of "problems" (for example, the total number of groups with an unbalanced number of males). To do this, I define 10,000 unique seeds, and thus 10,000 different draws of a random variable (u=runiform()), and for each draw I save the number of problems in a dataset along with the seed. The point is to keep the seed that minimizes the number of problems (to optimize my randomization, in a way). Nothing in how I define a problem should cause an error: the code randomly assigns each observation to one of X groups within its school and counts the groups in the dataset whose proportion of males differs substantially from that of the school. Under a truly random assignment within each school, the proportion of males in each group should be no different, in expectation, from the proportion of males in the school.
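The "problem" count described above can be sketched as follows. This is a minimal illustration, not the poster's actual code (which is elided as "does the deed" below): the variables school, group and male, the toy data, and the 0.10 threshold for a "substantially different" share are all assumptions.

```stata
* Hypothetical toy data: 2 schools, 2 groups each, male coded 0/1
clear
input byte school byte group byte male
1 1 1
1 1 1
1 2 0
1 2 0
2 1 1
2 1 0
2 2 1
2 2 0
end

* Share of males in each school and in each group within school
bysort school: egen double p_school = mean(male)
bysort school group: egen double p_group = mean(male)

* Count each school-group cell once; flag cells whose share differs
* from the school share by more than the assumed threshold of 0.10
egen byte cell = tag(school group)
count if cell & abs(p_group - p_school) > 0.10
local problem = r(N)
display "problems: `problem'"
```

In a seed search, a count like this would be the value stored in the local problem and posted alongside each seed.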

The problem is that, once I have defined this optimal seed as a macro, I am unable to replicate the results. That is, when I rerun the assignment with the optimal seed, my `problem' is not the same as the value recorded for that seed during the 10,000 draws. Here is the snippet of my code needed to understand the problem (all code is run in Stata 13.1). I have been running it with 100 and 1,000 seeds, since 10,000 takes forever, but the result is the same:

Code:

use `assignment', clear

** Select seed that minimizes problems

// Script 1: loads random seeds into memory
capture program drop load_random_seeds
program define load_random_seeds
    preserve
    drop _all
    set seed 15485863
    set obs 10000
    gen double u = runiform()
    gen randomseeds = round(u*10000000)   // 10,000 seeds based on u (duplicates are possible)
    mkmat randomseeds, matrix(S)          // matrix S now holds all seed values
    restore
end

// Selection of seed that minimizes problems in control schools
tempname memhold
postfile `memhold' seed problem using "$data/processed/reminimization.dta", replace
load_random_seeds
local nseeds = rowsof(S)
local num = 50
forval s = 1/`nseeds' {
    local seed = el(S, `s', 1)
    set seed `seed'
    capture drop u
    gen double u = runiform()
    sort var u
    * does the deed *
    local problem = r(r)
    post `memhold' (`seed') (`problem')
    local mod = mod(`s', 50)   // progress dots
    if `mod' != 0 di in gr _c "."
    if `mod' == 0 {
        di in gr _c ". --`num'" _newline
        local num = `num' + 50
    }
}
postclose `memhold'

// Final seed selection
use "$data/processed/reminimization.dta", clear
sort problem seed
local seed = seed[1]
global SEED = `seed'
di in gr "$SEED"
sum if seed == $SEED

// Assignment with best seed
use `assignment', clear
set seed $SEED
gen double u = runiform()
sort var u
* does the deed *
local problem = r(r)

I am trying to understand why, despite saving the seed that minimizes "problem", the two values of `problem' differ: the one recorded in "$data/processed/reminimization.dta" and the one computed at the end of the code. That is, the minimum value of problem reported in "$data/processed/reminimization.dta" for some seed `seed' is not recreated with the same seed once I am outside the "program". My initial thought was that the sort order of the data was getting mixed up, but I fixed this so that the data are always sorted the same way. Thoughts?
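One mechanism that produces exactly this symptom, and the one the thread eventually identifies, is observation order: -runiform()- hands out its draws to observations in their current order, so the same seed reproduces the same sequence of numbers but not the same assignment if the data arrive in a different order. A minimal demonstration on toy data (the variable id is hypothetical):

```stata
clear
set obs 10
gen long id = _n                 // hypothetical unique identifier

* Run 1: data sorted by id
sort id
set seed 12345
gen double u1 = runiform()

* Run 2: same seed, but the observations are in reverse order
gsort -id
set seed 12345
gen double u2 = runiform()

* Same ten numbers, attached to different observations
sort id
list id u1 u2
```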
JoeSchmidt

Join Date: Jan 2015

Posts: 4
#2

11 Jun 2015, 13:36

Do you need the "sort var u" syntax?

Why not just

Code:

sort u

?
Jonathan Karver

Join Date: Nov 2014

Posts: 11
#3

11 Jun 2015, 13:41

Originally posted by JoeSchmidt

Do you need the "sort var u" syntax?

Why not just

Code:

sort u

?

Sorry, I should have used the actual variable names: I am sorting by school and then by the random variable u, because I randomly assign observations to a fixed number of groups within each school. Sorting by school first should not affect anything anyway, since groups are selected separately for each school.
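Assigning observations to a fixed number of groups within each school, after sorting by school and u, can be sketched as below. The variable names, the two toy schools and K = 3 groups are assumptions for illustration, not the poster's code.

```stata
clear
set obs 12
gen long id = _n                     // hypothetical unique student identifier
gen byte school = 1 + (id > 6)       // two hypothetical schools of 6 students
local K 3                            // assumed number of groups per school

sort id                              // fix a known order before drawing
set seed 15485863
gen double u = runiform()

* Within each school, order students by u and deal them into groups 1..K
sort school u
by school: gen byte group = mod(_n - 1, `K') + 1
tab school group
```

Dealing students out in rotation keeps group sizes within a school as equal as possible; which student lands in which group is still driven entirely by u.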
Jonathan Karver

Join Date: Nov 2014

Posts: 11
#4

11 Jun 2015, 14:05

So lo and behold, it took posting this here for me to find my mistake. I needed to repeat a sort (not shown in the code above), because somehow defining the program "load_random_seeds" and running it knocked the sort order out of whack. I had the data sorted a specific way in `assignment', and somehow, between opening it and defining the matrix of seeds, the order changed. If someone knows why, I would love to know. Otherwise, I will see if I can delete this thread, since the issue has become irrelevant (sorry). 0/2 for me...
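A defensive pattern that follows from this fix is to re-establish a known order immediately before every draw, so that nothing run earlier can change which observation receives which number. A sketch, assuming the data have a unique identifier named id; draw_u is a hypothetical helper, not part of Stata:

```stata
capture program drop draw_u
program define draw_u
    version 13
    args seed
    isid id, sort               // error if id is not unique; otherwise sort by id
    set seed `seed'
    capture drop u
    gen double u = runiform()
end

* Usage: the draws no longer depend on the order the data arrive in
clear
set obs 10
gen long id = _n
draw_u 12345
rename u u_first
gsort -id                       // scramble the order
draw_u 12345
```

Using -isid- rather than a plain -sort- also catches the case where the sort key has ties, which would leave the order of tied observations, and hence the assignment, unpinned.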
Clyde Schechter

Join Date: Apr 2014

Posts: 30177
#5

11 Jun 2015, 14:14

It isn't clear from the code of program load_random_seeds why it disturbs the sort order of the data, but you can prevent that from happening by specifying the -sortpreserve- option in the -program define- statement. That will ensure that on exit from the program, the sort order is restored to whatever it was before entry.
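The effect of -sortpreserve- can be checked with a toy program that deliberately reorders the data; scramble is a hypothetical name. For the program in this thread, the option would go on the definition line as -program define load_random_seeds, sortpreserve-.

```stata
capture program drop scramble
program define scramble, sortpreserve
    version 13
    tempvar r
    gen double `r' = runiform()
    sort `r'                    // deliberately disturbs the sort order
end

clear
set obs 10
gen long id = _n
sort id
scramble
list id                         // still 1, 2, ..., 10
```

Note that scramble itself calls -runiform()-, so it also advances the random-number state: -sortpreserve- restores the sort order, not the state of the generator.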

I don't think you can delete the thread, but even if you can, please don't. Others may learn something from your posts!
Jonathan Karver

Join Date: Nov 2014

Posts: 11
#6

12 Jun 2015, 07:39

Thanks, Clyde, and thank you for your suggestion on sortpreserve. I was also lost about how the sort order of the data would be disturbed from the load_random_seeds code.

And you are right, despite obviously overlooking a simple problem before posting, I realize this could be helpful to somebody.

nolwenn gontard

Join Date: Nov 2015
Posts: 1

25 Nov 2015, 02:24

Hi Jonathan, indeed some users can learn from this thread. I cannot apply the same solution as in your code (-sortpreserve- won't apply, as I don't define any program), so I am posting my question here.
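Outside a program, the same effect as -sortpreserve- can be obtained by hand: record the current order in a variable before the step that might disturb it, and sort on that variable afterwards. A sketch with toy data; obs_order is a hypothetical variable name. For randomization specifically, sorting on a unique identifier such as vill_id immediately before calling -runiform()- achieves the goal more directly.

```stata
clear
set obs 8
gen long vill_id = 9 - _n        // toy ids, arriving in reverse order
gen long obs_order = _n          // remember the current order

sort vill_id                     // a step that changes the order

sort obs_order                   // put the original order back
drop obs_order
```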

I'm having some trouble with re-randomization: I'm also unable to replicate the random allocation of treatment I obtained the first time I ran my do-file.
I paste my code below (Stata 13). It is a pairwise matching done before randomization: treatment is allocated randomly within each pair (variable pair), and vill_id is the unique identifier of my 30 observations.
I re-randomize to prevent some sites that are close to each other from having different treatment statuses.
The initial seed 153674 was selected beforehand.
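Allocation within pairs can also be written with -by-, so that each pair's assignment does not depend on where the other pairs sit in the dataset. A sketch on hypothetical data (30 villages in 15 pairs; the pair structure is assumed), reusing the seed and variable names from the post. As with mod(_n,2) after sorting by pair and uni, the village with the larger draw in each pair is treated.

```stata
clear
set obs 30
gen int vill_id = _n              // hypothetical: 30 villages
gen int pair = ceil(vill_id/2)    // hypothetical: villages 1-2, 3-4, ... paired

sort vill_id                      // known order before drawing
set seed 153674
gen double uni = runiform()

* Within each pair, the village with the larger draw is treated
sort pair uni
by pair: gen byte treat_1 = (_n == 2)
```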

Code:

* First randomization
use `data', clear
local s 153674
set seed `s'
g uni=runiform()
sort pair uni
g treat_1 = 0 if mod(_n,2)
replace treat_1 = 1 if treat_1==.
save `data', replace

*Using a reshaped dataset to test the condition over one line of observations
use `data', clear
cap drop uni
rename treat_1 treat
keep treat vill_id
gen var = 1
reshape wide treat, i(var) j(vill_id)

*Rerandomization
local s = 153674
set seed `s'
g uni=runiform()

*Looping over the same code:
*Keep re-randomizing with the next seed while any constraint is violated: sites 3 and 4 have different treatment statuses, sites 12, 13 and 14 do not all share the same status, sites 15 and 16 differ, or sites 23 and 26 differ.
while (treat3==1 & treat4==0) | (treat3==0 & treat4==1) | (treat12==1 & treat13==1 & treat14==0) | (treat12==1 & treat13==0 & treat14==1) | (treat12==1 & treat13==0 & treat14==0) | (treat12==0 & treat13==1 & treat14==0) | (treat12==0 & treat13==0 & treat14==1) | (treat12==0 & treat13==1 & treat14==1) | (treat15==1 & treat16==0) | (treat15==0 & treat16==1) | (treat23==1 & treat26==0) | (treat23==0 & treat26==1) {
use `data', clear
local s = `s'+1
set seed `s'
drop uni
gen uni=runiform()
sort pair uni
drop treat_1
g treat_1 = 0 if mod(_n,2)
replace treat_1=1 if treat_1==.
save "bases_seed\trial`s'.dta", replace
rename treat_1 treat
keep treat vill_id
gen var = 1
reshape wide treat, i(var) j(vill_id)
}

reshape long treat, i(var) j(vill_id)
ren treat treat_1re
save "treat_rerandom.dta", replace
use `clear', clear
merge 1:1 vill_id using "treat_rerandom.dta"
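Assuming every treat# variable is coded 0 or 1 with no missing values, the long -while- condition above reduces to five inequalities, since the six listed combinations for sites 12 to 14 are exactly the cases where the three are not all equal. A sketch; the local name violated and the one-row toy data are hypothetical:

```stata
clear
set obs 1
foreach v in 3 4 12 13 14 15 16 23 26 {
    gen byte treat`v' = 1          // toy wide row: every site treated
}

* 1 if any constraint is violated, 0 otherwise (0/1 coding assumed)
local violated (treat3 != treat4) | (treat12 != treat13) | (treat13 != treat14) | (treat15 != treat16) | (treat23 != treat26)

display `violated'                 // 0 here: all constraints hold
replace treat14 = 0
display `violated'                 // 1 now: sites 12-14 no longer share a status
```

In the loop, the test would then read -while `violated' {-.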

Any thoughts?
