Forgive me if this has been discussed elsewhere on Statalist, but I cannot find anything that specifically addresses the problem I am having. I have read through the manual on seed setting, and am still not convinced that my problem can be resolved with its tips (I might be wrong, of course).
Similar to re-randomization programs, I wrote code that randomly assigns a group number to every observation in my sample, and I am trying to minimize the number of problems (for example, the total number of groups with an unbalanced number of males). To do this, I define 10,000 unique seeds, and thus 10,000 unique realizations of a random variable (u=runiform()), and for each draw I save the "number of problems" in a dataset, along with the seed. The point is to keep the seed that minimizes the number of problems (optimizing my randomization, in a way). Nothing in how I define the problem should cause an error: the code assigns each observation randomly to one of X groups within a school and counts the number of groups in the dataset whose proportion of males is substantially different from that of the school itself. Under a truly random assignment of groups within each school, the proportion of males within each group should be no different from the proportion of males in the school.
The problem is that, once I have stored this optimal seed in a macro, I am unable to replicate the results. That is, when I rerun the assignment with the optimal seed chosen from the 10,000 draws, the value of `problem' differs from the one recorded for that seed. The snippet of code needed to understand my problem is below (all code is run in version 13.1). I have been running with 100 and 1,000 unique seeds since 10,000 takes forever, but the issue is the same.
I am trying to understand why, despite saving the seed that minimizes "problem", the two values of `problem' are different: the one recorded in "$data/processed/reminimization.dta" and the one defined at the end of the code. That is, the minimum value of problem reported in "$data/processed/reminimization.dta" for some seed `seed' is not recreated with the same seed once I am out of the "program". My initial thought was that things were getting mixed up with the sorting of the data, but I have since made sure that the data are always sorted the same way. Thoughts?
Code:
use `assignment', clear

** Select seed that minimizes problems
//script 1: loads random seeds to memory
capture program drop load_random_seeds
program define load_random_seeds
{
    preserve
    drop _all
    set seed 15485863
    set obs 10000
    capture drop u
    gen double u = runiform()
    gen randomseeds = round(u*10000000) //creates 1,000 (10,000) unique seeds based on u
    mkmat randomseeds, mat(S) //matrix S now has all seed values
    restore
}
end

//Selection of seed that minimizes problems in control schools
tempname memhold
postfile `memhold' seed problem using "$data/processed/reminimization algoritmo.dta", replace
load_random_seeds
mat rows = rowsof(S)
local nseeds = el(rows,1,1)
forval s = 1(1)`nseeds'{
    local seed = el("S",`s',1)
    set seed `seed'
    gen double u=runiform()
    sort var u
    * does the deed *
    local problem=r(r)
    post `memhold' (`seed') (`problem')
    }
    local mod = mod(`s',50) //sets up dots
    if `mod'!=0 di in gr _c "."
    if `mod'==0{
        di in gr _c ". --`num'" _newline
        local num = `num'+50
    }
}
postclose `memhold'

//Final seed selection
use "$data/processed/reminimization.dta", clear
sort problem seed
local seed = seed[1]
global SEED= `seed'
di in gr "$SEED"
sum if seed==$SEED

//Assignment with best seed
use `assignment', clear
set seed $SEED
gen double u=runiform()
sort var u
* does the deed *
local problem=r(r)
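One check that may help narrow this down: runiform() fills observations in their current row order, so a given seed reproduces the same assignment only if the data are in the same order at the moment u is generated, not just when sort is called afterwards. The sketch below illustrates this on made-up data. The variables id and school and the seed 12345 are hypothetical and are not part of the code above.

```stata
* Minimal sketch on made-up data: same seed, different row order at draw time
version 13.1
clear
set obs 20
gen long id = _n                 // hypothetical unique identifier
gen byte school = mod(_n, 4)     // hypothetical grouping variable

* First pass: fix the row order, then draw and sort
sort id
set seed 12345
gen double u1 = runiform()
sort school u1

* Second pass: restore the same row order BEFORE drawing again
sort id
set seed 12345
gen double u2 = runiform()
assert u1 == u2                  // same seed, same order: identical draws

* Third pass: same seed, but rows are in the order left by the earlier sort
sort school u1
set seed 12345
gen double u3 = runiform()
count if u3 != u1                // non-zero: draws now land on other observations
```

If the assertion holds but the count is non-zero, the row order at the time u is generated inside the loop, rather than the seed itself, would be the first thing to compare against the final assignment step.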