  • Repeated seed selection (problem recreating results with same seed)

    Forgive me if this has been discussed elsewhere on Statalist, but I cannot find anything that specifically addresses the problem I am having. I have read through the manual on seed setting, and am still not convinced that my problem can be resolved with its tips (I might be wrong, of course).

    Similar to re-randomization programs, I wrote code that runs a series of commands to randomly assign a group number to every observation in my sample, and I am trying to minimize the number of problems (the total number of groups with an unbalanced number of males, for example). To do this, I define 10,000 unique seeds, and thus 10,000 unique variations of a random variable (u=runiform()), and for each draw I save the "number of problems" in the dataset along with the seed. The point is to keep the seed that minimizes the number of problems (optimizing my randomization, in a way). Nothing in how I define this problem should cause an error: the code randomly assigns each observation to one of X groups within a school, and counts the number of groups in the dataset whose proportion of males differs substantially from that of the school itself. A truly random assignment of groups within each school would mean that the proportion of males within each group would be no different from the proportion of males in the school.

    The problem is, once I have defined this optimal seed as a macro, I am unable to replicate the results. That is, my `problem' is not the same after defining the optimal seed out of 10,000 draws even when I use the optimal seed. Here is the snippet of my code that is necessary to understand my problem (all code is run in version 13.1). I have been running with 100 and 1,000 unique seeds since 10,000 takes forever, but the result is obviously the same:

    Code:
        use `assignment', clear
            
    ** Select seed that minimizes problems   
        //script 1: loads random seeds to memory
        capture program drop load_random_seeds
        program define load_random_seeds
      {
    preserve
        drop _all
        set seed 15485863
        set obs 10000
        capture drop u
        gen double u = runiform()
        gen randomseeds   = round(u*10000000) //creates 1,000 (10,000) unique seeds based on u
        mkmat randomseeds, mat(S) //matrix S now has all seed values
    restore
      }
      end
      
        //Selection of seed that minimizes problems in control schools
        tempname memhold
        postfile `memhold' seed problem using "$data/processed/reminimization.dta", replace
        load_random_seeds
        mat rows = rowsof(S)
        local nseeds = el(rows,1,1)
        forval s = 1(1)`nseeds'{
            local seed = el("S",`s',1)    
            set seed `seed'
            gen double u=runiform()
            sort var u
            * does the deed *
            local problem=r(r)
            post `memhold' (`seed') (`problem')
            local mod = mod(`s',50)  //sets up dots
            if `mod'!=0 di in gr _c "."
            if `mod'==0{
                di in gr _c ". --`num'" _newline
                local num = `num'+50
            }  
        }
        postclose `memhold'
        
      //Final seed selection
        use "$data/processed/reminimization.dta", clear
        sort problem seed
        local seed = seed[1]
        global SEED= `seed' 
        di in gr "$SEED"
        sum if seed==$SEED
    
      //Assignment with best seed
        use `assignment', clear
        set seed $SEED
        gen double u=runiform()
        sort var u
        * does the deed *
        local problem=r(r)

    I am trying to understand why, despite saving the seed that minimizes the number associated with "problem", the two values of `problem' are different (the one recorded in "$data/processed/reminimization.dta" and the one defined at the end of the code). That is, the minimum value of problem reported in "$data/processed/reminimization.dta" for some seed `seed' is not recreated with the same seed once I am out of the "program". My initial thought was that things were getting mixed up with the sorting of the data, but I resolved this so that the data are always sorted the same way. Thoughts?

  • #2
    Do you need the "sort var u" syntax?

    Why not just

    Code:
    sort u
    ?



    • #3
      Originally posted by JoeSchmidt
      Do you need the "sort var u" syntax?

      Why not just

      Code:
      sort u
      ?
      Sorry, I should have just put the variable name. I am sorting by school and this random variable u. I randomly assign observations to a finite number of groups within each school. Sorting by school first would not affect anything anyway, since the group selection is done for each school separately.



      • #4
        So lo and behold, it took posting this here to find my mistake. I needed to repeat a sort (not shown in the code above) because somehow defining the program "load_random_seeds" and running it knocked the sort order out of whack. I had the data sorted a specific way in `assignment', and somehow, between opening it and defining the matrix of seeds, the order was affected. If someone knows why, I would love to know. Otherwise I will see if I can delete this thread, since the issue has become irrelevant (sorry). 0/2 for me...
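
One possible explanation, offered as a sketch rather than a confirmed diagnosis of the code above: runiform() fills observations in whatever order the data are in when it is called, so the same seed gives each observation a different draw if the row order differs. Inside a loop that sorts on u, each iteration starts from the order left by the previous one, while the final run starts from the order stored in the file. The toy data and the names id, u1, u2 and u3 below are assumptions for illustration only.

```stata
* Toy data: 20 observations with a unique identifier (hypothetical)
clear
set obs 20
gen long id = _n

* Draw 1: data in id order
sort id
set seed 12345
gen double u1 = runiform()

* Draw 2: same seed, but the rows are in reverse order when the draw happens
gsort -id
set seed 12345
gen double u2 = runiform()

* Draw 3: same seed, after restoring the id order first
sort id
set seed 12345
gen double u3 = runiform()

* Compare the draws observation by observation
count if u1 != u2
local n_diff_order = r(N)
count if u1 != u3
local n_same_order = r(N)
display "differ after reordering: `n_diff_order'   differ after re-sorting by id: `n_same_order'"
```

Sorting on a variable that uniquely identifies observations (isid id confirms uniqueness) immediately before each set seed makes the assignment depend on the seed alone, not on whatever order an earlier step left behind.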



        • #5
          It isn't clear from the code of program load_random_seeds why it disturbs the sort order of the data, but you can prevent that from happening by specifying the -sortpreserve- option in the -program define- statement. That will ensure that on exit from the program, the sort order is restored to whatever it was before entry.
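
A minimal sketch of the -sortpreserve- option, assuming a toy dataset and a hypothetical program named shuffle_inside that re-sorts the data while it works (neither is from the code in #1):

```stata
* Toy data with a unique identifier (hypothetical)
clear
set obs 10
gen long id = _n
gen double x = runiform()

* Hypothetical program that changes the sort order internally.
* The sortpreserve option restores the caller's order on exit.
capture program drop shuffle_inside
program define shuffle_inside, sortpreserve
    sort x
end

sort id
shuffle_inside

* The data are back in id order after the call
assert id == _n
```

Without the option, the data would be left sorted by x after the call, and any runiform() draw made afterwards would be assigned in that order.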

          I don't think you can delete the thread, but even if you can, please don't. Others may learn something from your posts!



          • #6
            Thanks, Clyde, and thank you for your suggestion on sortpreserve. I was also lost about how the sort order of the data would be disturbed from the load_random_seeds code.

            And you are right, despite obviously overlooking a simple problem before posting, I realize this could be helpful to somebody.



            • #7
              Hi Jonathan, indeed some users can learn from this thread. I cannot apply the same solution as in your code (-sortpreserve- won't apply as I don't define any program), so I post my question here.

              I'm having some trouble with re-randomization: I'm also unable to replicate the random allocation of treatment I got the first time I ran my do-file.
              My code is pasted below (Stata 13). It is a pre-randomization pairwise matching: treatment is randomly allocated within each value of the variable pair, and vill_id is the unique identifier of my 30 observations.
              Re-randomization is done to avoid nearby sites having different treatment statuses.
              The initial seed 153674 was selected beforehand.


              Code:
              * First randomization
              use `data', clear
              local s 153674
              set seed `s'
              g uni=runiform()
              sort pair uni
              g treat_1 = 0 if mod(_n,2)
              replace treat_1 = 1 if treat_1==.
              save `data', replace
              
              *Using a reshaped dataset to test the condition over one line of observations
              use `data', clear
              cap drop uni
              rename treat_1 treat
              keep treat vill_id
              gen var = 1
              reshape wide treat, i(var) j(vill_id)
              
              *Rerandomization
              local s = 153674
              set seed `s'
              g uni=runiform()
              
              *Looping over the same code:
              *The conditions below mean: as long as sites 3 and 4 do not have the same treatment status, or sites... or sites 15 and 16 do not have the same treatment status, re-randomize using another seed.
              while (treat3==1 & treat4==0) | (treat3==0 & treat4==1) | (treat12==1 & treat13==1 & treat14==0) | (treat12==1 & treat13==0 & treat14==1) | (treat12==1 & treat13==0 & treat14==0) | (treat12==0 & treat13==1 & treat14==0) | (treat12==0 & treat13==0 & treat14==1) | (treat12==0 & treat13==1 & treat14==1) | (treat15==1 & treat16==0) | (treat15==0 & treat16==1) | (treat23==1 & treat26==0) | (treat23==0 & treat26==1) {
              use `data', clear
              local s = `s'+1
              set seed `s'
              drop uni
              gen uni=runiform()
              sort pair uni
              drop treat_1
              g treat_1 = 0 if mod(_n,2)
              replace treat_1=1 if treat_1==.
              save "bases_seed\trial`s'.dta", replace
              rename treat_1 treat
              keep treat vill_id
              gen var = 1
              reshape wide treat, i(var) j(vill_id)
              }
              
              reshape long treat, i(var) j(vill_id)
              ren treat treat_1re
              save "treat_rerandom.dta", replace
              use `clear', clear
              merge 1:1 vill_id using "treat_rerandom.dta"
              Any thoughts?

