  • Random sampling and saving the output and seed

    I need to select a random sample from my dataset, estimate my model, and then save the output. However, for the purposes of reproducibility I also need the seed for each iteration. While this may sound like something that can be done with bootstrap, the actual problem is different, so bootstrap will not help. So, here is an example of what I want:

    Code:
    //Generate data
    local iter=10
    set obs 100
    gen x=rnormal(0,1)
    gen mu=1+(2*x)
    gen y=rnormal(mu,1)
    
    //Generate random seeds
    gen double u = (2147483647-1)*runiform() + 1
    gen rndseed = round(u)
    gen slopes=.
     
    forvalues i=1/`iter' {
     qui replace myseed="`rndseed'" in `i' //Use the seed to sample
     preserve
     set seed myseed
     bsample 50
     qui regress y x //Carry out regression on the subsample
     matrix A=e(b)
     local beta=A[1,1]  //Save estimates
     qui replace slopes=`beta' in `i'
     restore
    }
    The first block generates the seeds that I was hoping to use in the bsample. In the loop, I planned to randomly sample using the corresponding seed. However, this code does not run. I get an error "invalid syntax". I am at a loss and would appreciate any guidance.

  • #2
    Well, you hit a syntax error at the -qui replace myseed=- command at the top of your loop, because the variable myseed does not exist at that point. You can resolve that by putting -gen myseed = .- just before you start your loop. But then you are trying to turn myseed into a text variable, which is a type mismatch and, in any case makes no sense.

    I could go on with how to fix this, but there are other larger problems. You are calculating a regression slope each time through your loop, but at the end of your loop it will all be lost because you immediately follow that with -restore-, which brings back the data you started with.

    And I don't understand the logic of setting the random number seed over again each time through the loop. Just set the seed once at the beginning of the program, and then let it run through the loop. You will get perfectly good new pseudo-random numbers at each iteration of the loop. So I think all you need is:

    Code:
    clear*
    local iter=10
    set seed 1234 // OR WHATEVER SEED YOU LIKE
    // AND IF YOU WANT TO SAVE THE INITIAL SEED IN THE DATA SET ITSELF, NOT JUST
    // HAVING IT IN THE DO-FILE:
    char _dta[initial_seed] 1234
    
    //Generate data
    set obs 100
    gen x=rnormal(0,1)
    gen mu=1+(2*x)
    gen y=rnormal(mu,1)
    gen slopes = .
    
     forvalues i=1/`iter' {
         preserve
         bsample 50
         qui regress y x //Carry out regression on the subsample
         local beta=_b[x]  //Save estimates (NO NEED FOR A MATRIX TO GET THIS)
         restore
         qui replace slopes=`beta' in `i'
    }
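
    If a separately recorded seed for each iteration is really wanted, as in the original attempt, a minimal sketch might look like the following. The variable name rndseed comes from the question; the local macro replay is introduced here purely for illustration. The sketch assumes seeds are whole numbers between 0 and 2^31-1, stores them in a long so no digits are lost, and reads each one back with `=rndseed[`i']' rather than through a string variable.

```stata
clear*
set seed 1234
local iter = 10

// Generate data
set obs 100
gen x = rnormal(0,1)
gen y = rnormal(1 + 2*x, 1)

// One integer seed per iteration, kept in the first `iter' rows
gen long rndseed = floor((2^31 - 1)*runiform()) in 1/`iter'
gen slopes = .

forvalues i = 1/`iter' {
    set seed `=rndseed[`i']'      // seed for this iteration's sample
    preserve
    bsample 50
    qui regress y x
    local beta = _b[x]
    restore
    qui replace slopes = `beta' in `i'   // after restore, so it is kept
}

// Replay iteration 7 from its stored seed (illustrative check)
preserve
set seed `=rndseed[7]'
bsample 50
qui regress y x
local replay = _b[x]
restore
assert reldif(`replay', slopes[7]) < 1e-6
```

    Re-seeding inside the loop is not needed for the draws to be valid; its only purpose here is to make each row replayable on its own.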



    • #3
      Thanks Clyde....the code works. However, the problem is how do I figure out what was the seed used for the particular slope (for further analysis). Is there a way to save or recover the sample that was used for the particular bsample? Let me know if this is not clear...thanks!



      • #4
        I don't think you can recover the seed. But you can retrieve the random number generator state, which is a string of about 5,000 characters. See -help seed- for details. And if you really want to save that in the data set, you can do so, provided your Stata is a recent enough version to have strLs. So here's what I would do:

        Code:
        clear*
        local iter=10
        set seed 1234 // OR WHATEVER SEED YOU LIKE
        
        //Generate data
        set obs 100
        gen x=rnormal(0,1)
        gen mu=1+(2*x)
        gen y=rnormal(mu,1)
        gen slopes = .
        gen strL rngstate = ""
        
         forvalues i=1/`iter' {
             // Record the RNG state BEFORE the draw, so row i replays sample i
             qui replace rngstate = c(rngstate) in `i'
             preserve
             bsample 50
             qui regress y x //Carry out regression on the subsample
             local beta=_b[x]  //Save estimates (NO NEED FOR A MATRIX TO GET THIS)
             restore
             qui replace slopes=`beta' in `i'
        }
        save my_results, replace
        If at some point you needed to replicate, say the 7th iteration's regression, you could then do this:

        Code:
        use my_results, clear
        set rngstate `=rngstate[7]'
        bsample 50
        regress y x
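
        The replay relies on one property: putting a saved generator state back makes the next draws repeat exactly. A minimal check of that property, assuming Stata 14 or newer (where c(rngstate) and -set rngstate- exist); the names state, a and b are illustrative only:

```stata
clear
set obs 5
set seed 1234

local state = c(rngstate)   // capture the state BEFORE drawing
gen a = runiform()          // first set of draws

set rngstate `state'        // put the generator back where it was
gen b = runiform()          // same draws again

assert a == b
```

        The same reasoning applies to -bsample-: the state has to be the one in effect just before the draw that is to be reproduced.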



        • #5
          I have Stata 13 and it seems it supports strLs, but I am getting an error:

          . //Generate data
          . set obs 100
          obs was 0, now 100

          . gen x=rnormal(0,1)

          . gen mu=1+(2*x)

          . gen y=rnormal(mu,1)

          . gen slopes = .
          (100 missing values generated)

          . gen strL rngstate = c(rngstate) in 1
          c(rngstate) undefined

          Maybe we need something before the last statement.



          • #6
            Hmmm. I see from -help whatsnew13to14- that c(rngstate) is new as of version 14. I no longer have version 13 installed on this machine, and I don't have access to my other computer today, so I can't test this. But I think it will work if you replace c(rngstate) by c(seed) everywhere in the first block of code and change the second line of the lower block of code to -set seed `=rngstate[7]'-. Oh, and I think that c(seed) will be numeric, not a strL, so change that strL to long.

            By the way, we do ask posters to state in their posts what version of Stata they are using if it is not the current one. This is one case where it really mattered.



            • #7
              Just to complete the loop, here is the code that worked in Stata 13 (for anyone else looking for the solution):
              Code:
              clear*
              local iter=10
              set seed 1234 // OR WHATEVER SEED YOU LIKE
              
              //Generate data
              set obs 100
              gen x=rnormal(0,1)
              gen mu=1+(2*x)
              gen y=rnormal(mu,1)
              gen slopes = .
              gen strL rngstate = ""
              
               forvalues i=1/`iter' {
                   // Record c(seed) BEFORE the draw, so row i replays sample i
                   qui replace rngstate = c(seed) in `i'
                   preserve
                   bsample 50
                   qui regress y x //Carry out regression on the subsample
                   local beta=_b[x]  //Save estimates (NO NEED FOR A MATRIX TO GET THIS)
                   restore
                   qui replace slopes=`beta' in `i'
              }
              save my_results, replace
              
              use my_results, clear
              set seed `=rngstate[7]'
              bsample 50
              regress y x
              Thanks a lot!!
