  • How to use set seed option to draw different samples?

    Hi everyone,

    I have a question about the set seed command in Stata.

    I have drawn a sample of 1000 observations from a dataset using the commands
    set seed 1234
    sample 1000,count
    Now I want to get a completely different sample dataset (no overlap with the previous one). Would it work if I set a different seed number such as 5678?

    Thanks and I appreciate your reply.

  • #2
    Setting a different seed does not guarantee that the two samples will not overlap. If you want to enforce this requirement, you need to exclude the initial sample when selecting the second sample.

    Code:
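    * Row identifier, so that every observation is unique
    gen long obsno=_n
    preserve
    *SAMPLE 1
    set seed 1234
    sample 1000,count
    tempfile sample1
    save `sample1'
    restore
    * Keep only the observations that were not drawn into sample 1
    merge 1:1 * using `sample1', keep(master) nogen
    *SAMPLE 2
    set seed 5678
    sample 1000,count
    Sample 1 is available from: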

    Code:
    use `sample1', clear
    Last edited by Andrew Musau; 23 Jan 2023, 14:11.
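
    To check that the exclusion step really leaves the two samples disjoint, the sketch below repeats the same idea on made-up data and then merges the two saved samples back together. The 5000-row dataset and the tempfile name `sample2' are assumptions for illustration only, and the merge key here is obsno rather than all variables.

    ```stata
    * Made-up data: 5000 rows with a unique row identifier
    clear
    set obs 5000
    generate long obsno = _n

    tempfile sample1 sample2

    * Sample 1
    preserve
    set seed 1234
    sample 1000, count
    save `sample1'
    restore

    * Remove sample 1 from the pool, then draw sample 2 from what is left
    merge 1:1 obsno using `sample1', keep(master) nogenerate
    set seed 5678
    sample 1000, count
    save `sample2'

    * Check: no obsno should be matched across the two samples
    merge 1:1 obsno using `sample1'
    count if _merge == 3
    assert r(N) == 0
    ```

    If the assert passes silently, no observation appears in both samples.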

    • #3
      Thanks a lot, Andrew. I really appreciate your help.

      • #4
        Alternatively, perhaps
        Code:
        set seed 1234
        sample 2000, count
        generate shuffle = runiform()
        sort shuffle
        drop shuffle
        generate sampnum = cond(_n<=1000,1,2)
        would start you in a useful direction.
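
        The same single-draw idea might extend to more than two non-overlapping samples. The following is only a sketch on made-up data; the locals k and n and the 5000-row dataset are assumptions, not part of the original suggestion.

        ```stata
        * Made-up data: 5000 rows with a unique row identifier
        clear
        set obs 5000
        generate long obsno = _n

        set seed 1234
        * k samples of n observations each (both values are hypothetical)
        local k 3
        local n 1000

        * Draw everything in one go, then put the drawn rows in random order
        sample `=`k'*`n'', count
        generate double shuffle = runiform()
        sort shuffle
        drop shuffle

        * Rows 1..n go to sample 1, rows n+1..2n to sample 2, and so on
        generate byte sampnum = ceil(_n/`n')
        tabulate sampnum
        ```

        Because each row receives exactly one value of sampnum, the samples cannot overlap by construction.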

        • #5
          Hi Andrew,

          I just realized that my case is a little different: in the original dataset, one id is associated with multiple observations. What I want to do is randomly choose 1000 ids and keep all the observations for those ids. The code I use for this is below. Do you know how to revise it so that I can choose 1000 completely different consumers, along with all their observations?

          Thanks a lot!

          Code:
          tempfile holding
          save `holding'
          
          keep id
          duplicates drop
          
          set seed 1234
          sample 1000, count
          
          merge 1:m id using `holding', assert(match using) keep(match) nogenerate
          
          save "sample1.dta"

          • #6
            You want to use the -tag()- function of egen to tag a single observation from a group and then sample from the tagged observations. Here is an example:

            Code:
            tempfile sample1 sample2
            webuse nlswork, clear
            egen tag= tag(idcode)
            gen long obsno=_n
            preserve
            keep if tag
            *SAMPLE 1
            set seed 1234
            sample 1000, count
            save `sample1'
            restore
            merge 1:1 * using `sample1', keep(master match)
            bys idcode: egen sample1= max(_merge==3)
            drop _merge
            preserve
            keep if sample1
            save `sample1', replace
            restore
            drop if sample1
            *SAMPLE 2
            preserve
            keep if tag
            set seed 5678
            sample 1000,count
            save `sample2'
            restore
            merge 1:1 * using `sample2', keep(master match)
            bys idcode: egen sample2= max(_merge==3)
            drop _merge
            preserve
            keep if sample2
            save `sample2', replace
            Samples available from:

            Code:
            *Sample 1 (a tempfile, so refer to it through its local macro)
            use `sample1', clear
            *Sample 2
            use `sample2', clear
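
            A more compact variant, combining the id-level approach with the shuffle idea from #4, might look like the sketch below. It is not the code above rewritten; the made-up panel (3000 ids with four observations each) and the variable sampnum are assumptions for illustration.

            ```stata
            * Made-up panel: 3000 ids, four observations per id
            clear
            set obs 3000
            generate long id = _n
            expand 4
            tempfile holding
            save `holding'

            * One row per id, placed in random order
            keep id
            duplicates drop
            set seed 1234
            generate double shuffle = runiform()
            sort shuffle

            * First 1000 ids form sample 1, the next 1000 form sample 2
            generate byte sampnum = 1 if _n <= 1000
            replace sampnum = 2 if inrange(_n, 1001, 2000)
            keep if !missing(sampnum)
            keep id sampnum

            * Bring back every observation belonging to the selected ids
            merge 1:m id using `holding', keep(match) nogenerate

            * Check: each id belongs to exactly one sample
            bysort id (sampnum): assert sampnum[1] == sampnum[_N]
            ```

            Running keep if sampnum == 1 or keep if sampnum == 2 afterwards would then give the two id-disjoint samples.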

            • #7
              The general recommendation is to set the seed once in the do-file and not change it again. Everything else will be taken care of automatically.
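
              As a rough illustration on made-up data (the 5000-row dataset and the tempfile name are assumptions), two draws made after a single set seed come from successive parts of the same random-number stream, so they should differ from each other. They may still share observations, which is why the exclusion step from #2 is needed when overlap must be ruled out.

              ```stata
              * Made-up data: 5000 rows with a unique row identifier
              clear
              set obs 5000
              generate long obsno = _n

              * Set the seed once, at the top of the do-file
              set seed 1234

              tempfile draw1
              preserve
              sample 1000, count
              save `draw1'
              restore

              * Second draw: no new seed, the random-number stream simply continues
              sample 1000, count

              * _merge == 3 marks observations that ended up in both draws
              merge 1:1 obsno using `draw1'
              tabulate _merge
              ```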
