How to use set seed option to draw different samples?

Melody Brown

Join Date: May 2022

Posts: 81
#1

23 Jan 2023, 13:52

Hi everyone,

I have a question about the set seed command in Stata.

I have drawn a sample of 1000 observations from a dataset using the commands

set seed 1234
sample 1000,count

Now I want to draw a completely different sample (no overlap with the first one). Would it work if I set a different seed, such as 5678?

Thanks, I appreciate your reply.
Andrew Musau

Join Date: Oct 2014

Posts: 10190
#2

23 Jan 2023, 14:08

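Setting a different seed cannot guarantee that the two samples will not overlap: each call to -sample- draws from the full dataset, so the same observation can be selected both times. If you want to enforce this requirement, you need to exclude the first sample before selecting the second one.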

Code:

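* unique row identifier, used to match the two samples
gen long obsno = _n
preserve
*SAMPLE 1
set seed 1234
sample 1000, count
tempfile sample1
save `sample1'
restore
* drop the observations already drawn into sample 1
merge 1:1 obsno using `sample1', keep(master) nogen
*SAMPLE 2
set seed 5678
sample 1000, count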

Sample 1 is available from

Code:

use `sample1', clear

Last edited by Andrew Musau; 23 Jan 2023, 14:11.
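To see why a new seed alone is not enough, the short check below draws two samples with different seeds and counts the observations they share. It is only a sketch: it uses the shipped -auto- dataset and a sample size of 30 purely for illustration (neither comes from the thread), and the exact overlap depends on the seeds, so no count is shown here.

```stata
* Sketch: two draws with different seeds can still share observations
sysuse auto, clear
gen long obsno = _n
tempfile s1
preserve
set seed 1234
* first sample of 30 cars, keeping only the row identifier
sample 30, count
keep obsno
save `s1'
restore
* second sample from the full data with a different seed
set seed 5678
sample 30, count
* keep only the observations that were drawn in both samples
merge 1:1 obsno using `s1', keep(match) nogenerate
display "observations drawn in both samples: " _N
```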
Melody Brown

Join Date: May 2022

Posts: 81
#3

23 Jan 2023, 15:07

Thanks a lot, Andrew. I really appreciate your help.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

23 Jan 2023, 15:08

Alternatively, perhaps

Code:

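set seed 1234
sample 2000, count
* put the 2000 kept observations in random order
* (double precision makes ties in the sort key very unlikely)
generate double shuffle = runiform()
sort shuffle
drop shuffle
* first 1000 shuffled observations form sample 1, the rest sample 2
generate sampnum = cond(_n<=1000,1,2)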

would start you in a useful direction.
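The same idea extends to more than two samples. The sketch below is an illustrative generalization, not code from the thread: it shuffles the whole dataset once and assigns consecutive blocks of n observations to samples 1 to K, so no observation can land in two samples. The dataset (-auto-) and the values of n and K are assumptions chosen only for the example.

```stata
* Sketch: K non-overlapping samples of n observations from one random ordering
sysuse auto, clear
* illustrative sample size and number of samples
local n 20
local K 3
* there must be enough observations for K disjoint samples
assert `n' * `K' <= _N
set seed 1234
generate double shuffle = runiform()
sort shuffle
* observations 1..n go to sample 1, n+1..2n to sample 2, and so on
generate byte sampnum = ceil(_n / `n') if _n <= `n' * `K'
drop shuffle
tabulate sampnum
```

Because each observation receives at most one value of -sampnum-, the samples are disjoint by construction, and rerunning the do-file with the same seed reproduces them.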
Melody Brown

Join Date: May 2022

Posts: 81
#5

23 Jan 2023, 15:19

Hi Andrew,

I just realized that my case is a little different: in the original dataset, each id is associated with multiple observations. What I want to do is randomly choose 1000 ids and keep all the observations for those ids. The code I use for this is below. Do you know how to revise it so that I can choose 1000 completely different consumers (ids) together with all their associated observations?

Thanks a lot!

Code:

tempfile holding
save `holding'
keep id
duplicates drop
set seed 1234
sample 1000, count
merge 1:m id using `holding', assert(match using) keep(match) nogenerate
save "sample1.dta"

Andrew Musau

Join Date: Oct 2014

Posts: 10190
#6

23 Jan 2023, 23:26

Use the -tag()- function of -egen- to tag a single observation per id, sample from the tagged observations, and then extend the selection to every observation of each sampled id. Here is an example using the nlswork dataset:

Code:

tempfile sample1 sample2
webuse nlswork, clear
egen tag= tag(idcode)
gen long obsno=_n
preserve
keep if tag
*SAMPLE 1
set seed 1234
sample 1000, count
save `sample1'
restore
merge 1:1 * using `sample1', keep(master match)
bys idcode: egen sample1= max(_merge==3)
drop _merge
preserve
keep if sample1
save `sample1', replace
restore
drop if sample1
*SAMPLE 2
preserve
keep if tag
set seed 5678
sample 1000,count
save `sample2'
restore
merge 1:1 * using `sample2', keep(master match)
bys idcode: egen sample2= max(_merge==3)
drop _merge
preserve
keep if sample2
save `sample2', replace

Samples available from:

Code:

*Sample 1 (tempfiles exist only while the do-file is running)
use `sample1', clear
*Sample 2
use `sample2', clear
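For the panel case, the shuffle-and-split idea from #4 can also be applied at the id level, which avoids a second round of sampling and merging. This is a hedged sketch combining the two approaches, not code posted in the thread; like #6 it uses -nlswork-, and it assumes the data contain at least 2000 distinct -idcode- values (the -assert- checks this).

```stata
* Sketch: two disjoint samples of 1000 ids each, keeping all rows per id
webuse nlswork, clear
tempfile ids
preserve
* one row per id
keep idcode
duplicates drop
assert _N >= 2000
set seed 1234
generate double shuffle = runiform()
sort shuffle
* first 1000 shuffled ids -> sample 1, next 1000 -> sample 2, rest unused
generate byte sampnum = cond(_n <= 1000, 1, cond(_n <= 2000, 2, .))
keep idcode sampnum
save `ids'
restore
* attach each id's sample number to all of its observations
merge m:1 idcode using `ids', assert(match) nogenerate
keep if !missing(sampnum)
```

Afterwards, -keep if sampnum == 1- (or -== 2-) isolates one sample; since every id carries a single value of -sampnum-, no consumer can appear in both.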

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#7

23 Jan 2023, 23:41

The general recommendation is to set the seed once in the do-file and not change it afterwards. Subsequent random draws continue from the same random-number stream, so everything will be taken care of automatically.
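Following that advice, the sketch below sets the seed a single time and draws both samples from the same random-number stream. Note that the single seed only makes the results reproducible; it is the exclusion step from #2 (the -merge ..., keep(master)-) that keeps the samples from overlapping. The -auto- dataset and the sample size of 30 are illustrative assumptions.

```stata
* Sketch: one seed at the top of the do-file, two non-overlapping draws
sysuse auto, clear
set seed 1234
gen long obsno = _n
tempfile s1
preserve
* sample 1
sample 30, count
save `s1'
restore
* drop the observations already in sample 1
merge 1:1 obsno using `s1', keep(master) nogenerate
* sample 2 continues the same random-number stream, no new seed needed
sample 30, count
```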