Question: how to randomly choose one case from each of 72 groups?

Raymon Lucas

#1

19 Jul 2024, 02:09

Hey colleagues,

The dataset has five key variables, var1 to var5, each of which is either dichotomous (0/1) or trichotomous (0/1/2). Their combinations divide the observations into 2*2*2*3*3 = 72 groups. For instance, one group consists of the observations with var1 = 1, var2 = 2, var3 = 0, var4 = 1 and var5 = 0. How can I randomly select exactly one observation from each group? A loop seems necessary, but I have no idea how to write it.

Variable list:
.id
.var1: a continuous variable grouped into three categories: 0/1/2
.var2: an ordinal variable: 0/1/2
.var3: a dummy variable: 0/1
.var4: a dummy variable: 0/1
.var5: a dummy variable: 0/1
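The count of 72 comes from multiplying the number of categories of each variable (3 × 3 × 2 × 2 × 2). A small illustrative sketch in Python (not Stata) that enumerates every possible cell from the variable list above; the names `possible_groups` and `LEVELS` are hypothetical. If some combinations never occur in the data, there will be fewer than 72 non-empty groups to sample from, and -egen, group()- numbers only the combinations that actually occur.

```python
from itertools import product

def possible_groups(levels):
    """Return every combination of category values, one tuple per group."""
    return list(product(*levels.values()))

# Category values as stated in the variable list: var1 and var2 have three
# categories (0/1/2); var3, var4 and var5 are 0/1 dummies.
LEVELS = {
    "var1": (0, 1, 2),
    "var2": (0, 1, 2),
    "var3": (0, 1),
    "var4": (0, 1),
    "var5": (0, 1),
}

groups = possible_groups(LEVELS)
print(len(groups))  # 72 = 3 * 3 * 2 * 2 * 2
```

The example group from the question corresponds to the tuple (1, 2, 0, 1, 0).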

Looking forward to your replies, and many thanks!

Sincerely
Raymon Lucas
Nick Cox

#2

19 Jul 2024, 03:05

Code:

egen group = group(var1 var2 var3 var4 var5), label
* choose your own seed
set seed 314159
gen double rnd = runiform()
bysort group (rnd) : gen selected = _n == 1

Closer scrutiny of the code shows that the group variable isn't strictly needed (-bysort var1 var2 var3 var4 var5 (rnd)- would do the same job), but it seems likely to be useful anyway. No loops are needed, except those Stata runs on your behalf.
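For readers outside Stata, here is a minimal Python sketch of the same idea: give every observation a uniform random number, sort within each group by that number, and flag the first observation, as `_n == 1` does. It groups on the tuple of the five variables directly, which is why a separate group identifier is optional. The function name and the three-row dataset are hypothetical.

```python
import random

def select_one_per_group(rows, keys, rng):
    """Return exactly one randomly chosen row from each group."""
    # Attach a random number to each row (like gen double rnd = runiform()).
    tagged = [(tuple(row[k] for k in keys), rng.random(), row) for row in rows]
    # Sort by group, then by the random number (like bysort group (rnd)).
    tagged.sort(key=lambda t: (t[0], t[1]))
    selected = []
    previous_group = None
    for group, _rnd, row in tagged:
        # The first row within each group is the selected one (_n == 1).
        if group != previous_group:
            selected.append(row)
            previous_group = group
    return selected

rng = random.Random(314159)  # fixed seed for reproducibility, like set seed
keys = ["var1", "var2", "var3", "var4", "var5"]
# Hypothetical data: ids 1 and 2 share a group, id 3 is alone in its group.
data = [
    {"id": 1, "var1": 0, "var2": 1, "var3": 0, "var4": 1, "var5": 0},
    {"id": 2, "var1": 0, "var2": 1, "var3": 0, "var4": 1, "var5": 0},
    {"id": 3, "var1": 2, "var2": 0, "var3": 1, "var4": 0, "var5": 1},
]
picked = select_one_per_group(data, keys, rng)
print(sorted(r["id"] for r in picked))
```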
George Ford

#3

19 Jul 2024, 09:08

If you want to do this repeatedly, put the loop after -set seed-, and inside the loop you'll either have to

Code:

capture drop rnd selected

or use tempvars.
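A minimal Python sketch of repeated draws, assuming the goal is several independent one-per-group samples: the seed is set once before the loop, and every pass generates fresh random numbers. Here the random numbers live only inside the function, so there is nothing to drop between passes, which is the role of -capture drop- or tempvars in Stata. Keeping the row with the smallest random number is equivalent to sorting and taking the first. All names and the six-row dataset are hypothetical.

```python
import random

def draw_once(rows, keys, rng):
    """Return the sorted ids of one randomly chosen row per group."""
    best = {}
    for row in rows:
        group = tuple(row[k] for k in keys)
        rnd = rng.random()  # fresh random number on every draw
        # Keep the row with the smallest random number in each group,
        # i.e. the row that would sort first (_n == 1).
        if group not in best or rnd < best[group][0]:
            best[group] = (rnd, row["id"])
    return sorted(rid for _rnd, rid in best.values())

def repeated_draws(rows, keys, n_draws, seed):
    """Set the seed once, then draw n_draws one-per-group samples."""
    rng = random.Random(seed)  # analogous to set seed before the loop
    return [draw_once(rows, keys, rng) for _ in range(n_draws)]

keys = ["var1", "var2"]
# Hypothetical data: odd ids form one group, even ids the other.
data = [{"id": i, "var1": i % 2, "var2": 0} for i in range(1, 7)]
draws = repeated_draws(data, keys, n_draws=5, seed=314159)
for d in draws:
    print(d)
```

Rerunning with the same seed reproduces the same sequence of draws.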
Raymon Lucas

#4

06 Aug 2024, 03:02

Hey Nick and George,

Many thanks for your help, and sorry for the late response! I have just got back from summer vacation.

Your replies are truly helpful! The commands do work!

I have a follow-up question. My dataset is divided into 17 groups by one variable, say, voter's age. More importantly, group sizes vary considerably: group 1, for instance, has 1920 obs., whereas group 7 has merely 32.

I had thought of randomly selecting 1 obs. from each group to obtain a representative sample. However, given the varying group sizes, should I draw different numbers of obs. from each group? If so, how should I do it in Stata?

Thank you again!
Nick Cox

#5

06 Aug 2024, 03:18

I would turn your question around. You're talking about a stratified sample. It's best to read up on stratified sampling, work out what is best for your problem, and then pose a question about Stata code. Otherwise put, what you might do and how you would do it cover such a wide range that a concise reply is hardly possible.
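As a purely illustrative sketch of one common design, proportional allocation: each stratum receives a share of a fixed total sample size in proportion to its size, and a simple random sample without replacement is drawn within each stratum. It is written in Python rather than Stata. The total of 61, the seed and the function names are hypothetical, and only the group sizes 1920 and 32 come from the thread (the other 15 age groups are left out). Equal or Neyman allocation may suit a given problem better, which is exactly the choice to work out first.

```python
import random

def proportional_allocation(sizes, total):
    """Split `total` draws across strata in proportion to stratum size.

    Largest-remainder rounding makes the allocations sum exactly to `total`.
    Assumes 0 <= total <= sum(sizes.values()).
    """
    population = sum(sizes.values())
    exact = {s: total * n / population for s, n in sizes.items()}
    alloc = {s: int(x) for s, x in exact.items()}  # round down first
    shortfall = total - sum(alloc.values())
    # Give the leftover draws to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:shortfall]:
        alloc[s] += 1
    return alloc

def stratified_sample(rows, stratum_key, total, seed):
    """Draw a proportionally allocated stratified random sample."""
    rng = random.Random(seed)
    strata = {}
    for row in rows:
        strata.setdefault(row[stratum_key], []).append(row)
    sizes = {s: len(members) for s, members in strata.items()}
    alloc = proportional_allocation(sizes, total)
    sample = []
    for s, members in strata.items():
        # Simple random sample without replacement within the stratum.
        sample.extend(rng.sample(members, alloc[s]))
    return sample

# Hypothetical data: only the two group sizes quoted in the thread.
rows = [{"id": i, "agegrp": 1} for i in range(1920)]
rows += [{"id": 1920 + i, "agegrp": 7} for i in range(32)]
print(proportional_allocation({1: 1920, 7: 32}, 61))  # {1: 60, 7: 1}
sample = stratified_sample(rows, "agegrp", total=61, seed=314159)
```

Under proportional allocation a small stratum such as group 7 may receive only one draw (or none), which is one reason other allocations are sometimes preferred.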