Random list from a sample

Lewis Polo

Join Date: Aug 2017
Posts: 8


23 Aug 2017, 13:46

Hi, I hope this post finds you well.

I have been working on power calculations for a sample in a village with 1000 households, and I already have the required sample size (600) for the village. My question is how to select those 600 households by generating a random list with an appropriate statistical procedure. The households are divided into 150 clusters and they have identifiers (01001 = cluster 1, dwelling 1).

I have tried this in Stata with an example of 9 clusters, a population of 170 households, and a required sample of 102 households (102/170 = 60%):

sample 0.6, by (grp)

tab grp

obtaining something like the following:

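        grp |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         12       11.88       11.88
          2 |         10        9.90       21.78
          3 |         13       12.87       34.65
          4 |         11       10.89       45.54
          5 |         15       14.85       60.40
          6 |          8        7.92       68.32
          7 |          8        7.92       76.24
          8 |         13       12.87       89.11
          9 |         11       10.89      100.00
------------+-----------------------------------
      Total |        101      100.00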

I got a random list, but I do not know if what I did is correct.

Please let me know if the procedure is correct or if you have another idea for this case.

Thanks so much!

Last edited by Lewis Polo; 23 Aug 2017, 13:49.


Clyde Schechter

Join Date: Apr 2014

Posts: 30018
#2

23 Aug 2017, 14:00

Well, as you notice from your -tab- output you got 101 results, instead of 102. That's because when you specify the sampling fraction, instead of the sampling count, you may get a little random variation in the sample size itself. If you want exactly 102, change 0.6 to 102, and add the -count- option.
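
A minimal sketch of an exact-count draw, on made-up data (the variable name hh_id and the seed are assumptions chosen only for illustration). With -count-, the number given to -sample- is read as a number of observations rather than as a percentage:

```stata
* Toy frame of 170 households (hypothetical data, not the poster's file)
clear
set obs 170
gen hh_id = _n

* Arbitrary seed so that the draw can be reproduced
set seed 12345

* With -count-, 102 is a number of observations, not a percentage
sample 102, count

* Show how many observations were retained
count
```

Note that if -by()- is also specified, the count applies within each group, not to the dataset as a whole.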

Then there is the matter of -by(grp)-. You don't explain what this grp variable is, so it's hard to know what's going on here. What I can tell you is that with -by(grp)- specified, you are getting a sample consisting of (about) 60% of the observations from each block of observations defined by a value of grp. So if there are 120 observations with grp == 1 and 40 with grp == 2 in the original data, you will get (about) 72 observations with grp == 1 and 24 observations with grp == 2 in the sample. If you drop the -by(grp)- option, then the total of (about) 96 observations will be randomly drawn without regard to the values of grp. Which way is more appropriate to your goals cannot be discerned from what you have described.
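
The two behaviours can be compared on invented data. This is only a sketch: the 120/40 split mirrors the example in the paragraph above, the seed is arbitrary, and 60 is written as a percentage because -sample- without -count- reads its argument as a percent.

```stata
* Hypothetical data: 120 observations in grp 1 and 40 in grp 2
clear
set obs 160
gen grp = cond(_n <= 120, 1, 2)

* Arbitrary seed so that the draws can be reproduced
set seed 2017

* Draw about 60% separately within each value of grp
preserve
sample 60, by(grp)
tab grp
restore

* Draw about 60% of all observations, ignoring grp
sample 60
tab grp
```

In the first draw the sample keeps the group proportions of the original data; in the second the split between groups is left to chance.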
1 like
Comment
Lewis Polo

Join Date: Aug 2017

Posts: 8
#3

23 Aug 2017, 14:27

Mr Schechter, thanks for your reply.

grp is the cluster.

What I mean is that I have to conduct a survey in a village. I already have the required sample size, which is 600 households, and the village has about 3000 households in 150 clusters (that is, 20 households per cluster). For each cluster I have to complete at least 4 or 5 surveys, but I want to randomize those households (which have an identifier) with an algorithm and get a list.

Is it clear now?

Thanks
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30018
#4

23 Aug 2017, 16:19

Yes. The only thing that's unclear is, if you have 20 households per cluster and you want to sample 4 or 5, why would you use a .6 sampling fraction? Your target fraction is .20 to .25, and even allowing for a 50% unavailability rate for households, a sampling fraction of 0.5 would be quite sufficient. But, the approach as a whole seems correct.
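
If a fixed number of households per cluster is wanted rather than a fraction, one option is to combine -count- with -by()-, which draws that many observations within each group. The following is a sketch on invented data (150 clusters of exactly 20 dwellings; the variable names cluster and dwelling and the seed are assumptions), not the poster's actual frame:

```stata
* Hypothetical frame: 150 clusters with 20 dwellings each
clear
set obs 3000
gen cluster = ceil(_n / 20)
bysort cluster: gen dwelling = _n

* Arbitrary seed so that the draw can be reproduced
set seed 98765

* Keep 5 randomly chosen dwellings within every cluster
sample 5, count by(cluster)

* Show the total number of dwellings retained
count
```

Drawing 5 per cluster when 4 are required leaves one reserve household per cluster; a larger count would allow for more non-response.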
Comment
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#5

24 Aug 2017, 19:01

Are there exactly 20 households per cluster? If not, what is the range of household counts?

Steve Samuels
Statistical Consulting

Stata 14.2
Comment
Lewis Polo

Join Date: Aug 2017

Posts: 8
#6

11 Sep 2017, 10:48

Hi Steve Samuels, thanks for your reply.

There are not exactly 20 households per cluster; that was just an example. There are roughly 17 to 20 per cluster.

Do you have any idea for an appropriate random selection?
Comment

Steve Samuels

Join Date: Mar 2014
Posts: 1786

12 Sep 2017, 20:32

Your sample command would have sampled nothing and excluded everything, because fractions like 0.6 are not permitted. So read the help more carefully. By giving us numbers "like" the actual ones, you have only confused the problem. That is why FAQ 12 asks for exactly the commands you issued and the results they generated. Always put code and results between CODE delimiters, as the FAQ describes.

Below I show how to randomly order households within a cluster. Approach households in the order shown until you have five cooperating ones.

Code:

/* Create two clusters one with 17 dwellings,  the other with 20 */
clear
set obs 37
gen cluster = cond(_n<=17,1,2)
bys cluster: gen dwelling = _n
tempfile t1
save `t1'

/* Randomly order HH within clusters */
set seed 64312
tempvar u
gen `u' = runiform()
sort cluster `u'
bys cluster: gen order = _n

bys cluster (order): list cluster order dwelling, noobs sepby(cluster)

/* First Cluster, for illustration */
-> cluster = 1

  +----------------------------+
  | cluster   order   dwelling |
  |----------------------------|
  |       1       1          6 |
  |       1       2         13 |
  |       1       3         10 |
  |       1       4          5 |
  |       1       5          9 |
  |       1       6         17 |
  |       1       7         16 |
  |       1       8          2 |
  |       1       9         15 |
  |       1      10          7 |
  |       1      11          3 |
  |       1      12         11 |
  |       1      13          8 |
  |       1      14         14 |
  |       1      15         12 |
  |       1      16          1 |
  |       1      17          4 |
  +----------------------------+
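
To turn the random order into a field list, the first five households in each cluster can be flagged, with the remainder kept as replacements in order. This is a sketch under assumptions: the variable names u and first_five are hypothetical, and because the random numbers are stored here as doubles in a permanent variable, the listing need not match the one above.

```stata
* Rebuild the two illustrative clusters (17 and 20 dwellings)
clear
set obs 37
gen cluster = cond(_n <= 17, 1, 2)
bysort cluster: gen dwelling = _n

* Random order of dwellings within each cluster
set seed 64312
gen double u = runiform()
sort cluster u
by cluster: gen order = _n

* Hypothetical flag: households to approach first in each cluster
gen byte first_five = order <= 5

* Field list of the first five per cluster, in approach order
list cluster order dwelling if first_five, noobs sepby(cluster)
```

If a flagged household does not cooperate, the next one in the cluster (order 6, then 7, and so on) takes its place.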

Steve Samuels
Statistical Consulting

Stata 14.2
