  • Random list from a sample

    Hi, I hope this post finds you well.

    I have been working on a sample with power calculations in a village with 1000 households, and I already have the required sample size (600) for the village. The question is how to select those 600 households by generating a random list with appropriate statistical procedures. The households are divided into 150 clusters and they have identifiers (01001 = cluster 1, dwelling 1).

    I have tried this in Stata with an example of 9 clusters with a population of 170 and a required sample of 102 households (102/170 = 60%):

    sample 0.6, by (grp)

    tab grp

    obtaining something like the following:
          grp    Freq.   Percent      Cum.
            1       12     11.88     11.88
            2       10      9.90     21.78
            3       13     12.87     34.65
            4       11     10.89     45.54
            5       15     14.85     60.40
            6        8      7.92     68.32
            7        8      7.92     76.24
            8       13     12.87     89.11
            9       11     10.89    100.00
        Total      101    100.00
    I got a random list, but I do not know if what I did is correct.

    Please let me know if the procedure is correct or if you have another idea for this case.

    Thanks so much!
    Last edited by Lewis Polo; 23 Aug 2017, 13:49.

  • #2
    Well, as you notice from your -tab- output you got 101 results, instead of 102. That's because when you specify the sampling fraction, instead of the sampling count, you may get a little random variation in the sample size itself. If you want exactly 102, change 0.6 to 102, and add the -count- option.

    Then there is the matter of -by(grp)-. You don't explain what this grp variable is, so it's hard to know what's going on here. What I can tell you is that with -by(grp)- specified, you are getting a sample consisting of (about) 60% of the observations from each block of observations defined by a value of grp. So if there are 120 observations with grp == 1 and 40 with grp == 2 in the original data, you will get (about) 72 observations with grp == 1 and 24 observations with grp == 2 in the sample. If you drop the -by(grp)- option, then the total of (about) 96 observations will be randomly drawn without regard to the values of grp. Which way is more appropriate to your goals cannot be discerned from what you have described.
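
    A minimal sketch of the two behaviours described above, on made-up data with the group sizes from the example (120 and 40). The seed and the tempfile name are arbitrary assumptions. Note that -sample- without -count- takes a percentage, so 60% is written as 60.

    ```stata
    * Made-up data: 120 observations with grp == 1, 40 with grp == 2
    clear
    set seed 12345                  // arbitrary seed, only for reproducibility
    set obs 160
    gen grp = cond(_n <= 120, 1, 2)
    tempfile full
    save `full'

    * 60 percent drawn separately within each value of grp
    sample 60, by(grp)
    tab grp                         // expect about 72 and 24

    * Exactly 96 observations drawn from the whole dataset, ignoring grp
    use `full', clear
    sample 96, count
    tab grp                         // split between groups varies with the seed
    ```

    With -count- the total is fixed; with -by()- the share (or count) applies within each group rather than to the dataset as a whole.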



    • #3
      Mr Schechter, thanks for your reply.

      grp is the cluster.

      What I mean is that I have to conduct a survey in a village. I already have the required sample size, which is 600 households, and the village has about 3000 households in 150 clusters (that is, about 20 households per cluster). For each cluster I have to collect at least 4 or 5 surveys, but I want to randomize those households (which have an identifier) with an algorithm and get a list.

      Is it clear now?

      Thanks



      • #4
        Yes. The only thing that's unclear is, if you have 20 households per cluster and you want to sample 4 or 5, why would you use a .6 sampling fraction? Your target fraction is .20 to .25, and even allowing for a 50% unavailability rate for households, a sampling fraction of 0.5 would be quite sufficient. But, the approach as a whole seems correct.
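
        The arithmetic above can be sketched as follows, assuming made-up data with 150 clusters of exactly 20 households. The variable names (cluster, dwelling), the seed, and the local macro names are illustrative assumptions, not taken from your data.

        ```stata
        * Made-up frame: 150 clusters x 20 dwellings = 3000 households
        clear
        set seed 2017                   // arbitrary seed
        set obs 3000
        gen cluster = ceil(_n/20)
        bysort cluster: gen dwelling = _n

        * Target 5 completed surveys per cluster, allowing 50% unavailability
        local n_target 5
        local avail 0.5
        local n_draw = ceil(`n_target'/`avail')    // households to list per cluster

        * Draw a fixed number of households within each cluster
        sample `n_draw', count by(cluster)
        bysort cluster: assert _N == `n_draw'
        display "households on the list: " _N
        ```

        Drawing 10 of 20 per cluster is the 0.5 sampling fraction mentioned above; using -count- with -by()- keeps the number per cluster fixed even when cluster sizes differ.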



        • #5
          Are there exactly 20 households per cluster? If not, what is the range of household counts?
          Steve Samuels
          Statistical Consulting
          [email protected]

          Stata 14.2



          • #6
            Hi Steve Samuels, thanks for your reply.

            There are not exactly 20 households per cluster; that was just an example. There are approximately 17 to 20 per cluster.

            Do you have any idea for an appropriate random selection?



            • #7
              Your sample command would have sampled nothing and excluded everything, because -sample- takes a percentage, not a fraction: 0.6 means 0.6%, not 60%. So read the help more carefully. By giving us numbers "like" the actual ones, you have only confused the problem. That is why FAQ 12 asks for exactly the commands you issued and the results they generated. Always put code and results between CODE delimiters, as the FAQ describes.

              Below I show how to randomly order households within a cluster. Approach households in the order shown until you have five cooperating ones.

              Code:
              /* Create two clusters one with 17 dwellings,  the other with 20 */
              clear
              set obs 37
              gen cluster = cond(_n<=17,1,2)
              bys cluster: gen dwelling = _n
              tempfile t1
              save `t1'
              
              /* Randomly order HH within clusters */
              set seed 64312
              tempvar u
              gen `u' = runiform()
              sort cluster `u'
              bys cluster: gen order = _n
              
              bys cluster (order): list cluster order dwelling, noobs sepby(cluster)
              
              /* First Cluster, for illustration */
              -> cluster = 1
              
                +----------------------------+
                | cluster   order   dwelling |
                |----------------------------|
                |       1       1          6 |
                |       1       2         13 |
                |       1       3         10 |
                |       1       4          5 |
                |       1       5          9 |
                |       1       6         17 |
                |       1       7         16 |
                |       1       8          2 |
                |       1       9         15 |
                |       1      10          7 |
                |       1      11          3 |
                |       1      12         11 |
                |       1      13          8 |
                |       1      14         14 |
                |       1      15         12 |
                |       1      16          1 |
                |       1      17          4 |
                +----------------------------+
              Steve Samuels
              Statistical Consulting
              [email protected]

              Stata 14.2
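
              As a possible extension of the random ordering above, the first five households in each cluster could be flagged as the primary list and the rest kept as reserves, to be approached in order only when a primary household does not cooperate. This is a sketch under the same two-cluster setup; the variable name primary and its value labels are assumptions added for illustration.

              ```stata
              * Same made-up frame: two clusters with 17 and 20 dwellings
              clear
              set seed 64312
              set obs 37
              gen cluster = cond(_n <= 17, 1, 2)
              bysort cluster: gen dwelling = _n

              * Random order of households within each cluster
              gen double u = runiform()
              sort cluster u
              by cluster: gen order = _n

              * First five per cluster are primary; the rest are ordered reserves
              gen byte primary = order <= 5
              label define primary 0 "reserve" 1 "primary"
              label values primary primary

              list cluster order dwelling primary if order <= 7, noobs sepby(cluster)
              ```

              Reserves should still be approached strictly in the listed order, so that the selection stays random.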
