  • Sampling proportionally according to the size of each stratum

    Hello, I want to sample 20 schools out of a total of 62. I have placed the schools in 6 strata of different sizes, and I want to work out how many schools to take from each stratum in proportion to its size. When I run the code below, the per-stratum counts add up to only 17 schools. Do you have suggestions on how to get to 20? Thank you in advance!

    set seed 25021986

    * divide schools into strata based on subregion, size, score
    egen strata = group(subregion size_cat score_cat)

    * generate a random number from a uniform distribution between 0 and 1
    gen double rand = runiform()

    * sort the strata by the random #
    sort strata rand, stable

    * generate a rank within the strata
    by strata: gen rank_withinstrata = _n


    * Sample proportionally according to the size of each stratum

    * total number of schools in each stratum
    bysort strata: egen stratum_size = count(school_id)
    * total schools in dataset
    egen total_schools = count(school_id)
    * calculate the proportion of schools in each stratum
    gen proportion = stratum_size / total_schools

    * number of schools to sample from each stratum, proportional to the stratum's size
    gen schools_to_sample = floor(20 * proportion)

    * this makes sure at least one school is sampled from each stratum
    replace schools_to_sample = ceil(20 * proportion) if schools_to_sample == 0

    ** check to make sure we get 20 schools
    * Tag the first occurrence of each stratum to consider for summing
    bysort strata: gen first_in_strata = _n == 1
    * Calculate the total only using the first occurrence of each stratum
    egen total_assigned = total(schools_to_sample * first_in_strata)
    * Display the total number of schools assigned to be sampled
    display "Total schools assigned to be sampled: " total_assigned
    * Calculate the shortfall
    local shortfall = 20 - total_assigned
    display "Shortfall: " `shortfall'

    * gives us 17 schools

  • #2
    Some combinations of stratum sizes and a designated total of 20 are mathematically impossible to achieve with exactly proportional sampling. Let's take a simpler case. Suppose there were exactly two strata, each containing 10 schools. Since the two strata are the same size, proportional sampling means drawing the same number from each, so the total sample size must be an even number. No matter how you play with it, you could not possibly get a total sample size of 11 from these schools. I suspect this is the kind of situation you are facing.

    Truncating the number of schools per stratum to an integer can also keep the total from reaching your target. For another example, if there are two strata, one with 14 schools and the other with 7, then their shares of the 21 total schools are 0.67 and 0.33, a 2:1 ratio. Suppose your target sample size is 13 schools. Your targets for these two strata would then be 8 (floor(13*2/3)) and 4 (floor(13*1/3)), which add up to 12, so you cannot match 13. Your floor(20 * proportion) step works the same way: each stratum can lose a fraction of a school to rounding, and across six strata those losses can add up to several schools.
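
    If hitting exactly 20 matters more than exact proportionality, one common workaround is largest-remainder allocation: round every stratum down, then give the leftover schools to the strata that lost the largest fractional parts. What follows is only a minimal sketch, not a tested solution for your data. It assumes one row per school and that strata and rank_withinstrata already exist as created in post #1; the names n_h, quota, alloc, frac, stratum_tag and sampled are hypothetical and chosen just for illustration.

```stata
* Sketch only. Assumes one row per school and that strata and
* rank_withinstrata already exist as created in post #1.
* n_h, quota, alloc, frac, stratum_tag, sampled are hypothetical names.
local target = 20

* stratum size and exact (fractional) proportional quota
bysort strata: gen long n_h = _N
gen double quota = `target' * n_h / _N

* round down, but keep at least one school per stratum
gen long alloc = max(floor(quota), 1)

* fractional part lost to rounding; 0 for strata already bumped up to 1
gen double frac = cond(floor(quota) == 0, 0, quota - floor(quota))

* count each stratum once when totalling the allocation
egen byte stratum_tag = tag(strata)
quietly summarize alloc if stratum_tag
local shortfall = `target' - r(sum)
display "Shortfall after rounding down: `shortfall'"

* give one extra school to the strata with the largest fractional parts
gsort -stratum_tag -frac strata
replace alloc = alloc + 1 if stratum_tag & _n <= `shortfall'

* copy the final allocation to every school in the stratum, then flag the draw
bysort strata (stratum_tag): replace alloc = alloc[_N]
gen byte sampled = rank_withinstrata <= alloc
count if sampled
```

    The resulting sample is only approximately proportional, which is unavoidable when the exact quotas are not whole numbers. Ties in the fractional parts are broken here by stratum number, which is an arbitrary choice. If the at-least-one rule pushes the rounded-down total above the target, the shortfall is negative and this sketch makes no adjustment, so that case would need separate handling.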
