Hello, I want to sample 20 schools out of a total of 62. I have placed these schools in 6 strata of different sizes, and I want to work out how many schools to take from each stratum in proportion to its size. When I run the code below, the per-stratum numbers add up to only 17 schools. Do you have suggestions on how to get to exactly 20? Thank you in advance!
set seed 25021986
* divide schools into strata based on subregion, size, score
egen strata = group(subregion size_cat score_cat)
* generate a random number from a uniform distribution between 0 and 1
gen double rand = runiform()
* sort by stratum, then by the random number within each stratum
sort strata rand, stable
* rank schools (in random order) within each stratum
by strata: gen rank_withinstrata = _n
* Sample proportionally according to the size of each stratum
* total number of schools in each stratum
bysort strata: egen stratum_size = count(school_id)
* total schools in dataset
egen total_schools = count(school_id)
* calculate the proportion of schools in each stratum
gen proportion = stratum_size / total_schools
* number of schools to sample from each stratum, proportional to stratum size (rounded down)
gen schools_to_sample = floor(20 * proportion)
* make sure at least one school is sampled from each stratum
replace schools_to_sample = ceil(20 * proportion) if schools_to_sample == 0
** check to make sure we get 20 schools
* Tag the first observation of each stratum so that each stratum is counted once in the sum
bysort strata: gen first_in_strata = _n == 1
* Calculate the total only using the first occurrence of each stratum
egen total_assigned = total(schools_to_sample * first_in_strata)
* Display the total number of schools assigned to be sampled
display "Total schools assigned to be sampled: " total_assigned
* Calculate the shortfall
local shortfall = 20 - total_assigned
display "Shortfall: " `shortfall'
* gives us 17 schools
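Why the total falls short: a sketch of the arithmetic behind the `floor(20 * proportion)` step. The notation is introduced here for illustration (N_h is the size of stratum h, N = 62 is the total number of schools, n = 20 is the target sample, q_h is the unrounded quota):

```latex
q_h = n \, \frac{N_h}{N}, \qquad \sum_{h=1}^{6} q_h = n = 20

\sum_{h=1}^{6} \lfloor q_h \rfloor \;=\; 20 \;-\; \sum_{h=1}^{6} \bigl( q_h - \lfloor q_h \rfloor \bigr)
```

Each fractional part q_h - floor(q_h) is below 1, so with 6 strata the rounded-down total can fall up to 5 schools short of 20. The `ceil()` line only recovers strata whose quota is below 1, so the remaining fractional losses (3 schools here) are never added back. A common fix is the largest remainder (Hamilton) method: after rounding down, give one extra school to the strata with the largest fractional parts until the total reaches 20.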
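A minimal sketch of the largest remainder fix in Stata. So that it runs on its own, it builds a made-up dataset of 62 schools in 6 strata; the stratum sizes (20, 14, 11, 8, 6, 3) are hypothetical, not the poster's data, and the names `n_h`, `quota`, `alloc`, `remainder`, `allocation` and `selected` are illustrative. With real data, drop the demo `input`/`expand` lines and start from the `strata` and `rank_withinstrata` variables built earlier.

```stata
* Hypothetical demo data: 62 schools in 6 strata (sizes made up for illustration)
clear
input byte strata int n_h
1 20
2 14
3 11
4 8
5 6
6 3
end
expand n_h
drop n_h
gen int school_id = _n

* Random order within each stratum, as in the original code
set seed 25021986
gen double rand = runiform()
sort strata rand, stable
by strata: gen rank_withinstrata = _n

* Unrounded proportional quota: 20 * (stratum size / total schools)
bysort strata: gen stratum_size = _N
gen double quota = 20 * stratum_size / _N

* Build a one-row-per-stratum allocation table
preserve
bysort strata: keep if _n == 1
keep strata quota
* Round down, but keep at least one school per stratum
gen alloc = max(floor(quota), 1)
* Leftover fraction; strata bumped up to 1 get a negative value and sort last
gen double remainder = quota - alloc
quietly summarize alloc
local shortfall = 20 - r(sum)
display "Shortfall after rounding down: `shortfall'"
* Largest remainders first; ties broken by stratum number
gsort -remainder strata
* One extra school for each top-ranked stratum until the shortfall is covered
replace alloc = alloc + 1 if _n <= `shortfall'
keep strata alloc
tempfile allocation
save `allocation'
restore

* Attach the allocation and select the first alloc schools in each stratum's random order
merge m:1 strata using `allocation', nogenerate
gen byte selected = rank_withinstrata <= alloc
tabulate strata selected
count if selected
assert r(N) == 20
```

With these hypothetical sizes, rounding down plus the at-least-one rule gives 6, 4, 3, 2, 1, 1 (17 schools, so the display shows a shortfall of 3), and the top-up adds one school to strata 5, 4 and 3, giving 6, 4, 4, 3, 2, 1 = 20. Run the block in one go, since the local macro and the tempfile do not survive between separate runs. If many small strata are forced up to one school, the shortfall can be negative; this sketch does not remove schools in that case, and the final `assert` will fail.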