  • Stratified sample using one stratum.

    Hi! I am doing a study about schools in different municipalities and am trying to draw a stratified sample using proportional allocation without replacement. I am stratifying by municipality and will then draw a number of schools from each stratum (at least one school from each municipality). The proportional allocation will be based on the number of schools in each municipality relative to the total number of schools, so municipalities with more schools will also have more schools sampled. The total number of schools in my population is 295 and I want to sample 144 of them.

    I am trying to use the command gsample, but it doesn't work, and I don't know if there is a more suitable command.
    If someone knows anything about stratified sampling, I would be very grateful for their help.

    My code: gsample [144|school] [,strata(municipality)]
    The output from Stata: weights not allowed

  • #2
    -gsample- is a user-written command, available from SSC. Also, it's hard to give a good answer to your question without seeing an example of your data. (Re-reading the Statalist FAQ will clarify why those two comments are relevant.)

    Those two comments aside, I can see that you are having trouble understanding some standard syntax diagram features that -help gsample- uses. One issue that is biting you is that square brackets in syntax diagrams indicate items that are optional, and are not actually part of the command. (An exception to this is when a weight is being specified. That's why Stata responded "weights not allowed.")
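To make the bracket convention concrete, here is a small sketch using the official -sample- command rather than -gsample-. It assumes variables named as in the question, and the 49 is just an illustrative percentage. It shows how to read a syntax diagram; it does not solve the allocation problem.

```stata
// Syntax diagram from -help sample-:
//     sample # [if] [in] [, count by(groupvars)]
// Square brackets mark optional pieces and are never typed.
// A vertical bar, as in [#|varname], means "one or the other".

// Typed with the brackets, as in the original attempt, the command fails:
//     gsample [144|school] [,strata(municipality)]
//     -> weights not allowed

// Typed without the brackets: keep about 49% of the schools
// within each municipality. -sample- drops the unsampled
// observations, hence the preserve/restore around it.
preserve
sample 49, by(municipality)
tabulate municipality
restore
```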

    In any event, I don't think there's any particular reason to use -gsample- here, and I don't know of any way to use it to achieve the departure from proportionality that comes from requiring at least 1 school in each stratum. Here's what I would do, illustrated with some simulated data that's likely similar to yours. I've written this code to be understandable, so it's not particularly concise. Note that if the number of schools within a municipality is small, rounding and the one-school minimum mean the total sample size will not quite match your overall target of 144 (a sampling fraction of 144/295).


    Code:
    // Create some example data to work with.
    clear
    set seed 476432
    set obs 295
    gen int school = _n
    gen byte municipality = ceil(runiform() * 20)
    // You don't have "one stratum."  You have one stratifying variable, but many strata.
    tab municipality
    describe
    // end creating example.
    //
    // You would do the following:
    set seed 1234 // your favorite number, ensures reproducibility
    // Target sampling fraction of 144/295 in each municipality.
    local fraction = 144/295
    // School population size and target sample size for each municipality.
    bysort municipality: gen int popsize = _N
    gen sampsize = round(`fraction' * popsize)    
    replace sampsize = 1 if (sampsize < 1)
    //
    // Randomly shuffle schools within municipality to facilitate random selection.
    gen randorder = runiform()
    sort municipality randorder
    //
    // Pick schools within municipality
    by municipality: gen byte insample = (_n <=sampsize)
    //
    // Examine results
    browse insample municipality popsize sampsize school if (insample == 1)
    //  Now,  you can do -list .... if (insample ==1)
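Because of the rounding and the one-school minimum, it is worth checking what was actually drawn. The following is a minimal sketch that assumes the variables created by the code above (municipality, popsize, sampsize, insample). The name nselected is made up for this check.

```stata
// Total number of schools selected; compare with the target of 144.
count if insample == 1
display "Selected " r(N) " of " _N " schools"

// Number of schools selected in each municipality.
bysort municipality: egen int nselected = total(insample)

// Every municipality should contribute at least one school,
// and exactly as many as its planned sample size.
assert nselected >= 1
assert nselected == sampsize

// Cross-tabulate strata against selection status.
tabulate municipality insample
```

If the total is off by a few schools from 144, one option is to adjust sampsize by hand in the largest municipalities before the selection step.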



    • #3
      Thank you so much, Mike, for your help and your pedagogical explanation. Your code worked perfectly with my data. I guess my lack of knowledge led me to the wrong command when I tried gsample. I will definitely work on my ability to read syntax diagrams. Also, thanks for introducing me to new commands that will make my work much easier, like local fraction.

      Thank you for helping me!



      • #4
        Just to be clear: I didn't have to use a local macro to store the 144/295. One could just put that fraction right into the code. Saving a constant into a local is just a stylistic choice to make it more easily modifiable. So, you might have had:
        Code:
        gen sampsize = round((144/295)* popsize)
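For anyone new to local macros, here is a short sketch of what is going on. Stata substitutes the macro's contents into the command line before running it, so both forms produce the same sample sizes. It assumes popsize exists as in #2. The names sampsize_a and sampsize_b are made up for the comparison.

```stata
// Store the sampling fraction once; change it here to change it everywhere.
local fraction = 144/295
display `fraction'

// Same computation, with and without the macro.
gen sampsize_a = round(`fraction' * popsize)
gen sampsize_b = round((144/295) * popsize)

// The two versions agree for every school.
assert sampsize_a == sampsize_b
```

Keep in mind that a local macro only exists while the do-file (or the selected block of lines) that defines it is running, so the -local- line and the lines that use it need to be run together.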



        • #5
          Okay, now I know. Thanks for the clarification. It feels like the macro is easier to work with and can save a lot of time (maybe not in this particular case).
