  • Question: how to randomly choose one case from each of 72 groups?

    Hey colleagues,

    The dataset contains five key variables, var1 to var5. Each is either dichotomous (0/1) or trichotomous (0/1/2). Their combinations divide the observations into 2*2*2*3*3=72 groups. For instance, one group is made up of observations with var1 = 1, var2 = 2, var3 = 0, var4 = 1 and var5 = 0. My question is how to randomly select exactly one observation from each group. A loop seems like a must, but I have no idea how to write it.

    Variable list:
    .id
    .var1: a continuous variable grouped into three categories: 0/1/2
    .var2: an ordinal variable: 0/1/2
    .var3: a dummy variable: 0/1
    .var4: a dummy variable: 0/1
    .var5: a dummy variable: 0/1

    Looking forward to your replies, and many thanks!

    Sincerely
    Raymon Lucas

  • #2
    Code:
    egen group = group(var1 var2 var3 var4 var5), label 
    * choose your own seed
    set seed 314159 
    gen double rnd = runiform()
    bysort group (rnd) : gen selected = _n == 1
    Closer scrutiny of the code shows that the group variable isn't strictly needed, but it seems likely to be useful anyway. No loops are needed, except those Stata runs on your behalf.
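
    To illustrate the remark that the group variable isn't strictly needed, here is a minimal sketch that sorts directly on the five variables. It assumes toy data in place of the real dataset (the 5,000 observations and the uniform category draws are invented for illustration), and the check variable ncombo is a hypothetical name.

    ```stata
    * Toy data standing in for the real dataset (size and values are assumptions)
    clear
    set obs 5000
    set seed 2718
    gen long id = _n
    gen byte var1 = floor(3 * runiform())
    gen byte var2 = floor(3 * runiform())
    gen byte var3 = runiform() < 0.5
    gen byte var4 = runiform() < 0.5
    gen byte var5 = runiform() < 0.5

    * Same idea as the code above, without creating a group variable first
    set seed 314159
    gen double rnd = runiform()
    bysort var1 var2 var3 var4 var5 (rnd) : gen byte selected = _n == 1

    * Check: tag one observation per combination that occurs in the data,
    * then compare the two counts; they should be equal
    egen byte ncombo = tag(var1 var2 var3 var4 var5)
    count if selected
    count if ncombo
    ```

    Only combinations that actually occur in the data can be selected, so the count may be below 72 if some combinations are empty.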


    • #3
      If you want to do it repeatedly, put the loop after set seed, and inside the loop you'll either have to
      Code:
      capture drop rnd selected
      or use tempvars.
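
      A minimal sketch of the tempvar route, assuming toy data in place of the real dataset and an arbitrary three repetitions. The names selected1 to selected3 are illustrative only.

      ```stata
      * Toy data standing in for the real dataset (size and values are assumptions)
      clear
      set obs 2000
      set seed 2718
      gen byte var1 = floor(3 * runiform())
      gen byte var2 = floor(3 * runiform())
      gen byte var3 = runiform() < 0.5
      gen byte var4 = runiform() < 0.5
      gen byte var5 = runiform() < 0.5
      egen group = group(var1 var2 var3 var4 var5)

      * Set the seed once, before the loop, so each pass uses fresh random numbers
      set seed 314159
      forvalues r = 1/3 {
          * A new temporary name on each pass avoids "already defined" errors
          tempvar rnd
          gen double `rnd' = runiform()
          bysort group (`rnd') : gen byte selected`r' = _n == 1
          drop `rnd'
      }
      ```

      Setting the seed inside the loop instead would reproduce the same selection on every pass.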


      • #4
        Hey Nick and George,

        Many thanks for your help, and sorry for the late response! I have just come to the end of the summer vacation.

        Your replies are truly helpful! The commands do work!

        I have a follow-up question. My dataset consists of 17 groups defined by a variable, say, voter’s age. More importantly, the group sizes vary considerably. Group 1, for instance, has 1920 obs., whereas group 7 has merely 32.

        I had thought of randomly selecting 1 obs. from each group to obtain a representative sample. However, given the varying group sizes, should I draw different numbers of obs. from each group? If the answer is yes, how should I do it in Stata?

        Thank you again!


        • #5
          I would turn your question around. You're talking about a stratified sample. It's best to read up on stratified sampling, work out what is best for your problem, and then pose a question about Stata code. Put otherwise, what you might do and how you would do it cover such a wide range that a concise reply is hardly possible.
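
          Purely as an illustration of one option among many, proportional allocation can reuse the sort-on-a-random-number idea from earlier in the thread. This sketch is not a recommendation: the 10% fraction, the rounding up, and the variable name agegrp are all assumptions, and the toy data contain only two groups with the sizes quoted in #4.

          ```stata
          * Toy data: two groups with the sizes mentioned in #4 (1920 and 32)
          * agegrp is a hypothetical name for the grouping variable
          clear
          set obs 1952
          gen byte agegrp = cond(_n <= 1920, 1, 7)

          set seed 314159
          gen double rnd = runiform()

          * Select about 10% of each group, rounding up so that
          * every group contributes at least one observation
          bysort agegrp (rnd) : gen byte selected = _n <= ceil(_N / 10)

          tabulate agegrp selected
          ```

          With these sizes the rule selects 192 of 1920 observations in group 1 and 4 of 32 in group 7. Whether proportional, equal, or some other allocation suits the problem is exactly the design question to settle first.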
