  • Randomization-group

    Hi,
    I'm trying to randomly assign villages to two groups. The villages have unequal numbers of observations, but I want the two groups to end up with as close to an equal number of observations as possible after randomization.

    Below is the sample data. I would like to divide the villages into groups within each area.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str5 caseid str23 village_name str8 area byte vill_id float(count_n count_N)
    "1056" "Jan Mai Kaung"   "Kachin"  4 1 33
    "1034" "Lekone Ziun"     "Kachin"  3 1 22
    "1090" "Maina"           "Kachin"  5 1 44
    "1213" "Mannpya Sanpra"  "Kachin"  7 1 20
    "1014" "N'jang Dung"     "Kachin"  2 1 20
    "1142" "NgwiPyaw Sanpra" "Kachin"  6 1 68
    "1299" "Shata Pru"       "Kachin" 10 1 12
    "1249" "Shing Jai"       "Kachin"  9 1 44
    "1233" "Shwe Zet"        "Kachin"  8 1 16
    "1001" "Tatkone Sanpra"  "Kachin"  1 1 13
    end

  • #2
    Read this


    • #3
      Will you really collect data from only 10 villages with 292 sample observations? Or are you showing only an extract of your potential data?


      • #4
        The mechanism of randomizing groups (or individuals) is relatively simple. Here's the general idea for a 1:1 randomization of individuals, which generalizes to clusters if you have a dataset with one observation per cluster id.

        Code:
        set seed 17 // set this somewhere at the top of your do-file and change the seed number
        // later in your code ....
        gen byte group = rbinomial(1, 0.5)
        However, this doesn't place any constraint on the balance of total people across the two groups. It seems like you are trying to design a cluster-randomized (or group-randomized) experiment, but I can't think of any good reason to balance the number of people across groups. If that is what you are doing, then perhaps you can give more details, as there are more important factors to consider than group size. If it is not, then this post won't be very helpful to you, but you may want to explain what you are trying to do with those groups.
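
        Note that rbinomial(1, 0.5) flips an independent coin for each unit, so the number of villages per group can itself come out uneven. A minimal sketch of one common alternative, assuming you want exactly half of the villages in each group within each area: sort on a random number within area and alternate. The variable u is a helper introduced only for illustration; vill_id, area and count_N are taken from the sample data in #1.

        ```stata
        clear
        input byte vill_id str8 area float count_N
         4 "Kachin" 33
         3 "Kachin" 22
         5 "Kachin" 44
         7 "Kachin" 20
         2 "Kachin" 20
         6 "Kachin" 68
        10 "Kachin" 12
         9 "Kachin" 44
         8 "Kachin" 16
         1 "Kachin" 13
        end

        set seed 17                 // any seed; fix it so the draw is reproducible
        sort vill_id                // fixed starting order before drawing random numbers
        gen double u = runiform()   // helper: random sort key (illustrative name)

        // within each area, order villages at random and alternate the groups
        bysort area (u): gen byte group = mod(_n, 2)

        // number of villages and total observations per group
        tabstat count_N, s(n sum) by(group)
        ```

        This fixes the number of villages per group, but it still does nothing to balance the total number of observations (count_N) between the groups.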

        By the way, a naive and direct attack on your question as stated could be something like the code below. Start at the "Begin here" comment.

        Code:
        // make up some fake data to show a technique
        clear
        set seed 17
        set obs 10
        gen int village_size = ceil(exp(rnormal(7, 1.1)))
        
        // Begin here
        gsort -village_size
        gen byte group = mod(_n-1, 2)
        tabstat village_size, c(s) s(n sum) by(group)
        Result

        Code:
        . tabstat village_size, c(s) s(n sum) by(group)
        
        Summary for variables: village_size
        Group variable: group
        
           group |         N       Sum
        ---------+--------------------
               0 |         5     14150
               1 |         5      9241
        ---------+--------------------
           Total |        10     23391
        ------------------------------
        You will see that the groups are as balanced as this approach can make them, with five villages each, but there is still a large difference in total size (14,150 versus 9,241). Moreover, this isn't a truly randomized result because the assignment is completely determined by the ordering of village size.
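
        One way to keep the assignment random while still limiting the imbalance is matched-pair randomization, which is a different technique from the alternating rule above: sort by size, pair adjacent villages, and flip a coin within each pair. The following is only a sketch on the same fake data, assuming an even number of villages; pair and u are helper variables introduced for illustration.

        ```stata
        // same fake data as above
        clear
        set seed 17
        set obs 10
        gen int village_size = ceil(exp(rnormal(7, 1.1)))

        // pair villages of similar size: (1st, 2nd), (3rd, 4th), ...
        gsort -village_size
        gen int pair = ceil(_n / 2)

        // within each pair, a random draw decides which village goes to which group
        gen double u = runiform()
        bysort pair (u): gen byte group = _n - 1

        tabstat village_size, c(s) s(n sum) by(group)
        ```

        Each village now has a 50% chance of landing in either group, and the two groups differ only by the within-pair size differences. With villages as unequal as these, the totals can still differ noticeably.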
