Randomization-group

Thein Zaw

Join Date: Nov 2016
Posts: 75


20 May 2022, 02:37

Hi,
I'm trying to randomly assign villages to two groups. The villages have unequal numbers of observations, but I would like the two groups to end up with numbers of observations that are as equal as possible after randomization.

Below is some sample data. I would like to divide the villages into the two groups within each area.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str5 caseid str23 village_name str8 area byte vill_id float(count_n count_N)
"1056" "Jan Mai Kaung"   "Kachin"  4 1 33
"1034" "Lekone Ziun"     "Kachin"  3 1 22
"1090" "Maina"           "Kachin"  5 1 44
"1213" "Mannpya Sanpra"  "Kachin"  7 1 20
"1014" "N'jang Dung"     "Kachin"  2 1 20
"1142" "NgwiPyaw Sanpra" "Kachin"  6 1 68
"1299" "Shata Pru"       "Kachin" 10 1 12
"1249" "Shing Jai"       "Kachin"  9 1 44
"1233" "Shwe Zet"        "Kachin"  8 1 16
"1001" "Tatkone Sanpra"  "Kachin"  1 1 13
end

Tags: random

Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#2

20 May 2022, 06:19

Read this
Mead Over

Join Date: Sep 2014

Posts: 110
#3

20 May 2022, 06:46

Will you really collect data from only 10 villages with 292 sample observations? Or are you showing only an extract of your potential data?
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2398
#4

20 May 2022, 07:42

The mechanism of randomizing groups (or individuals) is relatively simple. Here's the general idea for a 1:1 randomization of individuals; it carries over directly to clusters if you have a dataset with one row per cluster id.

Code:

set seed 17    // set this somewhere at the top of your do-file and change the seed number

// later in your code ....
gen byte group = rbinomial(1, 0.5)

However, this doesn't apply any constraints to the apparent balance of total people within villages. It seems like you are trying to design a cluster-randomized (or group-randomized) experiment, but I can't think of any good reason to balance people across groups. If this is so, then perhaps you can give more details, as there are more important factors to consider than group size. If this is not the case, then this post wasn't very helpful to you, but you may want to explain what you are trying to do with those groups.
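A minimal sketch (not part of the original reply) of the 1:1 mechanism applied to whole villages, stratified by area as the question asks. It uses a trimmed copy of the -dataex- extract, assuming one row per village and that count_N is the number of observations per village; the helper variable u is an illustrative name. Unlike rbinomial(1, 0.5), which can by chance put most villages in one arm, sorting on a random number within area and alternating the assignment gives equal numbers of villages per arm (within one when an area has an odd count). It does not balance the number of observations.

```stata
* Sketch: complete 1:1 randomization of villages (clusters) within area.
* Trimmed copy of the -dataex- extract: one row per village.
clear
input str5 caseid str8 area int count_N
"1056" "Kachin" 33
"1034" "Kachin" 22
"1090" "Kachin" 44
"1213" "Kachin" 20
"1014" "Kachin" 20
"1142" "Kachin" 68
"1299" "Kachin" 12
"1249" "Kachin" 44
"1233" "Kachin" 16
"1001" "Kachin" 13
end

set seed 17                    // any fixed seed makes the draw reproducible
sort caseid                    // fix the row order before drawing random numbers
gen double u = runiform()      // one uniform draw per village (hypothetical helper)

* within each area, put villages in random order and alternate 1, 0, 1, 0, ...
bysort area (u): gen byte group = mod(_n, 2)

tab area group
tabstat count_N, statistics(n sum) by(group)
```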

By the way, a naive and direct attack on your question as stated could look like the code below. The part that would apply to your data starts at "Begin here".

Code:

// make up some fake data to show a technique
clear
set seed 17
set obs 10
gen int village_size = ceil(exp(rnormal(7, 1.1)))

// Begin here
gsort -village_size
gen byte group = mod(_n-1, 2)
tabstat village_size, c(s) s(n sum) by(group)

Result

Code:

. tabstat village_size, c(s) s(n sum) by(group)

Summary for variables: village_size
     Group variable: group

   group |         N       Sum
---------+--------------------
       0 |         5     14150
       1 |         5      9241
---------+--------------------
   Total |        10     23391
------------------------------

You will see that each group gets five villages and the totals are about as close as this simple alternating rule allows, but there is still a large difference in total group size (14150 vs 9241). Moreover, this isn't a truly randomized result, because the assignment is completely determined by the ordering of village size.
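One way to keep the size balance of the sort-and-alternate rule while restoring randomization is matched-pair randomization: pair villages of similar size within each area, then choose at random which member of each pair goes to group 1. This is an illustration, not from the original thread, under the same assumptions as the question's data (one row per village, count_N = observations per village). The names negsize, pair and u are hypothetical helper variables, and in an area with an odd number of villages the one unpaired village is assigned by a coin flip.

```stata
* Sketch: matched-pair randomization of villages within area.
clear
input str5 caseid str8 area int count_N
"1056" "Kachin" 33
"1034" "Kachin" 22
"1090" "Kachin" 44
"1213" "Kachin" 20
"1014" "Kachin" 20
"1142" "Kachin" 68
"1299" "Kachin" 12
"1249" "Kachin" 44
"1233" "Kachin" 16
"1001" "Kachin" 13
end

set seed 17
sort caseid                          // fix the row order for reproducibility
gen double u = runiform()            // random number used to pick within pairs

* pair consecutive villages after sorting by size (largest first) within area
gen int negsize = -count_N           // bysort sorts ascending only
bysort area (negsize caseid): gen int pair = ceil(_n/2)

* in each pair, the village with the smaller u goes to group 1;
* an unpaired village (pair of size 1) is assigned by a coin flip
bysort area pair (u): gen byte group = cond(_N == 1, u < 0.5, _n == 1)

tabstat count_N, statistics(n sum) by(group)
```

Because each pair splits one-and-one, the two groups always have the same number of villages, and the absolute difference in total observations can be no larger than the sum of the within-pair size differences (plus the size of any unpaired village).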