dataset structure for cluster sampling

Portia Mutevedzi

Join Date: Feb 2019

Posts: 1
#1

10 Feb 2019, 05:59

I want to select 800 clusters from a listing of 2000 enumeration areas. My sampling strategy is systematic random cluster sampling with PPES. How must I structure my dataset for Stata?
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

11 Feb 2019, 11:26

You'll increase the chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Also, assume we're not from your area - I am not sure what PPES means.

It is almost always best with such panel data to use the long form. However, if you have the data in wide form, you could easily sample by cluster. If you look back over the last week or so, you'll find Nick Cox provided a neat way to do this. If your data are wide, you could generate a random number, sort on that number, and then take the first 800 observations.
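As a minimal sketch of the wide-form approach just described, assuming the data in memory hold exactly one observation per cluster (the seed value and the variable name ran are only illustrative):

```stata
* assumes one observation per cluster; the seed is arbitrary
set seed 2019
* attach a random number to every cluster
generate double ran = runiform()
* put the clusters in random order and keep the first 800
sort ran
keep in 1/800
drop ran
```

Setting the seed first makes the draw reproducible. Every cluster has the same chance of selection here, so this is simple random sampling of clusters, not selection proportional to size.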

I don't know an efficient way to sample on the clusters if the data are in long form. I guess you could save the data, drop observations so you only keep one observation per cluster, apply the method I noted above, then merge it back with the original data set. Something like:

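* save the full long-form data so it can be merged back later
save originaldata, replace
* reduce to one observation per cluster, keeping only the identifier
* (otherwise merge would copy the first observation's values to every match)
by cluster, sort: keep if _n == 1
keep cluster
* draw 800 clusters at random
generate double ran = runiform()
sort ran
keep in 1/800
drop ran
* bring back every observation that belongs to a sampled cluster
merge 1:m cluster using originaldata
keep if _merge == 3
drop _merge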

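The merge is 1:m as long as the data in memory hold exactly one observation per cluster at that point; m:1 would apply only if the roles of the two datasets were reversed.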
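For the systematic selection with probability proportional to size that the original question describes, one possible layout is a file with one row per enumeration area plus a size measure. The sketch below is only an illustration under that assumption: the variable names ea_id and est_size are hypothetical, the seed is arbitrary, and it is written for a do-file.

```stata
* Hypothetical layout: one row per enumeration area, with
*   ea_id     identifier that fixes the list order
*   est_size  estimated size measure (for example, households)
set seed 20190210
sort ea_id
* running total of the size measure down the list
generate double cumsize = sum(est_size)
* sampling interval and a random start inside the first interval
scalar k     = cumsize[_N] / 800
scalar start = runiform() * scalar(k)
* count the selection points start, start+k, ... that fall in each area's range
generate long hits = floor((cumsize - scalar(start)) / scalar(k)) ///
    - floor((cumsize - est_size - scalar(start)) / scalar(k))
keep if hits > 0
```

An area whose size exceeds the interval can be hit more than once, so the file may end up with fewer than 800 distinct areas; hits records how many times each one was selected.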
Mike Lacy

Join Date: Apr 2014

Posts: 2423
#3

11 Feb 2019, 11:50

The user-contributed -gsample- available at SSC has a -cluster- option, and is perhaps relevant here.
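A hedged sketch of how that might look, assuming the identifier variable is named cluster and that -gsample- needs the moremata package; the option names should be checked against help gsample after installing:

```stata
* user-written packages from SSC
ssc install moremata
ssc install gsample
* arbitrary seed, for reproducibility
set seed 2019
* draw 800 whole clusters without replacement from long-form data
gsample 800, wor cluster(cluster)
```

This keeps every observation of each sampled cluster, so the long-form data need no reshaping first.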