  • dataset structure for cluster sampling

    I want to select 800 clusters from a listing of 2000 enumeration areas. My sampling strategy is systematic random cluster sampling with PPES. How must I structure my dataset for Stata?

  • #2
    You'll increase the chances of a useful answer by following the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. Also, assume we're not from your area - I am not sure what PPES means.

    It is almost always best with such panel data to use the long form. However, if you have the data in wide form (one observation per cluster), you could easily sample by cluster. If you look back over the last week or so, you'll find Nick Cox provided a neat way to do this. If your data are wide, you could generate a random number, sort on that number, and then keep the first 800 observations.
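
    A minimal sketch of that wide-form approach, assuming the data in memory hold exactly one observation per cluster and at least 800 observations. The variable name ran and the seed value are arbitrary choices for illustration, and this gives a simple random sample of clusters, not a size-weighted one:

    ```stata
    * assumes one observation per cluster (wide form)
    * arbitrary seed so the draw can be reproduced
    set seed 12345
    * one uniform random number per cluster
    generate double ran = runiform()
    * random order, then keep the first 800 clusters
    sort ran
    keep in 1/800
    drop ran
    ```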

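    I don't know an efficient way to sample on the clusters if the data are in long form. I guess you could save the data, keep only one observation per cluster, apply the method I noted above, then merge the sampled clusters back with the original data set. Something like:

    save originaldata, replace
    * keep one observation per cluster
    by cluster, sort: keep if _n == 1
    * arbitrary seed so the draw can be reproduced
    set seed 12345
    generate ran = runiform()
    sort ran
    keep in 1/800
    * keep only the identifier so the merge does not overwrite other variables
    keep cluster
    merge 1:m cluster using originaldata
    keep if _merge == 3
    drop _merge

    The 1:m should be right as long as the data in memory have one observation per cluster and originaldata has many; if the roles were reversed it would be m:1.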



    • #3
      The user-contributed -gsample- available at SSC has a -cluster- option, and is perhaps relevant here.
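
      A minimal sketch of how that might look, assuming the cluster identifier is a variable named cluster (a placeholder name) and that -gsample- needs the -moremata- package from SSC. The option names below are as I understand them from the help file, so check -help gsample- before relying on this, including what it says about weights for unequal-probability sampling:

      ```stata
      * one-time installation from SSC
      ssc install moremata
      ssc install gsample
      * arbitrary seed so the draw can be reproduced
      set seed 12345
      * draw 800 clusters without replacement, keeping all their observations
      gsample 800, wor cluster(cluster)
      ```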
