Sample clusters without replacement

Jakob

Join Date: Jul 2014

Posts: 2
#1

23 Jul 2014, 13:56

Hi there,

I have the following problem:

Because my dataset is too big (more than 20 GB; my computer crashes when I try to work with it despite having 16 GB of RAM), I want to take a sample and do some work on that sample.
The problem is that, for the sample to be useful, I have to sample clusters. The data have more than 150,000,000 observations, but a certain categorical variable takes only about 30,000 distinct values. I want to sample about 8,000 of these 30,000 values and then keep all observations for which the variable takes one of those 8,000 values, so that overall I have roughly a 25% sample but every cluster is still complete.
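The cluster-sampling idea described above can be sketched in a few lines. This is a minimal Python sketch (not Stata), assuming the data fit in memory as a list of dicts; the function name `sample_clusters` and its parameters are hypothetical. It draws cluster values without replacement and then keeps every observation in a drawn cluster. In Stata terms, the same steps would correspond roughly to keeping one observation per cluster, running `sample` on that reduced file, and merging the drawn IDs back onto the full data.

```python
import random
from typing import List, Optional


def sample_clusters(
    rows: List[dict], cluster_key: str, n_clusters: int, seed: Optional[int] = None
) -> List[dict]:
    """Keep every row of n_clusters clusters drawn without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    # Step 1: one entry per distinct cluster value; sorted so that a given seed
    # reproduces the same draw (assumes the cluster values are mutually comparable).
    clusters = sorted({row[cluster_key] for row in rows})
    # Step 2: draw n_clusters distinct values; random.sample never repeats an element,
    # i.e. this is sampling without replacement.
    chosen = set(rng.sample(clusters, n_clusters))
    # Step 3: keep all observations of each drawn cluster, so clusters stay complete.
    return [row for row in rows if row[cluster_key] in chosen]
```

For example, with 20 rows spread evenly over 5 clusters, `sample_clusters(data, "firm", 2, seed=1)` keeps the 8 rows of 2 complete clusters. Note that the share of observations kept is only approximately `n_clusters` divided by the number of clusters (8,000 / 30,000 is about 27%), because clusters can differ in size.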

The problem is that only bsample, which samples with replacement, has a cluster() option. For my purposes I obviously don't want replacement, but sample (the command that samples without replacement) has no cluster option.

Do you wise people here have an idea of how I can solve that problem?

An additional complication: the whole dataset is now split into several .dta files, each containing some of the observations, but, unfortunately, an individual cluster is NOT contained in a single .dta file.
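Because clusters are spread across the files, sampling within each file separately would break clusters apart. One way around this is a two-pass approach, sketched here in Python under stated assumptions: a hypothetical `load_rows(path)` callable returns one file's observations as dicts, and only the distinct cluster values (about 30,000, not the 150,000,000 rows) need to be held in memory during the first pass. The draw happens once, over all files, so a cluster that appears in several files is either kept everywhere or dropped everywhere. The function name `two_pass_cluster_sample` is likewise hypothetical.

```python
import random
from typing import Callable, Hashable, Iterable, List, Optional, Sequence, Set


def two_pass_cluster_sample(
    sources: Sequence[str],
    load_rows: Callable[[str], Iterable[dict]],
    cluster_key: str,
    n_clusters: int,
    seed: Optional[int] = None,
) -> List[dict]:
    """Sample whole clusters without replacement when clusters span several files.

    load_rows is a hypothetical loader that yields one file's rows as dicts.
    """
    # Pass 1: collect the distinct cluster values across *all* files.
    clusters: Set[Hashable] = set()
    for src in sources:
        for row in load_rows(src):
            clusters.add(row[cluster_key])

    # Draw once, globally; sorting first makes the draw reproducible for a given
    # seed (assumes the cluster values are mutually comparable).
    rng = random.Random(seed)
    chosen = set(rng.sample(sorted(clusters), n_clusters))

    # Pass 2: re-read every file and keep only rows from the chosen clusters.
    kept: List[dict] = []
    for src in sources:
        kept.extend(row for row in load_rows(src) if row[cluster_key] in chosen)
    return kept
```

With in-memory stand-ins such as `files = {"part1": [...], "part2": [...]}`, a call like `two_pass_cluster_sample(list(files), files.__getitem__, "firm", 1)` returns every row of the drawn cluster from both parts. On real data you would probably append the kept rows to a new file in pass 2 rather than collect them in a list, since a 25% sample of 20 GB may still not fit in memory.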

Best,

Jakob

Last edited by Jakob; 23 Jul 2014, 13:59.
Andrew Lover

Join Date: Apr 2014

Posts: 182
#2

23 Jul 2014, 21:12

It's always good to check the FAQs; here are earlier discussions on this topic:

http://www.stata.com/support/faqs/da...ling-clusters/
http://www.stata.com/statalist/archi.../msg00572.html

As for the database issue, I'd think about running this on a server if possible, or, off the top of my head, something like this might be helpful:

http://www.stata.com/support/faqs/da...t-to-database/

Last edited by Andrew Lover; 23 Jul 2014, 21:36.

__________________________________________________
Assistant Professor, Department of Biostatistics and Epidemiology
School of Public Health and Health Sciences
University of Massachusetts Amherst
Jakob

Join Date: Jul 2014

Posts: 2
#3

24 Jul 2014, 10:59

Thanks a lot for your help; I'll check the FAQ first next time!