
  • Sample clusters without replacement

    Hi there,

    I have the following problem:

    Because my dataset is too big (over 20 GB; my computer crashes when I try to work with it despite having 16 GB of RAM), I want to take a sample and work with that instead.
    The problem is that, for the sample to be useful, I have to sample clusters. The data has more than 150,000,000 observations, but a certain categorical variable takes only about 30,000 distinct values. I want to sample about 8,000 of these 30,000 values and then keep all observations for which the variable takes one of the sampled values, so that overall I have roughly a 25% sample in which every sampled cluster is still complete.

    The problem is that only bsample, which samples with replacement, has a cluster option. For my purposes I obviously don't want replacement, but sample (the command that samples without replacement) has no cluster option.

    Do you wise people here have an idea of how I can solve that problem?

    Additional complication: the whole dataset is currently split across several dta files, each containing some of the observations, and unfortunately a single cluster is NOT confined to a single dta file.

    Best,

    Jakob
    Last edited by Jakob; 23 Jul 2014, 13:59.

  • #2
    Always good to check for FAQs; earlier discussions on this topic:

    http://www.stata.com/support/faqs/da...ling-clusters/
    http://www.stata.com/statalist/archi.../msg00572.html
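
    For the split-file case, one possible approach is a two-pass sketch like the one below. It is only an illustration under assumptions, not necessarily what the linked discussions recommend: the file names part1.dta, part2.dta, part3.dta and the variable name clusterid are hypothetical, each part file is assumed to fit in memory on its own, and the final ~25% sample is assumed to fit in memory as well.

    ```stata
    * Hypothetical names: part1/part2/part3 (.dta files) and clusterid.
    local files part1 part2 part3

    * Pass 1: collect the distinct cluster ids across all files.
    * Loading only clusterid keeps memory use small.
    clear
    tempfile ids
    save `ids', emptyok
    foreach f of local files {
        use clusterid using "`f'.dta", clear
        append using `ids'
        duplicates drop
        save `ids', replace
    }

    * Draw 8,000 cluster ids without replacement (one row per cluster).
    set seed 12345
    sample 8000, count
    tempfile sampled
    save `sampled'

    * Pass 2: from each file, keep only observations in sampled clusters.
    clear
    tempfile out
    save `out', emptyok
    foreach f of local files {
        use "`f'.dta", clear
        merge m:1 clusterid using `sampled', keep(match) nogenerate
        append using `out'
        save `out', replace
    }
    ```

    Because sample is run on a dataset with one row per cluster, it picks whole clusters without replacement, and the merge in the second pass then pulls in every observation of each chosen cluster regardless of which file it sits in.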

    As for the database issue, I'd think about running on a server if possible, or off the top of my head something like this might be helpful:

    http://www.stata.com/support/faqs/da...t-to-database/
    Last edited by Andrew Lover; 23 Jul 2014, 21:36.
    ____________________________________________________
    Assistant Professor, Department of Biostatistics and Epidemiology
    School of Public Health and Health Sciences
    University of Massachusetts- Amherst



    • #3
      Thanks a lot for your help, I'll check the FAQ first next time!
