Research-oriented question:
I have a database of 1,500 observations, divided into treatment and control using pairwise matching at the cluster level (60 clusters: 30 treatment, 30 control). Clusters were paired using the mahascores command (from SSC), then within each pair one cluster was randomly assigned to treatment and the other to control. The covariates used to determine the pairing come from geographic data that are independent of the observation-level data.
The 1,500 observations I have were randomly selected within each cluster from a population of 8,5000 using runiform().
I now have to select a subsample that will complete a qualitative instrument. For feasibility reasons, we will only be able to complete the instrument with 150-200 individuals. The most important concern is balance between treatment and control, but ideally I want this subsample to be representative of the larger sample as well. With 150-200 observations, we are assuming we won't be able to achieve adequate power for the anticipated effect size, so maximizing power, while important, is not the most pressing concern.
Options I've considered:
- Clustered random sampling using the same process I used to select my sample from the population. The problem is that, with only about 3 observations per cluster, I doubt this subsample will be representative of each cluster.
- Random sampling without clustering. I don't think this is valid, as both sampling from the population and assignment to treatment/control were clustered.
- Randomly sample by cluster from the treatment group, then match with control group observations using the mahapick command (from SSC).
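To make the first option concrete, here is a rough sketch of what I have in mind, not tested on my actual data. The variable names obs_id, cluster_id and treat are placeholders, the seed is arbitrary, and 3 per cluster is just one choice that gives 180 individuals (assuming every cluster has at least 3 observations).

```stata
* Sketch only: obs_id, cluster_id and treat are hypothetical variable names
set seed 20240101                  // arbitrary seed, so the draw is reproducible
sort obs_id                        // fix row order before drawing random numbers
generate double u = runiform()     // one uniform draw per observation
* keep the 3 observations with the smallest draw in each cluster
bysort cluster_id (u): generate byte in_subsample = (_n <= 3)
tabulate treat in_subsample        // check the treatment/control split
```

Because every cluster contributes the same number of observations, the subsample is balanced on treatment by construction (90 and 90 with 30 clusters per arm), but it says nothing about balance on individual-level covariates.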
Has anyone used a method like this, and are there any papers I should be reading? Are there any other options I haven't considered here?
Thanks!