
  • Subsampling after pairwise matching with clustered-random sample

    Research-oriented question:

    I have a database of 1,500 observations, divided into treatment and control using pairwise matching at the cluster level (60 clusters: 30 treatment, 30 control). Clusters were paired using the ssc mahascores command, then within each cluster pair one cluster was randomly assigned to treatment and the other to control. The covariates used to determine pairing are from geographic data that is independent of the observation-level data.

    The 1,500 observations I have were randomly selected by cluster from a population of 8,5000 using runiform().

    I now have to select a subsample that will complete a qualitative instrument. Due to feasibility constraints, we will only be able to complete the instrument with 150-200 individuals. The most important concern is balance between treatment and control, but ideally I want this subsample to be representative of the larger sample as well. With 150-200 observations, we assume we won't have enough power to detect the anticipated effect size, so while important, maximizing power is not the most pressing concern.

    Options I've considered:
    1. Clustered random sampling using the same process I used to select my sample from the population. The problem is that, with about 3 observations per cluster, I doubt this subsample will be truly random.
    2. Random sampling without clustering. I don't think this is valid, as both sampling from the population and assignment to treatment/control were clustered.
    3. Randomly sample by cluster from the treatment group, then match with control group observations using the ssc mahapick command.
    Option 3 seems to be the most valid to me, but there seems to be a lack of literature on the subject. Specifically, I would be using different covariates (principally demographic data) from the ones used in assigning treatment and control (population and crime data).

    Has anyone used a method like this, and are there any papers I should be reading? Are there any other options I haven't considered here?

    Last edited by Daniel Jensen; 11 Dec 2017, 13:18.

  • #2
    One option you haven't mentioned, and the one that would seem most natural to me, is to randomly select from the treatment group and then, for the controls, rather than re-running a matching procedure (your #3), use exactly the same matches that you already have.
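
    As a rough illustration of one way to read that suggestion: since the existing matches are between clusters, the draw could take the same number of individuals at random from the treatment cluster and the control cluster of each existing pair. The sketch below is in Python rather than Stata, purely to show the logic. The field names (id, cluster, pair, treated), the equal cluster size of 25, and the seed are all assumptions made for the example, not details from the original data.

```python
import random
from collections import defaultdict


def draw_paired_subsample(rows, per_cluster, seed=12345):
    """Draw the same number of individuals from both clusters of each pair.

    rows: list of dicts with hypothetical keys 'id', 'cluster', 'pair',
    and 'treated' (1 = treatment cluster, 0 = control cluster).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group individuals by (pair, arm); the original cluster pairing is reused as-is.
    groups = defaultdict(list)
    for r in rows:
        groups[(r["pair"], r["treated"])].append(r)

    chosen = []
    for pair in sorted({r["pair"] for r in rows}):
        treat = groups[(pair, 1)]
        ctrl = groups[(pair, 0)]
        # Cap the draw at the smaller cluster so both arms stay equal within the pair.
        k = min(per_cluster, len(treat), len(ctrl))
        chosen += rng.sample(treat, k) + rng.sample(ctrl, k)
    return chosen


# Toy data (assumed layout): 30 pairs x 2 clusters x 25 individuals = 1,500 rows.
rows = []
next_id = 0
for pair in range(30):
    for treated in (1, 0):
        cluster = 2 * pair + (1 - treated)
        for _ in range(25):
            rows.append({"id": next_id, "cluster": cluster,
                         "pair": pair, "treated": treated})
            next_id += 1

# 3 per cluster across 60 clusters gives 180 individuals, inside the 150-200 target.
subsample = draw_paired_subsample(rows, per_cluster=3)
```

    Drawing an equal number from both clusters of every pair keeps the treatment and control arms the same size overall and within each pair. It does not by itself guarantee balance on individual-level covariates, so that would still need to be checked on the drawn subsample.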