  • matching 5:1

    Hi all
    I realise this is a rather broad question, therefore would be grateful just for some pointers towards the correct command(s) if possible.
    I have a dataset of cases and non-cases. For every one case, I want to select 5 controls based on matched age, sex and calendar year.
    In other words, I want the percentage distribution of these three variables to be the same for both cases and controls, with a 5:1 ratio of controls to cases.
    Which command(s) would you recommend for this?
    Any advice greatly appreciated with thanks.

  • #2
    So, in general terms, begin by breaking up your data set into two: one dataset of cases only and another of non-cases. Then

    Code:
    use cases
    joinby age sex year using non_cases
    will produce a massive data set where each case is paired with every possible matched non_case.
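
    For the initial split, which has to be run before the -joinby- above, here is a minimal sketch. It assumes a single file called full_data with a 0/1 indicator named case and a unique person identifier named id; all of these names, and control_id, are hypothetical, so substitute your own. The identifier is renamed differently in each file because -joinby- by default keeps the master's values for any non-key variable that has the same name in both files.

    Code:
    * all names here (full_data, case, id, control_id) are assumed for illustration
    use full_data, clear
    keep if case == 1
    rename id case_id
    save cases, replace

    use full_data, clear
    keep if case == 0
    rename id control_id
    save non_cases, replace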

    Now shuffle the observations into random order by generating a new double-precision variable of random numbers (or two such variables if your data set is very large) and sorting on it. Then -by case_id (random_variable), sort: keep if _n <= 5- leaves you with a five-to-one match. (Of course, any cases for which fewer than five potential matches exist will have only as many matches as are available.)
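
    Spelled out, the shuffle-and-keep step might look like the sketch below, run with the joined data still in memory. The seed value and the names shuffle1, shuffle2, n_controls and one_per_case are arbitrary, and case_id is assumed to identify each case uniquely. The last three lines are an optional check on how many matches each case ended up with.

    Code:
    * seed is arbitrary; setting it makes the random selection reproducible
    set seed 12345
    generate double shuffle1 = runiform()
    generate double shuffle2 = runiform()
    * sort within case by the random numbers and keep the first five pairings
    by case_id (shuffle1 shuffle2), sort: keep if _n <= 5
    * optional check: distribution of the number of matches per case
    by case_id: generate n_controls = _N
    egen byte one_per_case = tag(case_id)
    tabulate n_controls if one_per_case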

    Now, this approach will allow the same non-case to end up matched with more than one case. There is nothing wrong with this from a sampling theory perspective, but some people are uncomfortable with it. It is possible to do the matching in a way that allocates each non-case to at most one case, but the code is somewhat more complicated.
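
    For matching without replacement, one possible approach (a sketch only, and not necessarily the method alluded to above) relies on the fact that with exact matching every non-case in an age/sex/year stratum is interchangeable. Put the cases in each stratum in random order and number them, put the non-cases in random order and deal them out in blocks of five, then join on the stratum plus the block number. The names u, slot and controls_slotted are hypothetical, and the non-case identifier is assumed to have a different name from case_id.

    Code:
    * non-cases: random order within stratum, then blocks of five (slot 1, 2, ...)
    use non_cases, clear
    set seed 2468
    generate double u = runiform()
    by age sex year (u), sort: generate long slot = ceil(_n/5)
    drop u
    save controls_slotted, replace

    * cases: random order within stratum, one slot number per case
    use cases, clear
    generate double u = runiform()
    by age sex year (u), sort: generate long slot = _n
    drop u
    * each non-case carries exactly one slot, so it can pair with at most one case
    joinby age sex year slot using controls_slotted

    Cases whose stratum runs out of non-cases get fewer than five matches, or drop out altogether, since -joinby- keeps only matched observations unless you specify unmatched().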
