Matching without replacement from a file of pairs for case-control and other applications

Mike Lacy


30 Jun 2022, 16:10

Short version: I’m seeking a way to do 1:m matching of cases and controls without replacement, *given a file in long (“edge”) format of all possible matching pairs.* Code to create sample data appears at the end of this post.

Longer version:
In corresponding off-list with Rima Saliba about this problem, I discovered that the code I had posted here several years ago in response to a question about it is wrong.

While there have been a number of threads over the years about case-control matching (see e.g. here), I have now encountered and refined the problem in a way that seems different enough from previous work to be worth posting in a more generalized form. What follows is my refined version of the problem, for which I'm interested in solutions.

Via the use of -joinby- or perhaps -rangejoin- or -cross-, one can have a file that pairs up cases or treatment subjects with matching potential controls. In such situations, analysts may want to have 1:m matching *without replacement.* Complications include that:
1) The controls that match one case/treatment subject may match many others.
2) The number of controls available varies from case to case, and may be more or fewer than m.
3) If possible, one wants to avoid too “greedy” an algorithm, which in the extreme can result in one case being assigned all m controls while some other similar case gets 0.
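
To make these complications concrete, here is a small illustrative sketch on a made-up toy edge file (the six pairs, the variable names ncases, ncontrols, and onepercase, and the choice m = 2 are all assumptions for illustration, not part of the real data). Control 101 is eligible for all three cases, case 3 has fewer than m controls available, and a greedy pass that gives case 1 both 101 and 102 would leave case 2 with nothing.

```stata
* Toy edge file: one row per (case, eligible control) pair -- made-up values
clear
input int caseid int controlid
1 101
1 102
1 103
2 101
2 102
3 101
end

local m = 2                                  // hypothetical number of controls wanted per case

* Complication 1: how many cases compete for each control
bysort controlid: gen int ncases = _N

* Complication 2: how many controls each case has available
bysort caseid: gen int ncontrols = _N

* One flagged row per case, to list and count cases that cannot reach m
egen byte onepercase = tag(caseid)
list caseid ncontrols if onepercase, noobs
count if onepercase & ncontrols < `m'
```

The same two counts, computed on a real edge file, give a quick sense of how much competition for controls there is before any matching is attempted.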

I have the idea that some solution involving -merge- should be possible, per some earlier threads, but I have not successfully figured out how to do that. I also have the thought that one of the many built-in or community-contributed matching commands might be used, but I have not worked that out either. I *have* discovered that some of the “greediness” problem can be avoided by having an algorithm that picks only *1* control without replacement for every case and then applying it iteratively, so a solution that picks only one control per case would solve the problem.

In that context, here is a code snippet to create what I’d consider a representative kind of data set with which to work:

Code:

// Create an “edge” file of matched pairs.
set seed 82743
local ncases = 100
local maxmatch = 100
local maxcontrolid = 2000
clear
set obs `ncases'
gen int caseid = _n
gen navail = ceil(runiform() * `maxmatch')
label var navail "# controls matched to this case"
expand navail
gen int controlid = ceil(runiform() * `maxcontrolid')
summ navail
order caseid controlid
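
For concreteness, here is a rough sketch of the "one control per case, applied iteratively" idea described earlier, run on the same kind of sample data. It is only an illustration under stated assumptions, not a vetted solution: the random priority u, the helper variables (used, served, round, k1, k2, prop, win, cwin, swin), and the choice m = 3 are all hypothetical, and the pass is a randomized greedy one, so it is not guaranteed to find the largest possible set of matches. Because the random draws of controlid can repeat a pair, the sketch also drops duplicate pairs first.

```stata
* Sample edge file, same recipe as the snippet above
clear
set seed 82743
set obs 100
gen int caseid = _n
gen navail = ceil(runiform() * 100)
expand navail
gen int controlid = ceil(runiform() * 2000)
duplicates drop caseid controlid, force   // assumption: a repeated pair carries no extra information

local m = 3                  // hypothetical target: up to 3 controls per case
gen double u = runiform()    // random priority for each candidate pair
gen byte used = 0            // 1 once this control has been assigned to some case
gen byte round = .           // round in which this pair was selected

forvalues r = 1/`m' {
    gen byte served = 0      // 1 once this case has its control for round r
    while 1 {
        * stop the round when no unserved case has a free control left
        count if !used & !served
        if r(N) == 0 {
            continue, break
        }
        * each unserved case proposes its lowest-u pair with a free control
        gen double k1 = cond(!used & !served, u, .)
        bysort caseid (k1): gen byte prop = !missing(k1) & _n == 1
        * a control proposed by several cases goes to the lowest-u proposal
        gen double k2 = cond(prop, u, .)
        bysort controlid (k2): gen byte win = prop & _n == 1
        replace round = `r' if win
        * retire the winning controls and mark the winning cases as served
        egen byte cwin = max(win), by(controlid)
        egen byte swin = max(win), by(caseid)
        replace used = 1 if cwin
        replace served = 1 if swin
        drop k1 k2 prop win cwin swin
    }
    drop served
}

keep if !missing(round)
isid controlid                      // each control is used at most once
bysort caseid: assert _N <= `m'     // no case gets more than m controls
tab round
```

Because every case is offered one control in round 1 before any case receives a second, no case can collect all m controls while a similar case is still waiting for its first, which is the "greediness" concern raised in point 3.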

I realize that “without replacement” is not necessarily analytically preferable, but that’s another issue.