  • Matching cases and controls 1:4

    Hello,

    Can someone please help me match cases and controls in Stata? I have 4 controls for every 1 case and would like to match them by age. Please let me know how I can do this. Thank you in advance.

  • #2
    I doubt anybody can provide you with a useful answer to this very general question unless you provide example data to work with. Be sure to use the -dataex- command to do that. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
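
    As a rough illustration only: assuming your variables are called id, case, and age (hypothetical names, since the post does not show any data), something like the following would produce a postable extract of the first 20 observations.

    Code:
    // Hypothetical variable names; substitute your own.
    dataex id case age in 1/20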


    • #3
      I'm completely sympathetic to Clyde's points, but by chance I was interested enough today to mock up some data that may be similar to what you have.

      For your problem, I'd recommend using one of the many community-contributed programs for matching available at SSC. Such programs are typically designed for matching in the context of evaluating a treatment effect, which makes the terminology of their documentation confusing for case-control studies, but does not make them any less useful. Relevant programs include -kmatch-, -ultimatch-, and -calipmatch-. In using one of these programs, you will find that you need to make many decisions about how you want the matching to be done, e.g., nearest neighbor or not, with or without replacement, etc.

      From your description, I *guessed* that you have 4 times as many potentially available controls as cases. In general, though, it will not be possible to optimally assign that many controls to each case, so I've illustrated code for selecting just 3 controls for each case, presuming that 4 potential controls per case are available. Finally, I'd note that matching cases and controls is a common topic on this list, so if you search statalist.org for such postings, you may find approaches you like better than the following.

      Code:
      // Create some example data to work with.
      clear
      set seed 9876
      local ncase = 3
      local ctl_avail= 4 * `ncase'
      local ctl_want = 3  // picking 3 controls per case
      set obs `=`ncase' + `ctl_avail''
      gen int id = _n
      gen byte case = _n <= `ncase'
      gen byte age = 80 * runiform()
      forval i = 1/3 {
         gen int othervar`i' = ceil(runiform() * 100)
      }
      // end of creating example data
      //
      //  Keep original data to merge back later.
      tempfile origdata
      save `origdata' 
      // Work with just the relevant vars
      keep id case age
      //
      //  I chose to use -kmatch- for the matching, available at SSC.
      //
      // Match each case to `ctl_want' controls on a nearest neighbor basis, without
      // replacement. Create variables controlid1, controlid2, etc., which, for each
      // case, will hold the id numbers of the controls matched to that case.
      kmatch md case age, nn(`ctl_want') wor idgenerate(controlid) idvar(id)
      // Using kmatch is easy. What's harder, in my view, is to make from this
      // a file with each case or control as an observation, with matched
      // sets identified. *Perhaps* someone will suggest a nicer way to do this.
      list //  examine in wide format to see what we have
      // Drop irrelevant stuff
      drop age
      drop if case == 0
      // -reshape- will create a set of `ctl_want' control observations for each case.
      reshape long controlid, i(id ) j(cnum)
      replace case = 0
      // We need one case observation to go with each set of controls.
      expand 2 if cnum == `ctl_want'
      bysort id (cnum):  replace case = 1 if _n == (`ctl_want' + 1)
      // Correct the id variables
      clonevar idset = id
      replace id = controlid if case == 0
      drop controlid
      label var id "id of observation"
      label var idset "id for matched set"
      // Reconnect with original data
      merge 1:1 id using `origdata',  keepusing(age other*) keep(match)
      // Inspect
      sort idset case
      order idset case id age
      list
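
      If you want to check the result, the following sketch may help. It assumes it is appended to the same do-file as the code above, so that the local macro ctl_want and the variables idset, case, and age still exist; n_ctl and agediff are variable names invented here for illustration.

      Code:
      // Each matched set should contain `ctl_want' controls in this mock-up.
      bysort idset: egen byte n_ctl = total(case == 0)
      assert n_ctl == `ctl_want'
      // Within each set, sort the case last and compare each control's age to it.
      bysort idset (case): gen byte agediff = abs(age - age[_N])
      summarize agediff if case == 0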
