  • Risk set sampling

    Dear all,

    I am about to set up a matched case-control data set using 1:4 matching and risk set sampling, with first hospital admission as the outcome variable.
    The code I've used previously for a different outcome (mortality) is:

    stset Study_OUT_date, failure(Failure_status) origin(time HF_index_date) scale(365.25)
    set seed 1768927689
    sttocc, match(Study_in_date) number(4)

    However I want to make sure that the controls selected for each case are not themselves a case (admitted to hospital) within three months of their match date as a control.

    I thought of creating an indicator variable that takes a different value for each 3-month period of follow-up in which cases occur (cases and controls are also matched on calendar time). My questions are: i) can I instruct Stata NOT to match on this variable (so that cases occurring within the same 3 months do not appear as case and control in the same set), and if so, what is the command? ii) is there another solution?

    Many thanks
    Claire

  • #2
    You might find useful a post I made with some code to do incidence density sampling (the same thing, I believe) without replacement. (In that case, -sttocc-, with which I am not familiar, would not work.) My general thought here would be to use the strategy from that post: merge in all the controls that could apply to a given case, then delete the one(s) you don't want.

    That discussion and posting is a bit obscure even to me now, but it might be a starting place.

    Regards, Mike
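
    The "merge all eligible controls, then delete the unwanted ones" strategy described above might look roughly like the following. This is only a sketch under stated assumptions, not the code from the post Mike refers to: it uses the diet data from the sttocc help rather than Claire's data, matches on job only, samples controls independently for each case (so it does not enforce sampling without replacement), and the names cohort, cases, ctrl_*, case_*, u, and window are made up for illustration.

    ```stata
    webuse diet, clear
    stset dox, failure(fail) enter(time doe) id(id) origin(time dob) scale(365.25)
    keep id job _t0 _t _d
    tempfile cohort
    save `cohort'

    * One record per case, holding its failure time (hypothetical names)
    keep if _d == 1
    rename (id _t) (case_id case_t)
    keep case_id case_t job
    tempfile cases
    save `cases'

    * Pair every subject with every case that shares the matching variable
    use `cohort', clear
    rename (id _t0 _t _d) (ctrl_id ctrl_t0 ctrl_t ctrl_d)
    joinby job using `cases'

    * Keep only subjects still at risk at the case's failure time
    keep if ctrl_t0 < case_t & ctrl_t >= case_t & ctrl_id != case_id

    * Delete candidates who themselves fail within the window after the
    * case's failure time (analysis time is in years, so 0.25 = 3 months)
    local window = 0.25
    drop if ctrl_d == 1 & ctrl_t - case_t <= `window'

    * Randomly keep up to 4 controls per case
    set seed 12345
    gen double u = runiform()
    bysort case_id (u): keep if _n <= 4
    ```

    Cases with fewer than four eligible candidates simply keep whatever remains, so it is worth tabulating the number of controls per case_id before analysis.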



    • #3
      Welcome to the new Statalist, Claire! See the FAQ section 12 for how to format Stata code and results with the new Forum editor.

      To solve your problem, run sttocc, then exclude controls who failed too close to the case's failure time. The following example is adapted from the help for sttocc; there are 80 cases initially. I chose a distance of three days as the matching limit. As you can see, eight cases are dropped because they are left with no controls. As written, the code excludes all matched subjects, not just future failures, whose observation time ends within 3 days after the case's failure time. This puts all controls on the same basis.

      The advantage of the CC approach is that it can reproduce the hazard ratios of a full cohort analysis, but with less data and less confounding (if you match on additional variables). If I could do a full-cohort Cox analysis, or, to avoid the confounding problem, a randomized intervention, I'd never drop people from risk sets. What justifies it in your situation?


      Code:
      webuse diet, clear
      stset dox, failure(fail) enter(time doe) id(id) origin(time dob) scale(365.25)
      tempfile t1
      save `t1'
      
      set seed 87842418
      sttocc,  match(job) n(4) nodots
      merge m:1 id using `t1'
      keep if _merge==3
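      gen  ccdif = _t - _time   // matched subject's exit time minus the case's failure time
      
      /* Note: scale(365.25) puts _t and _time in years, so ccdif is in years and
         ccdif>3 is a 3-year window; use ccdif > 3/365.25 for a 3-day window.
         Following keeps only matched subjects observed >3 units after the case failure
         and excludes those matched subjects observed <=3 units after it */
      keep if _case==1  | ccdif>3
      /* This version excludes only matched future failures observed <= 3 units after */
      // keep if _case==1  | (fail!=0 & ccdif > 3) | fail==0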
      
      bys _set: gen nset = _N
      drop if nset==1 // i.e. if  no controls
      codebook _set
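
      The remark above that the case-control approach can reproduce the hazard ratios of a full-cohort analysis can be checked on the same example data. The sketch below is only an illustration: it assumes the diet data's hienergy indicator as the exposure of interest (a choice made here, not one from the thread) and compares a Cox model stratified on the matching variable with a conditional logistic model on the sampled sets, before any controls are excluded.

      ```stata
      webuse diet, clear
      stset dox, failure(fail) enter(time doe) id(id) origin(time dob) scale(365.25)

      * Full-cohort Cox model, stratified on the matching variable
      stcox hienergy, strata(job)

      * Nested case-control sample with 4 controls per case
      set seed 87842418
      sttocc, match(job) n(4) nodots

      * Conditional logistic regression on the matched sets; the odds ratio
      * estimates the hazard ratio from the cohort model
      clogit _case hienergy, group(_set) or
      ```

      The two estimates should be similar but not identical, since the case-control sample uses only a random subset of each risk set. Rerunning the clogit step after the exclusion step shows how much dropping controls shifts the estimate.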
      Last edited by Steve Samuels; 10 Apr 2015, 22:31.
      Steve Samuels
      Statistical Consulting

      Stata 14.2
