Risk set sampling

Claire Rushton

Join Date: Apr 2015

Posts: 1
#1

10 Apr 2015, 07:27

Dear all,

I am about to set up a matched case control data set using 1:4 matching with first hospital admission as the outcome variable and using risk set sampling.
The code I've used previously for a different outcome (mortality) is:

stset Study_OUT_date, failure(Failure_status) origin(time HF_index_date) scale(365.25)
set seed 1768927689
sttocc, match(Study_in_date) number(4)

However I want to make sure that the controls selected for each case are not themselves a case (admitted to hospital) within three months of their match date as a control.

I thought of creating an indicator variable that takes a different value for each 3-month period of follow-up in which cases occur (cases and controls are also matched on calendar time). My questions are: i) can I instruct Stata NOT to match on this variable (so that cases occurring within the same 3 months do not appear as cases and controls in the same set), and if so, what is the command? ii) Is there another solution?

Many thanks
Claire
Mike Lacy

Join Date: Apr 2014

Posts: 2424
#2

10 Apr 2015, 21:14

You might find useful a post I made with some code to do incidence density sampling (the same thing, right?) without replacement. (In that case, -sttocc-, with which I am not familiar, would not work.) My general thought here would be to follow the strategy I used there: merge all the controls that could apply to a given case, then delete the one(s) you don't want.

That discussion and posting is a bit obscure even to me now, but it might be a starting place.
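The merge-then-delete strategy could be sketched roughly as follows. This is not the code from the earlier post; it is a minimal illustration on the diet data from the -sttocc- help file, and the names case_id, case_t, ctl_*, window and u are made up for the example. It samples up to four controls per case but does not enforce sampling without replacement across sets.

```stata
webuse diet, clear
stset dox, failure(fail) enter(time doe) id(id) origin(time dob) scale(365.25)
keep id job _t0 _t _d

* One record per case, with its failure time (in years, because of scale())
preserve
keep if _d == 1
rename id case_id
rename _t case_t
keep case_id case_t job
tempfile cases
save `cases'
restore

* Pair every subject with every case in the same matching stratum (job)
rename (id _t0 _t _d) (ctl_id ctl_t0 ctl_t ctl_d)
joinby job using `cases'

* Keep subjects who are at risk at the case's failure time, excluding the case itself
keep if ctl_id != case_id & ctl_t0 < case_t & ctl_t >= case_t

* Delete the unwanted controls: here, those who fail within `window' years of the case
local window = 0.25
drop if ctl_d == 1 & (ctl_t - case_t) <= `window'

* Randomly keep up to 4 controls per case
set seed 12345
gen double u = runiform()
bysort case_id (u): keep if _n <= 4
```

The result has one row per case-control pair; the case records would still need to be appended before fitting a conditional logistic model.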

Regards, Mike
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#3

10 Apr 2015, 21:35

Welcome to the new Statalist, Claire! See the FAQ section 12 for how to format Stata code and results with the new Forum editor.

To solve your problem, run sttocc, then exclude controls who failed too close to the case. In the following example from the help for sttocc, there are 80 cases initially. I chose a distance of three days as the matching limit. As you can see, eight cases are dropped because they have no controls left. As I've written this, I exclude all matched subjects, not just future failures, whose observation times are within 3 days of the case's failure time. This puts all controls on the same basis.

The advantage of the CC approach is that it can reproduce the hazard ratios of a full cohort analysis, but with less data and less confounding (if you match on additional variables). If I could do a full-cohort Cox analysis, or, to avoid the confounding problem, a randomized intervention, I'd never drop people from risk sets. What justifies it in your situation?

Code:

webuse diet, clear
stset dox, failure(fail) enter(time doe) id(id) origin(time dob) scale(365.25)
tempfile t1
save `t1'
set seed 87842418
sttocc, match(job) n(4) nodots
merge m:1 id using `t1'
keep if _merge==3
gen ccdif = _t - _time
/* Following keeps only matched subjects observed >3 days after the case failure
   and excludes those matched subjects observed <=3 days */
keep if _case==1 | ccdif>3
/* This version excludes only matched future failures observed <= 3 days */
// keep if _case==1 | (fail!=0 & ccdif > 3) | fail==0
bys _set: gen nset = _N
drop if nset==1 // i.e. if no controls
codebook _set
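For the three-month window in the original question, the same idea can be written with the cutoff in analysis-time units. Note that stset with scale(365.25) records _t, and sttocc records _time, in years, so three months is 3/12, and a cutoff of 3 on ccdif corresponds to three years rather than three days. A minimal sketch on the same diet data, dropping only controls who themselves fail within the window (the second variant above); the tempfile name cohort and the local window are my own choices:

```stata
webuse diet, clear
stset dox, failure(fail) enter(time doe) id(id) origin(time dob) scale(365.25)
tempfile cohort
save `cohort'

set seed 87842418
sttocc, match(job) n(4) nodots

* Bring back each subject's own exit time (_t) and failure status (fail)
merge m:1 id using `cohort', keep(match) nogenerate

* Exclusion window in years, because stset used scale(365.25)
local window = 3/12

* Years between the case's failure time and the matched subject's own exit
gen double ccdif = _t - _time

* Drop controls who become cases within the window after their match date
drop if _case == 0 & fail != 0 & ccdif <= `window'

* Drop sets that are left with a case and no controls
bysort _set: gen nset = _N
drop if nset == 1
tab _case
```

With Claire's data, the stset and sttocc lines would be the ones from post #1, and the merge would need a patient identifier, which is not shown in the thread.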

Last edited by Steve Samuels; 10 Apr 2015, 22:31.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2