Random date assignment in comparator study

Matthew Milano

Join Date: Apr 2021
Posts: 37

12 May 2022, 12:15

I'm using a large electronic health record database to run a cohort study comparing users of drug X with unexposed individuals (non-users of drug X). I have the index date (start of follow-up) for drug X users and want to assign follow-up start dates at random to the unexposed group by incidence density sampling from the distribution of index dates in the drug X cohort.

I have no idea how to do this.

Sample of drug X cohort:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte ID int dob byte sex str1 drug int(index_date index_year enddate)
 1   547 2 "X" 16755 2005 20583
 2 -1279 2 "X" 18207 2009 21184
 3 -1645 2 "X" 16810 2006 20941
 4 -2010 2 "X" 21031 2017 21184
 5 -2375 2 "X" 19537 2013 19990
 6 -2010 2 "X" 16149 2004 16580
 7 -7854 2 "X" 14628 2000 21184
 8 -2010 2 "X" 18219 2009 18238
 9 -8584 2 "X" 14810 2000 21184
10   912 2 "X" 19885 2014 20039
11   547 2 "X" 17288 2007 21184
12 -7854 2 "X" 15041 2001 19775
13   912 2 "X" 18114 2009 21184
14 -8219 2 "X" 17713 2008 18536
15 -3471 2 "X" 17143 2006 18373
end
format %td dob
format %td index_date
format %td enddate
label values sex sex
label def sex 2 "M", modify

Sample of unexposed cohort:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte ID int dob byte sex int enddate
 1  -4567 2 22585
 2  -4932 2 17055
 3  -1279 2 20947
 4  -7123 2 19508
 5 -11141 2 18478
 6  -8219 2 20396
 7 -11141 2 20947
 8 -10776 2 21678
 9  -8584 2 14768
10 -13698 2 16917
11  -1645 2 20852
12  -8950 2 19674
13  -5297 2 19504
14  -6758 2 20717
15  -8584 2 14976
16  -4567 2 20710
17 -13333 2 19046
18  -4201 2 21141
19   -184 2 18141
20  -6028 2 20264
21  -1279 2 18669
22  -2375 2 22586
23 -12602 2 21490
24   -184 2 15749
25  -6393 2 22586
26  -6758 2 19969
27 -12602 2 16148
28 -14063 2 15813
29  -2375 2 22316
30   -549 2 22586
31  -2375 2 15985
32  -7854 2 20718
33  -1645 2 21132
34   -914 2 19874
35  -1279 2 21338
36  -3471 2 16881
37   -184 2 20816
38 -14063 2 16020
39  -7123 2 20999
40  -7489 2 16544
41  -6028 2 21155
42  -1279 2 22586
43 -11872 2 19814
44  -6758 2 17898
45  -9680 2 20216
46   -184 2 16071
47  -2010 2 22585
48  -4567 2 20216
49  -1279 2 22585
50  -1645 2 20397
end
format %td dob
format %td enddate
label values sex sex2
label def sex2 2 "M", modify

Tags: big data, cohort study, incidence density sample, non-user comparator

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17712

13 May 2022, 03:12

Matthew:
I would follow this approach:
1) calculate the therapy duration, and its mean and standard error, in the treated dataset:

Code:

. g duration= enddate-index_date

. mean duration

Mean estimation                             Number of obs = 15

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
    duration |     2588.6   590.8859      1321.276    3855.924
--------------------------------------------------------------

.

2) use the above to draw random values from a gamma distribution for therapy duration in unexposed individuals (assuming that therapy duration follows this kind of distribution):

Code:

. g distribution=(590.8859^2/2588.6)*invgammap((2588.6^2/590.8859^2),runiform())
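The same draw can probably be written more readably with Stata's built-in rgamma(a, b), which takes the shape a and the scale b directly. The following is only a sketch under the same assumptions as above: the two numbers are copied from the -mean- output, the seed and the variable name sim_duration are made up for illustration, and the unexposed dataset is assumed to be in memory. If the aim is to mimic the spread of individual durations rather than the uncertainty of the mean, the standard deviation from -summarize duration- could be substituted for the standard error.

```stata
* Sketch only: values copied from the -mean- output in the treated data.
* Locals are used instead of scalars so that short names cannot be
* mistaken for abbreviated variable names (e.g. s for sex).
set seed 12345                       // arbitrary seed, for reproducibility
local m = 2588.6                     // mean duration (days)
local s = 590.8859                   // spread used in the post (std. err.)
local shape = (`m'/`s')^2            // gamma shape = mean^2 / spread^2
local scale = (`s'^2)/`m'            // gamma scale = spread^2 / mean
generate double sim_duration = rgamma(`shape', `scale')
summarize sim_duration               // mean should be close to `m'
```

Because shape times scale equals the mean, the simulated values should centre on roughly 2,589 days.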

Kind regards,
Carlo
(Stata 19.0)

Joseph Coveney

Join Date: Apr 2014

Posts: 4420
#3

13 May 2022, 05:48

Originally posted by Matthew Milano

I . . . want to assign start dates for follow up at random to the unexposed group by incidence density sampling from the distribution of index dates in the drug X cohort.

If I wanted to mimic an empirical distribution, then I would probably use bootstrap-like resampling (that is, with replacement).

You can use bsample for this. There's a little fiddling needed to draw a sample from your exposed group to match the size of your unexposed group: see below for an illustration of one way to accomplish this (begin at the "Begin here" comment).

.
. version 17.0

.
. clear *

.
. // seedem
. set seed 661633865

.
. quietly input byte ID int dob byte sex str1 drug ///
>         int(index_date index_year enddate)

.
. format dob *date %tdCY-N-D

.
. label define Sexes 1 F 2 M

. label values sex Sexes

.
. tempfile exposed

. quietly save `exposed'

.
. drop _all

. quietly input byte ID int dob byte sex int enddate

.
. format dob *date %tdCY-N-D

.
. label values sex Sexes

.
. tempfile unexposed

. quietly save `unexposed'

.
. *
. * Begin here
. *
. // Get sample size of unexposed patient population
. quietly count

. local target_N = r(N)

.
. // Create a large dataset of randomly sampled exposed patients' follow-up elapsed days
. use `exposed'

. generate int delta = enddate - index_date

. keep delta

.
. preserve

.
. bsample // , strata(sex)

. tempfile sampled_elapsed_days

. quietly save `sampled_elapsed_days'

.
. while _N < `target_N' {
  2.         restore
  3.         preserve
  4.
.         bsample //, strata(sex)
  5.         append using `sampled_elapsed_days'
  6.         quietly save `sampled_elapsed_days', replace
  7. }

. restore , not

. quietly keep in 1/`target_N'

. generate long row = _n

. quietly save `sampled_elapsed_days', replace

.
. // Apply the sampled follow-up elapsed days to unexposed patients
. use `unexposed', clear

. generate long row = _n

. merge 1:1 row using `sampled_elapsed_days', assert(match) nogenerate noreport

. drop row

.
. generate int start_date = enddate - delta

. format *date %tdCY-N-D

.
. list in 1/5, noobs

  +---------------------------------------------------------+
  | ID          dob   sex      enddate   delta   start_date |
  |---------------------------------------------------------|
  |  1   1947-07-01     M   2021-11-01     153   2021-06-01 |
  |  2   1946-07-01     M   2006-09-11    3828   1996-03-19 |
  |  3   1956-07-01     M   2017-05-08    6556   1999-05-27 |
  |  4   1940-07-01     M   2013-05-30    2977   2005-04-05 |
  |  5   1929-07-01     M   2010-08-04     453   2009-05-08 |
  +---------------------------------------------------------+

.
. exit

end of do-file

.

Code:

help bsample
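If the goal is to resample the exposed index dates themselves (rather than the elapsed follow-up days), a shorter route may be to give each unexposed patient a random row number from the exposed file and merge on it. This is only a sketch of simple resampling with replacement, not a full incidence density sampling scheme. It assumes it runs in the same do-file as the log above, so that the tempfiles `exposed' and `unexposed' still exist; the names pick, index_pool and bad_start and the seed are invented for illustration.

```stata
* Sketch only: draw one exposed index date, with replacement, per unexposed patient.
version 17.0
set seed 20220513                    // arbitrary seed, for reproducibility

* Build a lookup of exposed index dates keyed by row number
use `exposed', clear
keep index_date
generate long pick = _n
local n_exposed = _N
tempfile index_pool
quietly save `index_pool'

* Give each unexposed patient a random row of the lookup
use `unexposed', clear
generate long pick = runiformint(1, `n_exposed')
merge m:1 pick using `index_pool', keep(match) nogenerate noreport
drop pick
rename index_date start_date
format start_date %tdCY-N-D

* Flag assigned start dates that fall after the patient's own end of follow-up
generate byte bad_start = start_date > enddate
tabulate bad_start
```

Patients flagged by bad_start would need a rule of their own, for example redrawing or exclusion; which rule is appropriate depends on the study design.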
Dana MS

Join Date: Dec 2022

Posts: 2
#4

02 Dec 2022, 06:39

Carlo Lazzaro, many thanks for posting the code.

I tried your code and now have the distribution for both groups, but how do I assign the start of follow-up for the unexposed group based on that distribution? In other words, what is the next step?
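One possible next step, borrowing the convention used in #3 of counting back from each unexposed patient's end date, might look like the sketch below. It assumes the unexposed dataset is in memory and that the variable distribution already holds the simulated durations in days; start_date is a name chosen here for illustration. Whether anchoring on enddate suits the study design is a judgement call.

```stata
* Sketch only: turn simulated durations into follow-up start dates
* by counting back from each unexposed patient's end date.
generate long start_date = enddate - round(distribution)
format start_date %td

* Check the range of the assigned dates
summarize start_date, format
```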