  • Random date assignment in comparator study

    I'm using a large electronic health record database to run a cohort study comparing users of drug X with unexposed individuals (non-users of drug X). I have the index date (start of follow-up) for each drug X user, and I want to assign follow-up start dates at random to the unexposed group by incidence density sampling from the distribution of index dates in the drug X cohort.

    I have no idea how to do this.

    Sample of drug X cohort:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte ID int dob byte sex str1 drug int(index_date index_year enddate)
     1   547 2 "X" 16755 2005 20583
     2 -1279 2 "X" 18207 2009 21184
     3 -1645 2 "X" 16810 2006 20941
     4 -2010 2 "X" 21031 2017 21184
     5 -2375 2 "X" 19537 2013 19990
     6 -2010 2 "X" 16149 2004 16580
     7 -7854 2 "X" 14628 2000 21184
     8 -2010 2 "X" 18219 2009 18238
     9 -8584 2 "X" 14810 2000 21184
    10   912 2 "X" 19885 2014 20039
    11   547 2 "X" 17288 2007 21184
    12 -7854 2 "X" 15041 2001 19775
    13   912 2 "X" 18114 2009 21184
    14 -8219 2 "X" 17713 2008 18536
    15 -3471 2 "X" 17143 2006 18373
    end
    format %td dob
    format %td index_date
    format %td enddate
    label values sex sex
    label def sex 2 "M", modify
    Sample of unexposed cohort:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte ID int dob byte sex int enddate
     1  -4567 2 22585
     2  -4932 2 17055
     3  -1279 2 20947
     4  -7123 2 19508
     5 -11141 2 18478
     6  -8219 2 20396
     7 -11141 2 20947
     8 -10776 2 21678
     9  -8584 2 14768
    10 -13698 2 16917
    11  -1645 2 20852
    12  -8950 2 19674
    13  -5297 2 19504
    14  -6758 2 20717
    15  -8584 2 14976
    16  -4567 2 20710
    17 -13333 2 19046
    18  -4201 2 21141
    19   -184 2 18141
    20  -6028 2 20264
    21  -1279 2 18669
    22  -2375 2 22586
    23 -12602 2 21490
    24   -184 2 15749
    25  -6393 2 22586
    26  -6758 2 19969
    27 -12602 2 16148
    28 -14063 2 15813
    29  -2375 2 22316
    30   -549 2 22586
    31  -2375 2 15985
    32  -7854 2 20718
    33  -1645 2 21132
    34   -914 2 19874
    35  -1279 2 21338
    36  -3471 2 16881
    37   -184 2 20816
    38 -14063 2 16020
    39  -7123 2 20999
    40  -7489 2 16544
    41  -6028 2 21155
    42  -1279 2 22586
    43 -11872 2 19814
    44  -6758 2 17898
    45  -9680 2 20216
    46   -184 2 16071
    47  -2010 2 22585
    48  -4567 2 20216
    49  -1279 2 22585
    50  -1645 2 20397
    end
    format %td dob
    format %td enddate
    label values sex sex2
    label def sex2 2 "M", modify

  • #2
    Matthew:
    I would take the following approach:
    1) Calculate the therapy duration, and its mean and standard error, in the treated dataset:
    Code:
    . g duration= enddate-index_date
    
    . mean duration
    
    Mean estimation                             Number of obs = 15
    
    --------------------------------------------------------------
                 |       Mean   Std. err.     [95% conf. interval]
    -------------+------------------------------------------------
        duration |     2588.6   590.8859      1321.276    3855.924
    --------------------------------------------------------------
    
    .
    2) Use the results above to draw therapy durations for unexposed individuals from a gamma distribution (assuming that therapy duration follows this kind of distribution):
    Code:
    . g distribution=(590.8859^2/2588.6)*invgammap((2588.6^2/590.8859^2),runiform())
    Kind regards,
    Carlo
    (Stata 19.0)
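
    A gamma distribution matched by moments has shape mean^2/variance and scale variance/mean, where the variance is that of the durations themselves (the squared standard deviation, not the squared standard error of the mean that -mean- reports). The following is only a sketch under that assumption, using -summarize- and rgamma(). The five rows are the first five of the drug X sample above, the seed is arbitrary, and sim_duration is a made-up variable name; in practice the last line would be run in the unexposed dataset.

    ```stata
    * Sketch only: method-of-moments gamma draws based on the SD of duration
    clear
    input int(index_date enddate)
    16755 20583
    18207 21184
    16810 20941
    21031 21184
    19537 19990
    end
    generate duration = enddate - index_date

    * -summarize- leaves the mean in r(mean) and the variance in r(Var)
    quietly summarize duration
    local m = r(mean)
    local v = r(Var)
    local shape = `m'^2 / `v'
    local scale = `v' / `m'

    * rgamma(a, b) draws gamma variates with shape a and scale b
    set seed 12345
    generate sim_duration = rgamma(`shape', `scale')
    ```

    With the standard error in place of the standard deviation, the simulated durations would be far less spread out than the observed ones.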



    • #3
      Originally posted by Matthew Milano
      I . . . want to assign start dates for follow up at random to the unexposed group by incidence density sampling from the distribution of index dates in the drug X cohort.
      If I wanted to mimic an empirical distribution, then I would probably use bootstrap-like resampling (that is, with replacement).

      You can use bsample for this. There's a little fiddling needed to draw a sample from your exposed group to match the size of your unexposed group: see below for an illustration of one way to accomplish this (begin at the "Begin here" comment).

      .
      . version 17.0

      .
      . clear *

      .
      . // seedem
      . set seed 661633865

      .
      . quietly input byte ID int dob byte sex str1 drug ///
      >         int(index_date index_year enddate)

      .
      . format dob *date %tdCY-N-D

      .
      . label define Sexes 1 F 2 M

      . label values sex Sexes

      .
      . tempfile exposed

      . quietly save `exposed'

      .
      . drop _all

      . quietly input byte ID int dob byte sex int enddate

      .
      . format dob *date %tdCY-N-D

      .
      . label values sex Sexes

      .
      . tempfile unexposed

      . quietly save `unexposed'

      .
      . *
      . * Begin here
      . *
      . // Get sample size of unexposed patient population
      . quietly count

      . local target_N = r(N)

      .
      . // Create a large dataset of randomly sampled exposed patients' follow-up elapsed days
      . use `exposed'

      . generate int delta = enddate - index_date

      . keep delta

      .
      . preserve

      .
      . bsample // , strata(sex)

      . tempfile sampled_elapsed_days

      . quietly save `sampled_elapsed_days'

      .
      . while _N < `target_N' {
        2.         restore
        3.         preserve
        4.
      .         bsample //, strata(sex)
        5.         append using `sampled_elapsed_days'
        6.         quietly save `sampled_elapsed_days', replace
        7. }

      . restore , not

      . quietly keep in 1/`target_N'

      . generate long row = _n

      . quietly save `sampled_elapsed_days', replace

      .
      . // Apply the sampled follow-up elapsed days to unexposed patients
      . use `unexposed', clear

      . generate long row = _n

      . merge 1:1 row using `sampled_elapsed_days', assert(match) nogenerate noreport

      . drop row

      .
      . generate int start_date = enddate - delta

      . format *date %tdCY-N-D

      .
      . list in 1/5, noobs

        +---------------------------------------------------------+
        | ID          dob   sex      enddate   delta   start_date |
        |---------------------------------------------------------|
        |  1   1947-07-01     M   2021-11-01     153   2021-06-01 |
        |  2   1946-07-01     M   2006-09-11    3828   1996-03-19 |
        |  3   1956-07-01     M   2017-05-08    6556   1999-05-27 |
        |  4   1940-07-01     M   2013-05-30    2977   2005-04-05 |
        |  5   1929-07-01     M   2010-08-04     453   2009-05-08 |
        +---------------------------------------------------------+

      .
      . exit

      end of do-file


      .


      Code:
      help bsample



      • #4
        Carlo Lazzaro, many thanks for posting the code.

        I tried your code and now have the distribution for both groups, but how do I assign the start of follow-up for the unexposed group based on that distribution? In other words, what is the next step?
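
        One possible next step, offered only as a sketch: draw an index date with replacement from the drug X cohort for each unexposed person, then flag draws that fall on or after that person's own enddate so they can be redrawn or excluded. The rows below are the first five of each sample posted above; pick, start_date and eligible are made-up names, the seed is arbitrary, and the eligibility rule is an assumption that would need to reflect the actual study design.

        ```stata
        * Sketch only: assign exposed index dates at random to unexposed persons
        clear
        set seed 2024

        * Pool of index dates from the drug X cohort, numbered 1..N
        input int index_date
        16755
        18207
        16810
        21031
        19537
        end
        generate long pick = _n
        local n_pool = _N
        tempfile pool
        quietly save `pool'

        * Unexposed cohort
        clear
        input byte ID int(dob enddate)
        1 -4567 22585
        2 -4932 17055
        3 -1279 20947
        4 -7123 19508
        5 -11141 18478
        end

        * Draw a random row of the pool for each person (with replacement)
        generate long pick = runiformint(1, `n_pool')
        merge m:1 pick using `pool', keep(match) nogenerate
        rename index_date start_date
        drop pick

        * A start date is usable only if it precedes the person's end of follow-up
        generate byte eligible = start_date < enddate
        format dob start_date enddate %td
        sort ID
        list, noobs
        ```

        Persons with eligible equal to 0 received a start date on or after their own enddate and would need a fresh draw or exclusion.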
