  • Generate Random Dates Following a Distribution

    Hi Everyone,

    I am trying to do a case-control analysis in which patients who are on a statin are matched with patients who are not using a statin, looking at outcome 'Y'.

    In the example below, I have MRN (a unique identification number for each patient), StatinUser (0/1 for non-user vs user) and OrderDate (the date on which the statin was ordered). I would like to generate a random order date for each statin non-user, following the distribution of dates in the OrderDate variable.

    For instance, if I have 100 statin users, of whom 10 ordered a statin in 2010 and 20 ordered one in 2015, I would like 10% of statin non-users to have an order date in 2010 and 20% of statin non-users to have an order date in 2015.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long MRN float StatinUser int OrderDate
    802 1 16714
    3897 1 17476
    4077 0     .
    16733 0     .
    32086 1 16147
    33100 1 19428
    35303 0     .
    35402 0     .
    44859 0     .
    46631 1 18004
    47191 1 17755
    51607 0     .
    52936 1 16611
    54114 0     .
    55186 1 20467
    63594 0     .
    67389 1 20489
    70508 0     .
    94664 0     .
    101758 1 17583
    103275 0     .
    106922 1 20956
    114538 0     .
    117457 1 19739
    119313 1 18295
    119750 1 18507
    122721 0     .
    122911 1 20555
    124115 0     .
    140665 0     .
    141747 1 16236
    141812 1 19166
    144469 1 18777
    149492 0     .
    153932 1 16540
    156828 0     .
    157974 1 17680
    162925 0     .
    165324 0     .
    171231 0     .
    173534 0     .
    174284 1 15956
    174466 0     .
    181628 0     .
    188730 0     .
    191346 1 21102
    194977 0     .
    197376 0     .
    197426 0     .
    207639 1 19527
    220996 0     .
    225169 1 18973
    226852 1 19012
    226878 0     .
    228536 0     .
    232462 1 20370
    233668 0     .
    235622 0     .
    242065 1 18644
    248450 1 19648
    250316 0     .
    267021 0     .
    275867 1 18857
    278226 1 20422
    278770 1 19884
    279968 0     .
    286021 0     .
    301515 0     .
    307009 0     .
    309781 0     .
    319129 0     .
    323121 0     .
    329961 0     .
    330845 0     .
    331025 1 15105
    333765 1 19599
    335653 1 21606
    336354 0     .
    336388 0     .
    337352 0     .
    348391 1 16643
    354142 0     .
    356733 0     .
    367078 0     .
    368811 0     .
    369744 1 19204
    374363 0     .
    382770 1 21165
    385724 0     .
    386219 1 21053
    401612 0     .
    424622 0     .
    432898 0     .
    433003 0     .
    434175 1 17787
    443721 1 16828
    446468 1 19317
    454843 0     .
    469080 1 19837
    469924 0     .
    end
    format %td OrderDate

    Thanks!
    Andy

  • #2
    You describe your study as "case-control," yet some of your description would seem to go against that. This may just be a terminological issue, or it could reflect a study design issue that is deeper than the question you ask. "Cases/controls," in common usage, are defined with respect to an outcome, but you describe cases and controls defined with respect to what I'd presume to be an explanatory variable, namely statin use. I might misunderstand you here.

    In any case, I don't think the matching and case-control features of your study matter for your question. I gather that you want to *impute* a random OrderDate for each statin non-user so that the distribution of the year is the same among users and non-users. In general that is not possible in a finite sample when the numbers of users and non-users differ, as they do in your example. Instead, we can impute years so that the *expected* distribution of year is the same for users and non-users. That is what I show below. Or, if you *do* have equal numbers of users and non-users, what I offer below can be modified to produce exactly the same year distribution among users and non-users, so post again if that's your situation.
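
    For the equal-numbers case, one possible modification is to sample *without* replacement: put the non-users in a random order and give the k-th non-user the year of the k-th user, so every user's year is used exactly once and the two year distributions coincide. This is only a sketch under that equal-size assumption; the toy data are a few rows from #1, and the names `shuffle` and `n1` are made up for illustration.

    ```stata
    * Sketch: exact matching of the year distribution, ASSUMING equal
    * numbers of users and non-users. Toy data: rows taken from #1.
    clear
    input long MRN float StatinUser int OrderDate
    802 1 16714
    3897 1 17476
    32086 1 16147
    33100 1 19428
    4077 0 .
    16733 0 .
    35303 0 .
    35402 0 .
    end
    format %td OrderDate
    set seed 3486534 // make reproducible
    gen year = year(OrderDate)
    //
    // This approach needs the two groups to be the same size.
    count if StatinUser == 1
    local n1 = r(N)
    count if StatinUser == 0
    assert r(N) == `n1'
    //
    // Users first (observations 1..n1), then non-users in a random order.
    gen double shuffle = runiform()
    gsort -StatinUser shuffle
    //
    // Non-user in observation n1+k takes the year of user k (no replacement).
    replace year = year[_n - `n1'] if StatinUser == 0
    tab year StatinUser
    ```

    Because each user's year is handed out once, the tabulation shows identical year counts in both columns, not just equal expected counts.
    
    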

    Finally, you do not say what you want for the distribution of OrderDates within year, only that you want the *year* distribution to be the same. I'd presume (?) that you'd like to pick, with uniform probability, a day within each imputed year, and use that for the OrderDate. So, for each non-user, a remaining task beyond the code below is to assign a day number from 1 to 365 (366 in leap years) within the imputed year and convert that to a date. I don't work with dates very much, so I'll leave that part to you or to someone else here who works with dates routinely. You could, alternatively, pick the actual OrderDate the same way I have picked the year below, but I'm presuming that's not what you want.
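
    The day-within-year step could be handled roughly as follows, once `year` has been imputed for the non-users. It counts the days in each year as the gap between 1 January of that year and 1 January of the next (so leap years get 366), then adds a uniform random offset to 1 January. This is a sketch: the toy data use hypothetical imputed years (2005, 2007, 2012) standing in for the output of the imputation code, and `ndays` is a made-up name.

    ```stata
    * Sketch: uniformly random day within each non-user's imputed year.
    * Toy data: -year- for non-users is a HYPOTHETICAL imputed value.
    clear
    input long MRN float StatinUser int OrderDate float year
    802 1 16714 2005
    4077 0 . 2005
    16733 0 . 2007
    35303 0 . 2012
    end
    format %td OrderDate
    set seed 3486534 // make reproducible
    //
    // Days in each year: 365, or 366 in a leap year such as 2012.
    gen int ndays = mdy(1, 1, year + 1) - mdy(1, 1, year)
    //
    // Offset 0, ..., ndays-1 added to 1 January of the imputed year.
    replace OrderDate = mdy(1, 1, year) + floor(runiform() * ndays) if StatinUser == 0
    list, clean
    ```

    If you instead want to copy whole dates (the alternative mentioned above), the same with-replacement sampling should work on dates directly, e.g. replace OrderDate = OrderDate[pos] if StatinUser == 0, with no separate day step.
    
    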

    Code:
    * Run this after loading the -dataex- example data from #1.
    //
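    set seed 3486534 // make reproducible
    gen year = year(OrderDate)
    //
    // First, a reality check. The -if- command evaluates only the first
    // observation, so use -assert-, which checks every observation.
    capture assert missing(year) if StatinUser == 0
    if _rc {
      display as error "Some non-user has an order date; error"
      exit 459
    }
    //
    // Put statin users (non-missing on year) at top of file and count them.
    // Missing values sort last; adding MRN makes the sort order, and hence
    // the imputation, reproducible for a given seed.
    sort year MRN
    count if !missing(year)
    local last = r(N)  // highest observation # with nonmissing year
    //
    // Each non-user gets a year sampled with replacement from the users.
    // runiformint(1, `last') draws an integer uniformly from 1 to `last'.
    gen long pos = runiformint(1, `last')
    replace year = year[pos] if (StatinUser == 0)
    //
    // Check the result: a non-significant chi2 is consistent with the
    // year distributions not differing beyond chance.
    tab year StatinUser, col chi2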
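
    Why the match holds only in expectation: let $n_1$ be the number of users (`last' in the code), $n_y$ the number of users with order year $y$, and $n_0$ the number of non-users. Each non-user's `pos` is an independent uniform draw from $1, \dots, n_1$, so with $m_y$ the number of non-users assigned year $y$:

    ```latex
    \Pr(\text{year}_i = y) = \frac{n_y}{n_1}, \qquad
    \mathbb{E}[m_y] = n_0 \, \frac{n_y}{n_1}, \qquad
    m_y \sim \operatorname{Binomial}\!\left(n_0, \frac{n_y}{n_1}\right)
    ```

    The expected non-user share $m_y / n_0$ equals the user share $n_y / n_1$, but the realised counts scatter binomially around it, which is the variation the chi2 check looks at.
    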
