Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Calculating cumulative time at risk for a rolling window and varying number of distinct values

    Dear Statalist,

    I am working with a medication dataset, where a higher number of concurrent medications represents a greater risk of side effects. We are working to determine the best cut-off values for the number of concurrent meds as a screening tool.

    The data is in long form, each observation represents one supply of a particular medication "drugname" (encoded to a numeric value) for an individual (identified by "id") on a particular date ("scriptdate") which has a certain amount of supply until a later date ("enddate"). We are interested in a particular drug group (denoted by a dummy "dbi" (not shown)).

    I am using Stata 17.

    Example data is shown below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id str21 med int scriptdate long enddate byte ndistinct
    1 "paracetamol + codeine" 21663 21693 2
    1 "olanzepine"            21663 21693 2
    2 "sertraline"            21812 21842 1
    2 "amitrypyline"          22095 22125 1
    2 "oxazepam"              22115 22145 1
    2 "paracetamol + codeine" 22173 22280 1
    2 "diazepam"              21552 21582 2
    2 "risperidone"           21552 21612 2
    3 "quetiapine"            21552 21612 2
    3 "risperidone"           21552 21582 2
    3 "tramadol"              21878 21908 3
    3 "oxycodone"             21885 21915 2
    3 "paracetamol + codeine" 21885 21915 2
    4 "fluoxetine"            22292 22322 1
    4 "quetiapine"            22323 22503 2
    4 "buprenorphine"         22323 22503 2
    5 "sertraline"            22250 22430 3
    5 "paracetamol + codeine" 22250 58954 3
    5 "gabapentin"            22328 22358 2
    5 "mirtzapine"            22328 22508 2
    end
    format %td scriptdate
    format %td enddate
    I have used SSC rangerun to calculate the number of distinct values ("ndistinct", shown in dataset above) of concurrent medication using the following code:

    program myprog
    qui tab drugname
    gen ndistinct = r(r)
    end

    rangerun myprog if dbi==1, use(drugname) inter(scripdate 0 enddate) by(id)

    We now want to calculate the total time an individual is "at risk" as denoted by different values of ndistinct. The problem is that whilst rangerun currently identifies the maximum number of distinct values within a time interval, for example, 4 concurrent medications, it is not easy to determine for how long in that interval the individual was taking 4 medications, as opposed to 3 or 2. This is important for us to calculate as it provides additional information to decide the cut-off value of ndistinct.

    I am struggling to find examples of how others have dealt with similar issues and would greatly appreciate any suggestions.

    Many thanks, Jean.





  • #2
    The following code will tell you the number of active prescriptions in place for each id in exhaustive ranges of dates.
    Code:
    assert scriptdate <= enddate
    gen long obs_no = _n
    reshape long @date, i(obs_no) j(script_end) string
    by id (date), sort: gen active_rxs = sum(script_end == "script") - sum(script_end == "end")
    collapse (max) active_rxs, by(id date)
    by id (date): gen until = date[_n+1] - 1
    format until %td
    rename date from

    Comment


    • #3
      Hi Clyde, thanks so much, what an elegant solution!!

      Much appreciated,

      Best, Jean.

      Comment

      Working...
      X