Dear Statalist,
I am working with a medication dataset, where a higher number of concurrent medications represents a greater risk of side effects. We are working to determine the best cut-off values for the number of concurrent meds as a screening tool.
The data is in long form, each observation represents one supply of a particular medication "drugname" (encoded to a numeric value) for an individual (identified by "id") on a particular date ("scriptdate") which has a certain amount of supply until a later date ("enddate"). We are interested in a particular drug group (denoted by a dummy "dbi" (not shown)).
I am using Stata 17.
Example data is shown below:
I have used SSC rangerun to calculate the number of distinct values ("ndistinct", shown in dataset above) of concurrent medication using the following code:
program myprog
qui tab drugname
gen ndistinct = r(r)
end
rangerun myprog if dbi==1, use(drugname) inter(scripdate 0 enddate) by(id)
We now want to calculate the total time an individual is "at risk" as denoted by different values of ndistinct. The problem is that whilst rangerun currently identifies the maximum number of distinct values within a time interval, for example, 4 concurrent medications, it is not easy to determine for how long in that interval the individual was taking 4 medications, as opposed to 3 or 2. This is important for us to calculate as it provides additional information to decide the cut-off value of ndistinct.
I am struggling to find examples of how others have dealt with similar issues and would greatly appreciate any suggestions.
Many thanks, Jean.
I am working with a medication dataset, where a higher number of concurrent medications represents a greater risk of side effects. We are working to determine the best cut-off values for the number of concurrent meds as a screening tool.
The data is in long form, each observation represents one supply of a particular medication "drugname" (encoded to a numeric value) for an individual (identified by "id") on a particular date ("scriptdate") which has a certain amount of supply until a later date ("enddate"). We are interested in a particular drug group (denoted by a dummy "dbi" (not shown)).
I am using Stata 17.
Example data is shown below:
Code:
* Example generated by -dataex-. For more info, type help dataex clear input byte id str21 med int scriptdate long enddate byte ndistinct 1 "paracetamol + codeine" 21663 21693 2 1 "olanzepine" 21663 21693 2 2 "sertraline" 21812 21842 1 2 "amitrypyline" 22095 22125 1 2 "oxazepam" 22115 22145 1 2 "paracetamol + codeine" 22173 22280 1 2 "diazepam" 21552 21582 2 2 "risperidone" 21552 21612 2 3 "quetiapine" 21552 21612 2 3 "risperidone" 21552 21582 2 3 "tramadol" 21878 21908 3 3 "oxycodone" 21885 21915 2 3 "paracetamol + codeine" 21885 21915 2 4 "fluoxetine" 22292 22322 1 4 "quetiapine" 22323 22503 2 4 "buprenorphine" 22323 22503 2 5 "sertraline" 22250 22430 3 5 "paracetamol + codeine" 22250 58954 3 5 "gabapentin" 22328 22358 2 5 "mirtzapine" 22328 22508 2 end format %td scriptdate format %td enddate
program myprog
qui tab drugname
gen ndistinct = r(r)
end
rangerun myprog if dbi==1, use(drugname) inter(scripdate 0 enddate) by(id)
We now want to calculate the total time an individual is "at risk" as denoted by different values of ndistinct. The problem is that whilst rangerun currently identifies the maximum number of distinct values within a time interval, for example, 4 concurrent medications, it is not easy to determine for how long in that interval the individual was taking 4 medications, as opposed to 3 or 2. This is important for us to calculate as it provides additional information to decide the cut-off value of ndistinct.
I am struggling to find examples of how others have dealt with similar issues and would greatly appreciate any suggestions.
Many thanks, Jean.

Comment