Help with a Very Slow Loop

Zach Levin

10 Jul 2015, 11:11

I'm currently coding up an algorithm that deals with prescription drug refills.

The theory behind the code/loop I'm struggling with is this: patients refill their medications before the day they run out (we know how many days' worth of medication they were supplied), so to capture how compliant they are with a medication, you have to adjust the start and end dates of each fill to account for the overlap.

The dataset is quite large, and the code is taking an extremely long time (days) to run. I was hoping someone could tell me whether my loop is very inefficient or whether this is just a byproduct of having a large dataset.

Code:

local GROUPVAR drug_class

* Adjust dates to account for overlap (i.e. early fills)
local keepgoing = 1
while `keepgoing' != 0 {
    sort PATID `GROUPVAR' FILL_DT
    qui by PATID `GROUPVAR': replace FILL_DT = (END_DT[_n-1] + 1) if (END_DT[_n-1] > FILL_DT[_n]) & _n != 1

    * We need to know whether to keep looping
    qui count if END_DT != FILL_DT + DAYS_SUP
    if (r(N) > 0) {
        local keepgoing = 1
    }
    if (r(N) == 0) {
        local keepgoing = 0
    }
    qui replace END_DT = FILL_DT + DAYS_SUP
}
Mike Lacy

10 Jul 2015, 12:06

The most expensive thing here is the sort. On my modest machine, sorting a file with _N = 1e7 takes about 20 seconds, so 5,000 such sorts would take about a day (5,000 × 20 s = 100,000 s, roughly 28 hours). Is it plausible that one patient would have a drug class that required 5,000 passes through the loop to adjust it? I would suspect not, although I don't completely understand your "adjusting." My understanding of your code is that the loop will keep going even if only one patient's observations for one drug class still need adjustment.
So, if 5,000 repetitions is plausible, my thought would be to find those kinds of "high-adjustment" observations before your loop, remove them from the dataset, and handle them separately. That way, you might only have to work through your big dataset a few times, and run your loop thousands of times on a smaller dataset.

That being said: It's not obvious to me why the adjustment requires a loop. I trust that it does.
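To check what one sort costs on your own machine, here is a minimal timing sketch using Stata's -timer- command. The dataset (1,000,000 observations of a single random variable) is an arbitrary assumption, ten times smaller than the 1e7 mentioned above, so treat the result only as a rough per-sort figure to multiply by the number of loop passes.

```stata
* Build a synthetic dataset; the size is an arbitrary choice
clear
set obs 1000000
gen double x = runiform()

* Time a single sort with timer 1
timer clear 1
timer on 1
sort x
timer off 1

* timer list leaves the elapsed seconds in r(t1)
timer list 1
display "seconds for one sort: " r(t1)
```

Multiplying that figure by the number of passes the loop makes gives a rough lower bound on its run time, since each pass also does a -replace-, a -count- and a second -replace-.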

Regards, Mike
Zach Levin

10 Jul 2015, 12:13

Mike,

I had not thought about the sort being the expensive part, but that makes sense. The reason the adjustment requires a loop is that each patient could have many fills of a drug, and that requires adjusting all future dates to accurately capture their medication usage. So, for example, if I fill medications on July 1, July 20, and August 10 for 30-day supplies, I need to reset the "start date" for each subsequent fill. Not sure if that's clear or not.

Would there be an efficient way to split the dataset by patient so that, instead of the whole dataset being sorted, only each patient's records are sorted? And then append all the patients at the end? The only catch is that we have anywhere from 1,000 to 1,000,000 patients.
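A cascade like this may not need repeated sorts at all. Because -replace- works through the observations in order, an expression that refers to FILL_DT[_n-1] sees the value already adjusted for the previous fill, so one sort and one -by- pass can push every later fill forward. The sketch below keeps the convention of the loop in the first post (END_DT = FILL_DT + DAYS_SUP, and an overlapping fill restarts at END_DT[_n-1] + 1) and runs it on the July 1 / July 20 / August 10 example with 30-day supplies; the one-patient dataset is made up for illustration. It assumes fills keep their original order after being shifted. The original loop re-sorts on every pass, so in edge cases where a shifted fill jumps past a later one, individual dates can differ from what the loop produces.

```stata
* Made-up data: one patient, one drug class, three 30-day fills
clear
set obs 3
gen PATID = 1
gen drug_class = 1
gen FILL_DT = mdy(7,1,2015) in 1
replace FILL_DT = mdy(7,20,2015) in 2
replace FILL_DT = mdy(8,10,2015) in 3
gen DAYS_SUP = 30
format %td FILL_DT

* One pass: FILL_DT[_n-1] is already the adjusted start of the previous
* fill, so the previous end date is FILL_DT[_n-1] + DAYS_SUP[_n-1]
sort PATID drug_class FILL_DT
by PATID drug_class: replace FILL_DT = ///
    cond(FILL_DT[_n-1] + DAYS_SUP[_n-1] > FILL_DT, ///
         FILL_DT[_n-1] + DAYS_SUP[_n-1] + 1, FILL_DT) if _n > 1

* End dates follow from the adjusted start dates
gen END_DT = FILL_DT + DAYS_SUP
format %td END_DT
list PATID drug_class FILL_DT DAYS_SUP END_DT
```

With these inputs the second and third fills should be moved to 01aug2015 and 01sep2015, with the last one ending on 01oct2015. Because -by- handles every patient and drug class in the same pass, there is no need to split the file by patient and append the pieces afterwards.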

Thank you.

Robert Picard


10 Jul 2015, 13:20

Perhaps I'm not understanding the problem here, but I don't see why a loop is needed or why the data have to be sorted multiple times. I also don't see why changing dates is a good way of tracking usage. Why not simply calculate, for each patient and drug, the supply in hand when a new fill occurs? Something like:

Code:

* data setup; would be nice if the OP used -dataex- from SSC
* to save time if I don't get the details right
clear
set seed 4231234
set obs 5
gen PATID = _n
gen ndrugs = int(runiform(1,11))
expand ndrugs
bysort PATID: gen drug_class = _n
gen nfills = int(runiform(1,11))
expand nfills
bysort PATID drug_class: gen fill_id = _n
by PATID drug_class: gen FILL_DT = mdy(1,1,2010) + int(runiform(1,365))
gen DAYS_SUP = int(runiform(1,31))
format %td FILL_DT
drop ndrugs nfills

* dsince is the difference in days between consecutive fills
sort PATID drug_class FILL_DT fill_id
by PATID drug_class: gen dsince = FILL_DT - FILL_DT[_n-1]

* on the first day, the patient has DAYS_SUP in hand
by PATID drug_class: gen inhand = DAYS_SUP if _n == 1

* if some are left from the previous fill, add them
by PATID drug_class: replace inhand = ///
    cond(dsince > inhand[_n-1], 0, inhand[_n-1] - dsince) + DAYS_SUP ///
    if _n > 1


Zach Levin

10 Jul 2015, 13:40

The measure we're using was developed in coordination with CMS, who set up the following description:

The PDC (Proportion of Days Covered) numerator is the sum of the days covered by the days’ supply of all drug claims in each respective drug class. The period covered by the PDC starts on the day the first prescription is filled (index date) and lasts through the end of the measurement period, or death, whichever comes first. For prescriptions with a days’ supply that extends beyond the end of the measurement period, count only the days for which the drug was available to the individual during the measurement period. If there are prescriptions for the same drug (generic name) on the same date of service, keep the prescription with the largest days’ supply. If prescriptions for the same drug (generic name) overlap, then adjust the prescription start date to be the day after the previous fill has ended.

There's some SAS code lying around on how to do this, and the only way I could think of to replicate their array method in Stata was with loops. I hadn't thought of the approach suggested above, but I'm thinking it through and you may be on to something. I really appreciate the assistance.
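A rough sketch of the CMS description quoted above, built on the same by-group idea rather than a loop. This is not the CMS or SAS reference implementation, and several assumptions are hypothetical: the measurement period ends on 31dec2015 (the local period_end), death is ignored, drug_class stands in for the "same drug (generic name)" used by the same-day and overlap rules, and a fill starting on day s with supply d is taken to cover days s through s + d - 1, so "the day after the previous fill has ended" is s + d. The four-row dataset is made up.

```stata
* Made-up data: one patient, one drug class, including a same-day pair
clear
input PATID drug_class str10 fill_str DAYS_SUP
1 1 "2015-07-01" 30
1 1 "2015-07-20" 30
1 1 "2015-07-20" 10
1 1 "2015-12-15" 30
end
gen FILL_DT = daily(fill_str, "YMD")
format %td FILL_DT

* Hypothetical end of the measurement period; death is ignored
local period_end = mdy(12,31,2015)

* Same drug on the same date: keep the fill with the largest days' supply
bysort PATID drug_class FILL_DT (DAYS_SUP): keep if _n == _N

* Overlap: start no earlier than the day after the previous fill ends;
* start[_n-1] is already adjusted because -replace- runs in order
sort PATID drug_class FILL_DT
gen start = FILL_DT
by PATID drug_class: replace start = ///
    max(FILL_DT, start[_n-1] + DAYS_SUP[_n-1]) if _n > 1
format %td start

* Days of each fill that fall inside the measurement period
gen covered = max(0, min(start + DAYS_SUP - 1, `period_end') - start + 1)

* PDC: covered days over days from the index date to the period end
by PATID drug_class: egen days_covered = total(covered)
by PATID drug_class: gen period_days = `period_end' - FILL_DT[1] + 1
gen pdc = days_covered / period_days
list PATID drug_class FILL_DT start DAYS_SUP covered pdc
```

For the made-up data, the 10-day same-day fill is dropped, the July 20 fill is shifted to start on July 31, and only 17 of the December 15 fill's 30 days fall inside the period, giving 77 covered days out of 184 (a PDC of about 0.42).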
Robert Picard

10 Jul 2015, 13:51

Glad if this brings you to a more straightforward way of getting there. Just in case it isn't obvious, you can calculate, for every fill, the date on which the patient will run out using

Code:

gen good_up_to = FILL_DT + inhand
format %td good_up_to
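As a check on how inhand and good_up_to behave, here is the same calculation applied to the three-fill example from earlier in the thread (30-day fills on July 1, July 20 and August 10, 2015, for one patient and one drug class). The data are made up for illustration, and the fill_id tie-breaker from the setup code is left out because these dates are distinct.

```stata
* Made-up data: one patient, one drug class, three 30-day fills
clear
set obs 3
gen PATID = 1
gen drug_class = 1
gen FILL_DT = mdy(7,1,2015) in 1
replace FILL_DT = mdy(7,20,2015) in 2
replace FILL_DT = mdy(8,10,2015) in 3
gen DAYS_SUP = 30
format %td FILL_DT

* Days between consecutive fills
sort PATID drug_class FILL_DT
by PATID drug_class: gen dsince = FILL_DT - FILL_DT[_n-1]

* Supply in hand at each fill: leftover from before plus the new supply
by PATID drug_class: gen inhand = DAYS_SUP if _n == 1
by PATID drug_class: replace inhand = ///
    cond(dsince > inhand[_n-1], 0, inhand[_n-1] - dsince) + DAYS_SUP ///
    if _n > 1

* Date on which the patient runs out
gen good_up_to = FILL_DT + inhand
format %td good_up_to
list FILL_DT DAYS_SUP dsince inhand good_up_to
```

Here inhand should be 30, 41 and 50, and good_up_to 31jul2015, 30aug2015 and 29sep2015: the 11 days left over at the second fill and the 20 days left over at the third are carried forward without changing any fill dates.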
Buyadaa Oyunchimeg

12 Mar 2018, 23:54

Hi Zach Levin,

I am having the same problem (trying to find the Proportion of Days Covered for each medication in a year). Did you find a solution to your problem? I would really appreciate it if you could share your solution.

Thank you
Oyun

Last edited by Buyadaa Oyunchimeg; 13 Mar 2018, 00:23.