  • Help with a Very Slow Loop

    I'm currently coding up an algorithm that deals with prescription drug refills.

    The idea behind the loop I'm struggling with is this: patients often refill a medication before the day they run out (we know how many days' worth of medication each fill supplied), so to capture how compliant they are with a medication you have to adjust the start/end dates of each fill to account for the overlap.

    The dataset is quite large, and it is taking an extremely long time (days) to make it through the code. I was hoping someone could offer feedback as to whether my loop is very inefficient or whether the run time is just a byproduct of having a large dataset.

    Code:
    local GROUPVAR drug_class
    * Adjust dates to account for overlap (i.e. early fills)
    local keepgoing = 1
    while `keepgoing' != 0 {
        sort PATID `GROUPVAR' FILL_DT
        qui by PATID `GROUPVAR': replace FILL_DT = (END_DT[_n-1] + 1) if (END_DT[_n-1] > FILL_DT[_n]) & _n != 1
        * We need to know whether to keep looping
        qui count if END_DT != FILL_DT + DAYS_SUP
        if (r(N) > 0) {
            local keepgoing = 1
        }
        if (r(N) == 0) {
            local keepgoing = 0
        }

        qui replace END_DT = FILL_DT + DAYS_SUP
    }

  • #2
    The most expensive thing here is the sort. On my modest machine, sorting a file with _N = 1e7 takes about 20 sec., so 5000 such sorts would take about a day. Is it plausible that one patient would have a drug class that required 5000 loops to adjust it? I would suspect not, although I don't completely understand your "adjusting." My understanding of your code is that the loop will keep going even if only one patient's observations for one drug class still need adjustment.
    So, if 5000 repetitions is plausible, then my thought would be to try to find those kinds of "high adjustment" observations before your loop, remove them from the data set, and handle them separately. That way, you might only have to work through your big data set a few times, and run your loop 1000s of times on a much smaller data set.
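
    A rough way to see how many patient/drug-class groups need any adjusting at all is sketched below. It is only an illustration on made-up toy data: the 2017 dates and the variables overlap, any_overlap and n_fills are assumptions, and END_DT follows the definition in the first post (FILL_DT + DAYS_SUP).

    Code:
    * toy data (hypothetical): patient 1 refills early, patient 2 does not
    clear
    input PATID drug_class str9 fill DAYS_SUP
    1 1 "01jul2017" 30
    1 1 "20jul2017" 30
    1 1 "10aug2017" 30
    2 1 "01jul2017" 30
    2 1 "15sep2017" 30
    end
    gen FILL_DT = date(fill, "DMY")
    gen END_DT = FILL_DT + DAYS_SUP
    format %td FILL_DT END_DT

    * flag fills that start before the previous fill in the same group ends
    * (the _n > 1 guard matters: a missing END_DT[_n-1] compares as larger than any date)
    sort PATID drug_class FILL_DT
    by PATID drug_class: gen byte overlap = (END_DT[_n-1] > FILL_DT) & _n > 1

    * group-level flag and group size
    by PATID drug_class: egen byte any_overlap = max(overlap)
    by PATID drug_class: gen long n_fills = _N

    * number of observations in groups that need any adjustment
    count if any_overlap

    Groups with any_overlap == 0 could be set aside before the loop starts, so that each sort touches fewer observations.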

    That being said: It's not obvious to me why the adjustment requires a loop. I trust that it does.

    Regards, Mike

    • #3
      In reply to Mike Lacy (#2):

      Mike,

      I had not thought about the sort taking so much computing time, but that makes sense. The reason the adjustment requires a loop is that each patient could have many fills of a drug, and moving one fill's dates means all later fills may have to move too in order to capture medication usage accurately. For example, if I fill 30-day supplies on July 1, July 20, and August 10, I need to reset the "start date" of each subsequent fill. Not sure if that's clear or not.
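
      To make the example concrete, here is a small sketch that runs those three fills through the loop from the first post. The year (2017), the single patient and the passes counter are made up for illustration; END_DT is defined as FILL_DT + DAYS_SUP, as in that code.

      Code:
      * toy data (hypothetical): one patient, one drug class, three 30-day fills
      clear
      input PATID drug_class str9 fill DAYS_SUP
      1 1 "01jul2017" 30
      1 1 "20jul2017" 30
      1 1 "10aug2017" 30
      end
      gen FILL_DT = date(fill, "DMY")
      gen END_DT = FILL_DT + DAYS_SUP
      format %td FILL_DT END_DT

      local passes = 0
      local keepgoing = 1
      while `keepgoing' != 0 {
          sort PATID drug_class FILL_DT
          * push an early fill to the day after the previous fill's (old) end date
          qui by PATID drug_class: replace FILL_DT = END_DT[_n-1] + 1 if (END_DT[_n-1] > FILL_DT) & _n != 1
          * a stale END_DT means some start date moved, so another pass is needed
          qui count if END_DT != FILL_DT + DAYS_SUP
          local keepgoing = (r(N) > 0)
          qui replace END_DT = FILL_DT + DAYS_SUP
          local ++passes
      }
      list FILL_DT END_DT, noobs
      display "passes needed: `passes'"

      On this toy data the start dates should settle at July 1, August 1 and September 1. The third fill moves in two steps (first to August 20, then to September 1) because each pass only sees the previous fill's old END_DT, which is why one pass is not enough.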

      Would there be an efficient way to split the dataset by patient so instead of the whole data being sorted just the patient is being sorted? And then append all the patients at the end? The only catch is we have anywhere from 1,000 to 1,000,000 patients.

      Thank you.

      • #4
        Perhaps I'm not understanding the problem here, but I don't see why a loop is needed or why the data has to be sorted multiple times. I also don't get why you think that changing dates is a good way of tracking usage. Why not simply calculate, for each patient and drug, the supply in hand when a new fill occurs? Something like:

        Code:
        * data setup; would be nice if the OP used -dataex- from SSC
        * to save time if I don't get the details right
        clear
        set seed 4231234
        set obs 5
        gen PATID = _n
        gen ndrugs = int(runiform(1,11))
        expand ndrugs
        bysort PATID: gen drug_class = _n
        gen nfills = int(runiform(1,11))
        expand nfills
        bysort PATID drug_class: gen fill_id = _n
        by PATID drug_class: gen FILL_DT = mdy(1,1,2010) + int(runiform(1,365))
        gen DAYS_SUP = int(runiform(1,31))
        format %td FILL_DT
        drop ndrugs nfills
        
        * dsince is the difference in days between consecutive fills
        sort PATID drug_class FILL_DT fill_id
        by PATID drug_class: gen dsince = FILL_DT - FILL_DT[_n-1]
        
        * on the first day, the patient has DAYS_SUP in hand
        by PATID drug_class: gen inhand = DAYS_SUP if _n == 1
        
        * if some are left from the previous fill, add them
        by PATID drug_class: replace inhand = ///
            cond(dsince > inhand[_n-1], 0, inhand[_n-1] - dsince) + DAYS_SUP ///
            if _n > 1

        • #5
          In reply to Robert Picard (#4):

          The measure we're using was developed in coordination with CMS, which gives the following description:
          The PDC (Proportion of Days Covered) numerator is the sum of the days covered by the days’ supply of all drug claims in each respective drug class. The period covered by the PDC starts on the day the first prescription is filled (index date) and lasts through the end of the measurement period, or death, whichever comes first. For prescriptions with a days’ supply that extends beyond the end of the measurement period, count only the days for which the drug was available to the individual during the measurement period. If there are prescriptions for the same drug (generic name) on the same date of service, keep the prescription with the largest days’ supply. If prescriptions for the same drug (generic name) overlap, then adjust the prescription start date to be the day after the previous fill has ended.
          There's some SAS code lying around that does this, and the only way I could think of to replicate its array method in Stata was with loops. I hadn't thought of the approach suggested above, but I'm thinking it through and you may be on to something. Really appreciate the assistance.
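
          As a starting point, the CMS description might translate into something like the sketch below. It is not the CMS or SAS implementation: the toy data, the 31 Dec 2017 period end, and the names adj_start, adj_end, covered, num, denom and pdc are all made up, death dates are ignored, and it assumes a fill covers DAYS_SUP days counting the adjusted start date, so the last covered day is start + DAYS_SUP - 1. It leans on -replace- working through observations in order, so adj_end[_n-1] has already been updated when the next fill is processed.

          Code:
          * toy data (hypothetical): two fills on the same date, one late fill
          clear
          input PATID drug_class str9 fill DAYS_SUP
          1 1 "01jul2017" 30
          1 1 "20jul2017" 30
          1 1 "20jul2017" 15
          1 1 "10dec2017" 30
          end
          gen FILL_DT = date(fill, "DMY")
          local period_end = mdy(12,31,2017)   // assumed end of measurement period

          * same drug on the same date: keep the largest days' supply
          bysort PATID drug_class FILL_DT (DAYS_SUP): keep if _n == _N

          * one ordered pass: a fill starts on its fill date or on the day after
          * the previous (already adjusted) fill ends, whichever is later
          bysort PATID drug_class (FILL_DT): gen adj_end = FILL_DT + DAYS_SUP - 1 if _n == 1
          by PATID drug_class: replace adj_end = ///
              max(FILL_DT, adj_end[_n-1] + 1) + DAYS_SUP - 1 if _n > 1
          gen adj_start = adj_end - DAYS_SUP + 1
          format %td FILL_DT adj_start adj_end

          * count only days that fall inside the measurement period
          gen covered = max(0, min(adj_end, `period_end') - adj_start + 1)

          * denominator runs from the index date (first fill) to the period end
          bysort PATID drug_class (FILL_DT): gen denom = `period_end' - FILL_DT[1] + 1
          by PATID drug_class: egen num = total(covered)
          gen pdc = num / denom
          list FILL_DT adj_start adj_end covered num denom pdc, noobs

          Under that counting rule the end date is one day earlier than the FILL_DT + DAYS_SUP used in the loop in the first post, so the two conventions would need to be reconciled before comparing results.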

          • #6
            Glad if this brings you to a more straightforward way of getting there. Just in case it isn't obvious, you can calculate, for every fill, the date on which the patient will run out using

            Code:
            gen good_up_to = FILL_DT + inhand
            format %td good_up_to
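
            If adjusted start dates are still wanted, for example to match the CMS wording, they can probably be backed out of good_up_to. A minimal sketch on made-up data (the 2017 dates and the name adj_start are assumptions), treating good_up_to as the first day without medication:

            Code:
            * toy data (hypothetical): three 30-day fills, the last two early
            clear
            input PATID drug_class str9 fill DAYS_SUP
            1 1 "01jul2017" 30
            1 1 "20jul2017" 30
            1 1 "10aug2017" 30
            end
            gen FILL_DT = date(fill, "DMY")

            * supply in hand at each fill, as in #4
            sort PATID drug_class FILL_DT
            by PATID drug_class: gen dsince = FILL_DT - FILL_DT[_n-1]
            by PATID drug_class: gen inhand = DAYS_SUP if _n == 1
            by PATID drug_class: replace inhand = ///
                cond(dsince > inhand[_n-1], 0, inhand[_n-1] - dsince) + DAYS_SUP ///
                if _n > 1

            gen good_up_to = FILL_DT + inhand
            * the current fill is used up last, so it starts DAYS_SUP days before run-out
            gen adj_start = good_up_to - DAYS_SUP
            format %td FILL_DT good_up_to adj_start
            list FILL_DT DAYS_SUP inhand good_up_to adj_start, noobs

            On this data the second and third fills should start on 31 Jul and 30 Aug. That is earlier than what END_DT = FILL_DT + DAYS_SUP followed by END_DT + 1 produces, so it is worth checking which day-counting rule the measure intends.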

            • #7
              Hi Zach Levin,

              I am having the same problem (trying to find the Proportion of Days Covered for each medication in a year). Did you find a solution to your problem?
              I would really appreciate it if you could share your solution.

              Thank you
              Oyun
              Last edited by Buyadaa Oyunchimeg; 13 Mar 2018, 00:23.
