  • Creating Polypharmacy Binary Variable

    Hi,

    I’m working with an EHR dataset and I need help creating a polypharmacy (yes/no) variable for my study. Polypharmacy definition (for my study): ≥5 medications taken simultaneously for ≥90 days within the 6 months (183 days) prior to the index date.

    I have individual-level medication data including medication name, status, start date, end date, and discontinued date, along with the index date.

    Medication status is recorded as “active,” “complete,” or “discontinued.” For active medications, only the start date is available; for completed medications, there is an end date; and for discontinued medications, there is a discontinued date. I need to account for overlapping periods of medications to identify patients who meet the polypharmacy criteria.

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1 id str5 med str12 medstat str10(stdate endate disdate)
    "1" "med A" "Active" "6/26/2023" "." "."
    "1" "med B" "Complete" "6/27/2023" "2/4/2024" "."
    "1" "med C" "Complete" "6/27/2023" "11/17/2023" "."
    "1" "med D" "Discontinued" "6/28/2023" "." "7/19/2023"
    "1" "med E" "Active" "6/27/2023" "." "."
    "1" "med F" "Complete" "6/26/2023" "11/12/2023" "."
    "1" "med G" "Discontinued" "6/26/2023" "." "2/4/2024"
    "1" "med H" "Discontinued" "6/26/2023" "." "2/4/2024"
    "1" "med I" "Discontinued" "6/26/2023" "." "6/27/2023"
    "1" "med J" "Discontinued" "6/26/2023" "." "2/4/2024"
    "1" "med K" "Discontinued" "6/27/2023" "." "7/19/2023"
    end

    Thank you so much for your help.


  • #2
    Originally posted by Guest
    I have individual-level medication data including medication name, status, start date, end date, and discontinued date, along with the index date.
    Your dataset snippet doesn't have an index date.
    Last edited by sladmin; 31 Mar 2026, 08:25. Reason: anonymize original poster



    • #3
      Originally posted by Guest
      Polypharmacy definition (for my study): ≥5 medications taken simultaneously for ≥90 days within the 6 months (183 days) prior to the index date.
      Do those ninety days have to be a single continuous stretch in order to count, or can they be in separate noncontiguous epochs whose lengths total ninety?
      Last edited by sladmin; 31 Mar 2026, 08:26. Reason: anonymize original poster



      • #4
        Thank you so much, Joseph. The 90 days don't need to be continuous for the main analysis. I have added the index date to the above code and added participant id 2.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str1 id str5 med str12 medstat str10(stdate endate disdate indexdate)
        "1" "med A" "Active" "6/26/2023" "" "." "12/13/2023"
        "1" "med B" "Complete" "6/27/2023" " 2/4/2024" "." "."
        "1" "med C" "Complete" "6/27/2023" "11/17/2023" "." "."
        "1" "med D" "Discontinued" "6/28/2023" "" "7/19/2023" "."
        "1" "med E" "Active" "6/27/2023" "" "." "."
        "1" "med F" "Complete" "6/26/2023" "11/12/2023" "." "."
        "1" "med G" "Discontinued" "6/26/2023" "" "2/4/2024" "."
        "1" "med H" "Discontinued" "6/26/2023" "" "2/4/2024" "."
        "1" "med I" "Discontinued" "6/26/2023" "" "6/27/2023" "."
        "1" "med J" "Discontinued" "6/26/2023" "" "2/4/2024" "."
        "1" "med K" "Discontinued" "6/27/2023" "" "7/19/2023" "."
        "2" "med A" "Complete" "12/27/2023" " 1/17/2024" "." "3/6/2024"
        "2" "med B" "Discontinued" "12/27/2023" "." "5/6/2024" "."
        "2" "med C" "Complete" "1/4/2024" " 1/19/2024" "." "."
        "2" "med D" "Discontinued" "12/28/2023" "." "1/24/2024" "."
        "2" "med E" "Complete" "12/28/2023" " 4/3/2024" "." "."
        "2" "med F" "Complete" "12/28/2023" " 4/3/2024" "." "."
        "2" "med G" "Active" "1/17/2024" "" "." "."
        "2" "med H" "Discontinued" "1/25/2024" "." "5/6/2024" "."
        "2" "med I" "Discontinued" "1/25/2024" "." "2/2/2024" "."
        "2" "med J" "Discontinued" "1/24/2024" "." "2/19/2024" "."
        end



        • #5
          Joseph Coveney



          • #6
            Rather than just pinging Joseph it might be more fruitful to wonder why there hasn't been an answer. Simply but crucially, this is quite a challenging problem and only an experienced Stata user accustomed to this kind of data is likely to produce code. I don't fit that description.

            I think you will need to work at your data.

            String dates are useless for analysis. You need to produce numeric daily dates. That's well documented and Step 1.
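For what it's worth, Step 1 might look something like the sketch below. The variable name stdate_num is made up for illustration, and the "MDY" mask assumes the month/day/year layout shown in the posted example.

```stata
* Minimal sketch: convert string dates such as "6/26/2023" to numeric daily dates.
* stdate_num is a hypothetical name; "MDY" assumes month/day/year strings.
clear
input str10 stdate
"6/26/2023"
"12/13/2023"
end

generate int stdate_num = date(stdate, "MDY")   // days since 01jan1960
format stdate_num %td                           // display as a date
list, noobs
```

Any string that date() cannot parse becomes missing, so it is worth checking for missing values afterwards.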

            I suspect that you won't make much progress without reshaping your data so that events are ordered by date. An event could be the start of a medication (# drugs increases by 1) or its end (decreases by 1).
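A rough sketch of that event idea, on made-up data, is below. Small integers stand in for daily dates, and medno, change, day and nmeds are hypothetical names. The sketch assumes the stop day is the last day on the drug, so the count drops on the following day.

```stata
* Sketch only: running count of concurrent medications from start/stop events.
clear
input byte id int(start stop)
1 1 10
1 3 12
1 5 8
end

generate long medno = _n                          // one row per medication spell
expand 2                                          // two events per spell
bysort medno: generate byte change = cond(_n == 1, 1, -1)
generate int day = cond(change == 1, start, stop + 1)

* Net change on each event day, then a running total within person
collapse (sum) change, by(id day)
bysort id (day): generate int nmeds = sum(change)
list, noobs sepby(id)
```

Each remaining observation marks a day on which the count changes; nmeds holds from that day until the next event day.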

            The extra criteria (90 days, 6 months) I am not sure that I understand, so I am not volunteering code or further help, but what I think is likely to be Step 2 is discussed at

            Code:
            . search paired dates, sj
            
            Search of official help files, FAQs, Examples, and Stata Journals
            
            SJ-13-1 dm0068  . . . . . Stata tip 114: Expand paired dates to pairs of dates
                    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
                    Q1/13   SJ 13(1):217--219                                (no commands)
                    tip on using expand to deal with paired dates
            There may be about 4 more steps to get what you want.



            • #7
              Originally posted by Guest
              I had code for you a couple of days ago, but got tied up with other things, sorry.

              To me this kind of problem is better handled in Mata where you can more easily select rows to work on at a time, and so my suggestion (first code block below) uses Mata. I've put the code into a Mata class for tidiness and to avoid leaving stuff behind in Mata's global namespace after execution ends.

              It looks more complicated than it really is. The idea is to create a 183-day calendar ending on the participant's index date and then lay each medication's date range over that calendar in turn, incrementing a counter column on the days that fall within the medication's date range. Again, this is straightforward in Mata: the calendar and counter can be placed side by side in a matrix, with the first column holding the dates (counting backwards from the index date) and the second column holding the counter. The core algorithm (creating the calendar and incrementing the counter) is in the class method doone() in the code below. The rest of the class's methods handle housekeeping and post the results to a Stata frame (Results) for the user to peruse. The frame holds one observation per participant with two variables: the participant's ID number (pid) and the number of days, within the 183 days ending on the index date, on which the participant was prescribed five or more medications simultaneously (pyp, for PolYPharmacy).
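For readers who would rather see the calendar idea in plain Stata first, here is a rough sketch on made-up data. It is not the Mata code from this post, only the same counting logic done with expand. Small integers stand in for daily dates, the names first, last, ndays, day, nmeds and poly are hypothetical, and it assumes one row per participant and medication.

```stata
* Sketch only: count days with >= 5 simultaneous medications in the
* 183 days ending on (and including) the index date.
clear
input byte(id med) int(stdate endate indexdate)
1 1 700 1000 1000
1 2 900 1200 1000
1 3 900 1000 1000
1 4 900 1000 1000
1 5 900 1000 1000
2 1 900 1000 1000
2 2 900 1000 1000
2 3 900 1000 1000
2 4 900 1000 1000
2 5 990 1000 1000
end

* Clip each medication's range to the window
generate int first = max(stdate, indexdate - 182)
generate int last  = min(endate, indexdate)
drop if last < first                      // no overlap with the window

* One observation per medication-day
generate int ndays = last - first + 1
expand ndays
bysort id med: generate int day = first + _n - 1

* Medications per calendar day, then days at or above the threshold
contract id day, freq(nmeds)
generate byte poly = nmeds >= 5
collapse (sum) pyp = poly, by(id)
list, noobs
```

With these toy rows, participant 1 has 101 qualifying days and participant 2 has 11. A participant with no medication at all inside the window drops out of this sketch rather than getting a zero.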

              As Nick mentions, you'll need to convert the dates to Stata dates, which are numeric. The second block of code below the Mata shows that and shows how to prepare the dataset for use by the Mata object; it's important that the dataset be prepared such that the variables are participant ID (converted to numeric datatype), medication ID (likewise converted), start date, end date (regardless of reason) and index date, all dates converted to Stata dates. Once the dataset is prepared, invoking the Mata code is simply loading the file containing the Mata code into Stata (run Polypharmacy.mata) and issuing a command from the Stata prompt (mata: runPolypharmacy()). The results will be nearly instantaneously present in the Results Stata frame, where you can list them for inspection and choose those participants whose values of pyp (tally of days of ≥5 medications simultaneously) are 90 or greater.

              First the Mata code; again, it looks more complicated than it really is.
              Code:
              version 19
              
              mata:
              mata set matastrict on
              
              class Polypharmacy {
                  private:
                      real matrix Data, Participants, Results
                      void new(), doone(), post()
                  public:
                      void determine()
              }
              void function Polypharmacy::new() {
              
                  st_view(Data=(.), ., .)
              
                  Participants = Results = uniqrows(Data[., 1], 1)
              }
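              void function Polypharmacy::doone(real scalar pid) {
              
                  // View onto the rows belonging to this participant
                  real matrix P
                  st_select(P=(.), Data, Data[., 1] :== Participants[pid, 1])
                  
                  // Calendar: column 1 = the 183 dates ending on the index date,
                  // column 2 = number of medications covering each date
                  real matrix Calendar
                  Calendar = J(183, 2, 0)
                  Calendar[., 1] = P[1, 5] :- (0::rows(Calendar)-1)
              
                  real colvector S
              
                  // Add 1 to the counter on every calendar day within each
                  // medication's start-to-end range (columns 3 and 4)
                  real scalar i, k
                  k = rows(P)
                  for (i=1; i<=k; i++) {
                      S = selectindex(P[i, 3] :<= Calendar[., 1] :& Calendar[., 1] :<= P[i, 4])
                      Calendar[S, 2] = Calendar[S, 2] :+ 1
                  }
              
                  // Number of days with five or more simultaneous medications
                  Results[pid, 2] = colsum(Calendar[., 2] :>= 5)
              }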
              void function Polypharmacy::post() {
                  
                  if (st_frameexists("Results")) {
                      st_framecurrent("Results")
                      st_dropvar(.)
                  }
                  else {
                      st_framecreate("Results")
                      st_framecurrent("Results")
                  }
              
                  st_addobs(rows(Results))
                  st_store(., st_addvar("int", ("pid", "pyp")), Results)
              }
              void function Polypharmacy::determine() {
              
                  real scalar pid, tot
                  tot = rows(Participants)
              
                  for (pid=1; pid<= tot; pid++) doone(pid)
              
                  post()
              }
              
              void function runPolypharmacy() {
              
                  class Polypharmacy scalar p
                  p.determine()
              }
              
              end
              And then its usage with your example dataset from #4 above, showing the conversion of your string variables to numeric, condensing redundant dates into a single end date, and ordering the variables as the Mata code expects them to be present in the dataset. (I've omitted the listing of your dataset above for brevity.)
              Code:
              version 19
              
              clear *
              
              input str1 id str5 med str12 medstat str10(stdate endate disdate indexdate)
              <omitted for brevity>
              end
              
              // Convert to numeric
              destring id, replace
              bysort id (med): generate byte drg = _n
              
              tempvar date
              foreach var of varlist *date {
                  quietly generate int `date' = date(`var', "MDY")
                  drop `var'
                  rename `date' `var'
                  format `var' %tdCY-N-D
              }
              quietly bysort id (indexdate): replace indexdate = indexdate[1]
              quietly replace endate = min(disdate, indexdate) if mi(endate)
              
              order id drg stdate endate indexdate
              keep id-indexdate
              
              list, noobs sepby(id) abbreviate(20)
              
              *
              * Begin here
              *
              
              run Polypharmacy.mata
              
              mata: runPolypharmacy()
              
              list, noobs
              
              exit
              The complete do-file (Polypharmacy.do) and log file (Polypharmacy.smcl) for the code immediately above are attached, along with the file containing the Mata code (Polypharmacy.mata).

              I've also attached another do-file containing some verification and validation testing (Polypharmacy test.do).
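Once the Results frame exists, turning pyp into the yes/no variable and bringing it back to a participant-level dataset could go roughly as in the sketch below. The two frames are filled with made-up values so that the sketch runs on its own; in practice Results would come from runPolypharmacy(), and the variable name polypharmacy is hypothetical.

```stata
* Sketch only: toy stand-in for the Results frame (pyp values are invented)
clear all
frame create Results
frame change Results
input int(pid pyp)
1 120
2 45
end

* 1 = at least 90 days with five or more simultaneous medications
generate byte polypharmacy = pyp >= 90

* Toy participant-level dataset in the default frame
frame change default
input int id
1
2
end

* Link on participant ID and copy the indicator across
frlink 1:1 id, frame(Results pid)
frget polypharmacy, from(Results)
list id polypharmacy, noobs
```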
              Last edited by sladmin; 31 Mar 2026, 08:26. Reason: anonymize original poster



              • #8
                Hi Joseph, Thank you so much for taking the time to put this together. I really appreciate your time and efforts. This is so helpful. I’ll follow your guidance to prepare my dataset and will run the code.



                • #9
                  Originally posted by Guest
                  I’ll . . . prepare my dataset and will run the code.
                  OK, take a look at the data listing after dataset preparation (it's shown in that attached .smcl log file) in order to see what your dataset should look like before invoking the Mata function. It's important, because I did not include any user-input validation or dataset error-checking code in either the Stata convert-to-numeric dataset-preparation code or the Mata class definition.
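If it helps, a few checks along the following lines could be run on the prepared dataset before invoking the Mata function. This is only a sketch of the kind of validation that was left out, not part of the attached files, and the toy rows use arbitrary numeric daily dates.

```stata
* Sketch only: sanity checks on a prepared dataset (toy rows, arbitrary dates)
clear
input byte(id drg) int(stdate endate indexdate)
1 1 23187 23357 23357
1 2 23188 23411 23357
2 1 23371 23392 23441
end
format *date %td

* Variables must be numeric and in the order the Mata code reads them
confirm numeric variable id drg stdate endate indexdate
quietly ds
assert "`r(varlist)'" == "id drg stdate endate indexdate"

* No missing values, one row per participant and medication
assert !missing(id, drg, stdate, endate, indexdate)
isid id drg

* Date ranges run forwards; index date is constant within participant
assert stdate <= endate
bysort id (indexdate): assert indexdate[1] == indexdate[_N]
```

Each assert stops the do-file with an error if the condition fails, which points to the rows that need attention.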

                  Let me know if you run into glitches, and again apologies for the tardiness. I wanted to do some testing of the code before posting it on the List, but got sidetracked until today, when I could finally finalize and run the Polypharmacy test.do file.
                  Last edited by sladmin; 31 Mar 2026, 08:26. Reason: anonymize original poster



                  • #10
                    Thank you so much, Joseph. This is really helpful. No worries at all about the timing. I’ll need some time to carefully prepare my full dataset and run the code, as I’m working through the steps of using Mata for the first time and want to make sure I understand everything correctly. I will reach out to you if I have any questions.
