Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • medication combinations

    Dear all,

    I have an issue. I am trying to calculate, from a medication database, the number of different drug combinations (single medication, 2 drug treatment, 3 or more drug treatment) of each classes (lipid lowering, antithrombotic) by year.

    Dataset contains index date for each medication and last date was generated by last purchase date for any medications for each patient.

    Data looks as below:

    clear
    input byte ID str9(indexdate lastdate) str3 drug
    1 "24Aug2005" "14Nov2007" "ASP"
    1 "24Aug2005" "14Nov2007" "CLO"
    1 "7Aug2006" "14Nov2007" "STA"
    1 "7Aug2006" "14Nov2007" "FIB"
    2 "14Nov2007" "13Feb2010" "CLO"
    2 "15Jun2002" "13Feb2010" "HEP"
    2 "15Jun2002" "13Feb2010" "ASP"
    2 "15Jun2002" "13Feb2010" "STA"
    end


    I would really appreciate if you may help me tp solve the problem.


    Thank you in advance.
    Last edited by Buyadaa Oyunchimeg; 23 Apr 2018, 03:36.

  • #2
    First, I'd recommend coding those dates in Stata format. You can generate a year variable from either one by just using the -strpos- function (like substring in Excel), but if you want more precise manipulation of dates, this is better:

    Code:
    gen last_date = date(lastdate,"DMY")
    format last_date %td
    tab last_date
    gen last_yr = yofd(last_date)
    tab last_yr
    Once you do that, then you need some way to map your drugs to drug class. Your variable drug looks like non-standard 3-letter abbreviations for some drugs. If we are talking lipid lowering drugs and antithrombotic drugs, then I suppose ASP could be aspirin, HEP could be heparin (an anticoagulant), etc. But there are many subtypes of antithrombotic drugs.

    You could use -egen- to do your next task:

    Code:
    sort ID
    by ID: egen aspirin = max(drug == "ASP")
    by ID: egen heparin = max(drug == "HEP")
    For example, that would create a flag indicating that someone had ASP orHEP at any time. If you change the -by:- clause to include last_yr, then it will generate groups based on the last year that a drug was prescribed/a drug prescription was filled. Does that get you what you need?
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

    Comment


    • #3
      Code:
      //    CONVERT FROM STRING DATES TO STATA INTERNAL FORMAT DATE VARIABLES
      foreach v of varlist indexdate lastdate {
          gen _`v' = daily(`v', "DMY")
          assert !missing(_`v')
          drop `v'
          rename _`v' `v'
          format `v' %td
      }
      
      //    CHANGE VARIABLE NAMES TO BE MORE EVOCATIVE
      rename indexdate startdate
      rename lastdate enddate
      
      //    GO TO LONG LAYOUT
      reshape long @date, i(ID drug) j(_event) string
      
      //    VARIABLE TO SHOW THAT "START" ADDS 1 DRUG, AND "END" REMOVES 1 DRUG
      gen event = cond(_event == "start", 1, -1)
      
      //    CLASSIFY DRUGS
      label define drug_class    1    "Anti-Thrombotic"    ///
                              2    "Lipid Lowering"
      gen byte drug_class:drug_class = 1 if inlist(drug, "ASP", "CLO", "HEP")
      replace drug_class = 2 if inlist(drug, "STA", "FIB")
      
      //    ADD UP DRUGS IN USE AS OF EACH DATE
      gsort ID drug_class date -event
      by ID drug_class (date): gen num_drugs_in_class = sum(event)
      
      //    REDUCE TO ONE OBSERVATION PER ID#DRUG_CLASS#DATE
      collapse (last) num_drugs_in_class, by(ID drug_class date)
      
      //    MARK OBSERVATIONS WITH AN END DATE JUST BEFORE THE
      //    START OF THE NEXT ERA
      by ID drug_class (date): gen until = date[_n+1]-1
      format until %td
      rename date from
      order until, after(from)
      Notes:

      It isn't reasonable to expect people to be able to figure out how your how different drugs fit into classes. Most Forum members know nothing about these matters. So you need to explain them in your posts. Even worse, your abbreviations of the drug names are unclear to somebody who might be knowledgeable in this area? For example is CLO clopidogrel, an anti-thrombotic, or clofibrate, a lipid-lowering drug. I have guessed the former, but only because of familiarity with your earlier posts on this Forum. You improve your chances of getting a rapid and timely response to your posts by making them understandable to everybody who might know statistics and Stata, even if they know nothing about your content area.

      Again, in light of your previous posts and the responses you have gotten, I'm surprised you are still trafficking in string variables for dates. That may be the way your data gets imported from its original source, but really I think you should be converting them all to Stata numeric internal format dates as soon as you create your data sets. There are absolutely no advantages to having dates as strings, and string dates are useless for almost all purposes.

      I have also made a guess that what you want from this data is a new data set showing ID's and drug classes with ranges of dates and the number of drugs from the class in use during that range of dates. That is what this code does. If you wanted something else, please show an example of what you want the output to look like.

      And my code also assumes that your data are clean in certain respects. My guess is that the real data are not, because your previous posts have shown instances of substantial irregularities in it. For example, my post assumes that if a person has a prescription for a drug from a certain indexdate to a certain lastdate that the same person does not also have another prescription for the same drug on other dates that overlap this interval. My code will give incorrect results (double-counting the same drug in the overlap period) if this happens. If this is a problem in your real data, please post a new data example that includes some instances of this problem so that any code that works around this difficulty can be properly tested.

      Comment


      • #4
        Dear prof Schechter,

        Thank you for valuable comments. I will try my best to improve my posts in the future.
        In my real data a person has several prescription for a drug from a certain indexdate (supply date) that overlap the interval between indexdate and lastdate. Real data does no have a variable "lastdate", it was generated by last purchase date.

        I would like to calculate the number (proportion) of different drug combinations (single medication, 2 drug treatment, 3 or more drug treatment) of each classes (lipid lowering, antithrombotic) by supply year and see whether the proportion of people taking combination of medication (by each class) increasing over time.

        Data look as below.

        clear
        input byte ID str3 drug float(startdate lastdate)
        1 "ASP" 16672 19676
        1 "CLO" 16672 19676
        1 "CLO" 18907 19676
        1 "CLO" 18802 19676
        1 "CLO" 19229 19676
        1 "CLO" 19292 19676
        1 "CLO" 18815 19676
        1 "STA" 16989 18945
        1 "STA" 17020 18945
        1 "FIB" 17020 18945
        2 "CLO" 17484 18306
        2 "HEP" 15506 18306
        2 "ASP" 15506 18306
        2 "ASP" 18053 18306
        2 "STA" 15506 18306
        2 "STA" 17829 18306
        end
        format %td startdate
        format %td lastdate

        Thank you so much.
        Last edited by Buyadaa Oyunchimeg; 23 Apr 2018, 19:10.

        Comment


        • #5
          Real data does no have a variable "lastdate", it was generated by last purchase date.
          Perhaps I misunderstand the situation, but it appears to me that, in light of this, there is no solution to your problem. Let's look at the drug CLO for ID #1. There are prescriptions on 24aug2005, 24jun2011, 07jul2011, 07oct2011, 24aug2012 and 26oct2012, and perhaps another one on 14nov2013. While I can imagine that we can consider a prescription filled on 07jul2011 to last until 07oct2011, I very much doubt that one filled on 24aug2005 could last until 24jun2011, or that the one on 26oct2012 could have listed until 14nov2013. Without knowing how long each prescription could last, it is impossible to say whether or not this person was taking CLO during much of the time between 24aug2006 and 24jun2011. At some point that 2006 prescription was used up, but when did that happen? Without that information it is impossible to know how many lipid lowering drugs the patient was receiving during those uncertain times.

          Do you have other data that enable you to figure out how long each prescription would last?

          Comment


          • #6
            There is a variable named "quantity"- quantity of medication (example: Aspirin 100 tab, Clopidogrel 28 tab etc. ) that were dispensed. Is it possible to generate lastdate using this variable?

            Comment


            • #7
              Well, this would work for some of the drugs. Aspirin (when used as an anti-thrombotic) and clopidogrel are both typically administered just once a day. So a 28 tab prescription should be good for 28 days, and 100 aspirin should last just over 3 months. But heparin would be very difficult to figure out: the individual dosage amount is not the same for all people, and even the time between injections varies, typically somewhere in the 4-6 hour range. So it would be impossible to say how long a particular amount of heparin would last. Similarly "statins" refers to a class of drugs, which differ in their schedules of administration. So one would have to know which particular statin a prescription represented. Even there, some uncertainty remains. While most of them are administered once a day, some can be administered twice a day, so, again, we would not really know how many days a given prescription would cover. For example, 60 fluvastatin 20mg tablets might represent 30 days coverage for a dosage of 40 mg/day (20 mg twice a day) or it might represent 60 days coverage for a dosage of 20 mg/day.

              Comment


              • #8
                I agree with you. It would be very difficult to do calculation for heparin.

                How if we use an average therapeutic dose and approximately calculate lastdate (treatment stop date) and for example if multiple (2 or >=3) prescriptions of lipid lowering medications were dispensed on the same day, count them as combination therapy.
                I can reorganize data keeping only lipid lowering or anti-thrombotic medication prescription. What would you suggest?


                Comment


                • #9
                  Yes, if we restrict the data to medications whose stop date can be calculated from the number of units dispensed, then it can be done.

                  Comment


                  • #10
                    Here I've removed Heparin and added quantity for each medication.

                    Based on this how can we generate the lastdate and based on them calculate the number (proportion) of different drug combinations (single medication, 2 drug treatment, 3 or more drug treatment) of each classes (lipid lowering, antithrombotic) by supply year?



                    clear
                    input byte ID str3 drug float(startdate qty)
                    1 "ASP" 16672 90
                    1 "CLO" 16672 28
                    1 "CLO" 18907 28
                    1 "CLO" 18802 28
                    1 "CLO" 19229 28
                    1 "CLO" 19292 28
                    1 "CLO" 18815 28
                    1 "STA" 16989 30
                    1 "STA" 17020 30
                    1 "FIB" 17020 30
                    2 "CLO" 17484 28
                    2 "ASP" 15506 90
                    2 "ASP" 18053 90
                    2 "STA" 15506 30
                    2 "STA" 17829 30
                    end
                    format %td startdate


                    Thank you.

                    Comment


                    • #11
                      Buyadaa, FYI, in the United States, our Medicare Part D files (for the benefit of general readers, this is our prescription drug insurance) have a variable called days' supply, and I believe most prescription medication claims files in the US have the same variable as well (from personal experience, one public payer and one private payer database I've worked with did). Surely your database has a variable like this? It would make your job easier.

                      For reference, a table of available variables for the Part D prescriptions file is here.
                      Last edited by Weiwen Ng; 24 Apr 2018, 06:05.
                      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                      When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

                      Comment


                      • #12
                        So here is some code, in which I have assumed that each drug is consumed at a rate of 1 per day. As noted, this is a questionable assumption for statins, and you may want to exclude them (or perhaps exclude the entire class of lipid lowering drugs since statins are the mainstay for that).

                        Code:
                        clear
                        input byte ID str3 drug float(startdate qty)
                        1 "ASP" 16672 90
                        1 "CLO" 16672 28
                        1 "CLO" 18907 28
                        1 "CLO" 18802 28
                        1 "CLO" 19229 28
                        1 "CLO" 19292 28
                        1 "CLO" 18815 28
                        1 "STA" 16989 30
                        1 "STA" 17020 30
                        1 "FIB" 17020 30
                        2 "CLO" 17484 28
                        2 "ASP" 15506 90
                        2 "ASP" 18053 90
                        2 "STA" 15506 30
                        2 "STA" 17829 30
                        end
                        format %td startdate
                        
                        ge enddate = startdate + qty
                        format enddate %td
                        
                        // GROUP THE DRUGS
                        label define drug_class    1    "Anti-Thrombotic"    2    "Lipid Lowering"
                        gen byte drug_class:drug_class = 1 if inlist(drug, "ASP", "CLO")
                        replace drug_class = 2 if inlist(drug, "FIB", "STA")
                        
                        //    GO LONG
                        gen long obs_no = _n
                        reshape long @date, i(obs_no) j(_event) string
                        
                        //    TALLY UP STARTS AND DISCONTINUATIONS OF DRUGS
                        gen event = cond(_event == "start", 1, -1)
                        gsort ID drug_class date -event
                        by ID drug_class (date): gen num_drugs_in_class = sum(event)
                        collapse (last) num_drugs_in_class, by(ID drug_class date)
                        
                        //    CALCULATE INTERVALS OF DIFFERENT LEVELS OF NUM_DRUGS_IN_CLASS
                        rename date from
                        by ID drug_class (from): gen until = from[_n+1]-1
                        format until %td
                        
                        //    COMBINE CONSECUTIVE PERIODS WITH SAME LEVELS OF DRUG
                        by ID drug_class (from), sort: gen spell = sum(num_drugs_in_class != num_drugs_in_class[_n-1])
                        collapse (min) from (max) until (first) num_drugs_in_class, by(ID drug_class spell)
                        sort ID drug_class (from)
                        drop spell
                        If Weiwen's advice that a variable called days' supply is available applies in your setting, you would be much better off using that variable instead of qty + assumption of 1 unit per day.

                        Comment


                        • #13
                          Dear prof. Schechter and Ng,

                          Thank you so much for your help. As suggested, I will not include statins in the current analysis.

                          Unfortunately, the data I am working on is not from the US, but I will try to request for additional data (they may have a similar variable).


                          Last edited by Buyadaa Oyunchimeg; 24 Apr 2018, 23:08.

                          Comment


                          • #14
                            Dear all,

                            I got a problem when I applied the codes in real data. STATA says a variable is not found r111. What might cause the problem?


                            // GROUP THE DRUGS

                            label define drug_class 1 "Anti-Thrombotic" 2 "Lipid Lowering"
                            gen byte drug_class:drug_class = 1 if inlist(drug, ASP, CLO)
                            CLO not found r(111);



                            Data looks as below:
                            clear
                            input byte ID float(startdate qty drug)
                            1 16672 90 1
                            1 16672 28 2
                            1 18907 28 2
                            1 18802 28 2
                            1 19229 28 2
                            1 19292 28 2
                            1 18815 28 2
                            1 16989 30 3
                            1 17020 30 3
                            1 17020 30 4
                            2 17484 28 2
                            2 15506 90 1
                            2 18053 90 1
                            2 15506 30 3
                            2 17829 30 3
                            end
                            format %td startdate
                            label values drug drug1
                            label def drug1 1 "ASP", modify
                            label def drug1 2 "CLO", modify
                            label def drug1 3 "STA", modify
                            label def drug1 4 "FIB", modify


                            Thank you,
                            Sincerely
                            Oyun
                            Last edited by Buyadaa Oyunchimeg; 01 May 2018, 23:25.

                            Comment


                            • #15
                              Code:
                              ...if inlist(drug,”ASP”,”CLO”)
                              If you omit the quotes and the arguments are not numbers, Stata will search for variables instead. You need the quotes if you want Stata to match specific strings. As I recall, inlist can only accommodate a maximum of 9 or 10 string arguments. See this thread for tips on how to use inlist with more string arguments.
                              Last edited by Weiwen Ng; 02 May 2018, 07:58.
                              Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                              When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

                              Comment

                              Working...
                              X