medication combinations

Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#1

medication combinations

23 Apr 2018, 03:32

Dear all,

I have an issue. I am trying to calculate, from a medication database, the number of different drug combinations (single medication, 2 drug treatment, 3 or more drug treatment) of each classes (lipid lowering, antithrombotic) by year.

Dataset contains index date for each medication and last date was generated by last purchase date for any medications for each patient.

Data looks as below:

clear
input byte ID str9(indexdate lastdate) str3 drug
1 "24Aug2005" "14Nov2007" "ASP"
1 "24Aug2005" "14Nov2007" "CLO"
1 "7Aug2006" "14Nov2007" "STA"
1 "7Aug2006" "14Nov2007" "FIB"
2 "14Nov2007" "13Feb2010" "CLO"
2 "15Jun2002" "13Feb2010" "HEP"
2 "15Jun2002" "13Feb2010" "ASP"
2 "15Jun2002" "13Feb2010" "STA"
end

I would really appreciate if you may help me tp solve the problem.

Thank you in advance.

Last edited by Buyadaa Oyunchimeg; 23 Apr 2018, 03:36.
Tags: None
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#2

23 Apr 2018, 08:12

First, I'd recommend coding those dates in Stata format. You can generate a year variable from either one by just using the -strpos- function (like substring in Excel), but if you want more precise manipulation of dates, this is better:

Code:

gen last_date = date(lastdate,"DMY") format last_date %td tab last_date gen last_yr = yofd(last_date) tab last_yr

Once you do that, then you need some way to map your drugs to drug class. Your variable drug looks like non-standard 3-letter abbreviations for some drugs. If we are talking lipid lowering drugs and antithrombotic drugs, then I suppose ASP could be aspirin, HEP could be heparin (an anticoagulant), etc. But there are many subtypes of antithrombotic drugs.

You could use -egen- to do your next task:

Code:

sort ID by ID: egen aspirin = max(drug == "ASP") by ID: egen heparin = max(drug == "HEP")

For example, that would create a flag indicating that someone had ASP orHEP at any time. If you change the -by:- clause to include last_yr, then it will generate groups based on the last year that a drug was prescribed/a drug prescription was filled. Does that get you what you need?

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#3

23 Apr 2018, 17:24

Code:

// CONVERT FROM STRING DATES TO STATA INTERNAL FORMAT DATE VARIABLES foreach v of varlist indexdate lastdate { gen _`v' = daily(`v', "DMY") assert !missing(_`v') drop `v' rename _`v' `v' format `v' %td } // CHANGE VARIABLE NAMES TO BE MORE EVOCATIVE rename indexdate startdate rename lastdate enddate // GO TO LONG LAYOUT reshape long @date, i(ID drug) j(_event) string // VARIABLE TO SHOW THAT "START" ADDS 1 DRUG, AND "END" REMOVES 1 DRUG gen event = cond(_event == "start", 1, -1) // CLASSIFY DRUGS label define drug_class 1 "Anti-Thrombotic" /// 2 "Lipid Lowering" gen byte drug_class:drug_class = 1 if inlist(drug, "ASP", "CLO", "HEP") replace drug_class = 2 if inlist(drug, "STA", "FIB") // ADD UP DRUGS IN USE AS OF EACH DATE gsort ID drug_class date -event by ID drug_class (date): gen num_drugs_in_class = sum(event) // REDUCE TO ONE OBSERVATION PER ID#DRUG_CLASS#DATE collapse (last) num_drugs_in_class, by(ID drug_class date) // MARK OBSERVATIONS WITH AN END DATE JUST BEFORE THE // START OF THE NEXT ERA by ID drug_class (date): gen until = date[_n+1]-1 format until %td rename date from order until, after(from)

Notes:

It isn't reasonable to expect people to be able to figure out how your how different drugs fit into classes. Most Forum members know nothing about these matters. So you need to explain them in your posts. Even worse, your abbreviations of the drug names are unclear to somebody who might be knowledgeable in this area? For example is CLO clopidogrel, an anti-thrombotic, or clofibrate, a lipid-lowering drug. I have guessed the former, but only because of familiarity with your earlier posts on this Forum. You improve your chances of getting a rapid and timely response to your posts by making them understandable to everybody who might know statistics and Stata, even if they know nothing about your content area.

Again, in light of your previous posts and the responses you have gotten, I'm surprised you are still trafficking in string variables for dates. That may be the way your data gets imported from its original source, but really I think you should be converting them all to Stata numeric internal format dates as soon as you create your data sets. There are absolutely no advantages to having dates as strings, and string dates are useless for almost all purposes.

I have also made a guess that what you want from this data is a new data set showing ID's and drug classes with ranges of dates and the number of drugs from the class in use during that range of dates. That is what this code does. If you wanted something else, please show an example of what you want the output to look like.

And my code also assumes that your data are clean in certain respects. My guess is that the real data are not, because your previous posts have shown instances of substantial irregularities in it. For example, my post assumes that if a person has a prescription for a drug from a certain indexdate to a certain lastdate that the same person does not also have another prescription for the same drug on other dates that overlap this interval. My code will give incorrect results (double-counting the same drug in the overlap period) if this happens. If this is a problem in your real data, please post a new data example that includes some instances of this problem so that any code that works around this difficulty can be properly tested.
2 likes
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#4

23 Apr 2018, 18:26

Dear prof Schechter,

Thank you for valuable comments. I will try my best to improve my posts in the future.
In my real data a person has several prescription for a drug from a certain indexdate (supply date) that overlap the interval between indexdate and lastdate. Real data does no have a variable "lastdate", it was generated by last purchase date.

I would like to calculate the number (proportion) of different drug combinations (single medication, 2 drug treatment, 3 or more drug treatment) of each classes (lipid lowering, antithrombotic) by supply year and see whether the proportion of people taking combination of medication (by each class) increasing over time.

Data look as below.

clear
input byte ID str3 drug float(startdate lastdate)
1 "ASP" 16672 19676
1 "CLO" 16672 19676
1 "CLO" 18907 19676
1 "CLO" 18802 19676
1 "CLO" 19229 19676
1 "CLO" 19292 19676
1 "CLO" 18815 19676
1 "STA" 16989 18945
1 "STA" 17020 18945
1 "FIB" 17020 18945
2 "CLO" 17484 18306
2 "HEP" 15506 18306
2 "ASP" 15506 18306
2 "ASP" 18053 18306
2 "STA" 15506 18306
2 "STA" 17829 18306
end
format %td startdate
format %td lastdate

Thank you so much.

Last edited by Buyadaa Oyunchimeg; 23 Apr 2018, 19:10.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#5

23 Apr 2018, 19:42

Real data does no have a variable "lastdate", it was generated by last purchase date.

Perhaps I misunderstand the situation, but it appears to me that, in light of this, there is no solution to your problem. Let's look at the drug CLO for ID #1. There are prescriptions on 24aug2005, 24jun2011, 07jul2011, 07oct2011, 24aug2012 and 26oct2012, and perhaps another one on 14nov2013. While I can imagine that we can consider a prescription filled on 07jul2011 to last until 07oct2011, I very much doubt that one filled on 24aug2005 could last until 24jun2011, or that the one on 26oct2012 could have listed until 14nov2013. Without knowing how long each prescription could last, it is impossible to say whether or not this person was taking CLO during much of the time between 24aug2006 and 24jun2011. At some point that 2006 prescription was used up, but when did that happen? Without that information it is impossible to know how many lipid lowering drugs the patient was receiving during those uncertain times.

Do you have other data that enable you to figure out how long each prescription would last?
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#6

23 Apr 2018, 19:49

There is a variable named "quantity"- quantity of medication (example: Aspirin 100 tab, Clopidogrel 28 tab etc. ) that were dispensed. Is it possible to generate lastdate using this variable?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#7

23 Apr 2018, 20:09

Well, this would work for some of the drugs. Aspirin (when used as an anti-thrombotic) and clopidogrel are both typically administered just once a day. So a 28 tab prescription should be good for 28 days, and 100 aspirin should last just over 3 months. But heparin would be very difficult to figure out: the individual dosage amount is not the same for all people, and even the time between injections varies, typically somewhere in the 4-6 hour range. So it would be impossible to say how long a particular amount of heparin would last. Similarly "statins" refers to a class of drugs, which differ in their schedules of administration. So one would have to know which particular statin a prescription represented. Even there, some uncertainty remains. While most of them are administered once a day, some can be administered twice a day, so, again, we would not really know how many days a given prescription would cover. For example, 60 fluvastatin 20mg tablets might represent 30 days coverage for a dosage of 40 mg/day (20 mg twice a day) or it might represent 60 days coverage for a dosage of 20 mg/day.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#8

23 Apr 2018, 21:20

I agree with you. It would be very difficult to do calculation for heparin.

How if we use an average therapeutic dose and approximately calculate lastdate (treatment stop date) and for example if multiple (2 or >=3) prescriptions of lipid lowering medications were dispensed on the same day, count them as combination therapy.
I can reorganize data keeping only lipid lowering or anti-thrombotic medication prescription. What would you suggest?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#9

23 Apr 2018, 22:12

Yes, if we restrict the data to medications whose stop date can be calculated from the number of units dispensed, then it can be done.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#10

23 Apr 2018, 22:38

Here I've removed Heparin and added quantity for each medication.

Based on this how can we generate the lastdate and based on them calculate the number (proportion) of different drug combinations (single medication, 2 drug treatment, 3 or more drug treatment) of each classes (lipid lowering, antithrombotic) by supply year?

clear
input byte ID str3 drug float(startdate qty)
1 "ASP" 16672 90
1 "CLO" 16672 28
1 "CLO" 18907 28
1 "CLO" 18802 28
1 "CLO" 19229 28
1 "CLO" 19292 28
1 "CLO" 18815 28
1 "STA" 16989 30
1 "STA" 17020 30
1 "FIB" 17020 30
2 "CLO" 17484 28
2 "ASP" 15506 90
2 "ASP" 18053 90
2 "STA" 15506 30
2 "STA" 17829 30
end
format %td startdate

Thank you.
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#11

24 Apr 2018, 06:02

Buyadaa, FYI, in the United States, our Medicare Part D files (for the benefit of general readers, this is our prescription drug insurance) have a variable called days' supply, and I believe most prescription medication claims files in the US have the same variable as well (from personal experience, one public payer and one private payer database I've worked with did). Surely your database has a variable like this? It would make your job easier.

For reference, a table of available variables for the Part D prescriptions file is here.

Last edited by Weiwen Ng; 24 Apr 2018, 06:05.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30117

#12

24 Apr 2018, 09:22

So here is some code, in which I have assumed that each drug is consumed at a rate of 1 per day. As noted, this is a questionable assumption for statins, and you may want to exclude them (or perhaps exclude the entire class of lipid lowering drugs since statins are the mainstay for that).

Code:

clear
input byte ID str3 drug float(startdate qty)
1 "ASP" 16672 90
1 "CLO" 16672 28
1 "CLO" 18907 28
1 "CLO" 18802 28
1 "CLO" 19229 28
1 "CLO" 19292 28
1 "CLO" 18815 28
1 "STA" 16989 30
1 "STA" 17020 30
1 "FIB" 17020 30
2 "CLO" 17484 28
2 "ASP" 15506 90
2 "ASP" 18053 90
2 "STA" 15506 30
2 "STA" 17829 30
end
format %td startdate

ge enddate = startdate + qty
format enddate %td

// GROUP THE DRUGS
label define drug_class    1    "Anti-Thrombotic"    2    "Lipid Lowering"
gen byte drug_class:drug_class = 1 if inlist(drug, "ASP", "CLO")
replace drug_class = 2 if inlist(drug, "FIB", "STA")

//    GO LONG
gen long obs_no = _n
reshape long @date, i(obs_no) j(_event) string

//    TALLY UP STARTS AND DISCONTINUATIONS OF DRUGS
gen event = cond(_event == "start", 1, -1)
gsort ID drug_class date -event
by ID drug_class (date): gen num_drugs_in_class = sum(event)
collapse (last) num_drugs_in_class, by(ID drug_class date)

//    CALCULATE INTERVALS OF DIFFERENT LEVELS OF NUM_DRUGS_IN_CLASS
rename date from
by ID drug_class (from): gen until = from[_n+1]-1
format until %td

//    COMBINE CONSECUTIVE PERIODS WITH SAME LEVELS OF DRUG
by ID drug_class (from), sort: gen spell = sum(num_drugs_in_class != num_drugs_in_class[_n-1])
collapse (min) from (max) until (first) num_drugs_in_class, by(ID drug_class spell)
sort ID drug_class (from)
drop spell

If Weiwen's advice that a variable called days' supply is available applies in your setting, you would be much better off using that variable instead of qty + assumption of 1 unit per day.

Comment

Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#13

24 Apr 2018, 23:04

Dear prof. Schechter and Ng,

Thank you so much for your help. As suggested, I will not include statins in the current analysis.

Unfortunately, the data I am working on is not from the US, but I will try to request for additional data (they may have a similar variable).

Last edited by Buyadaa Oyunchimeg; 24 Apr 2018, 23:08.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#14

01 May 2018, 23:23

Dear all,

I got a problem when I applied the codes in real data. STATA says a variable is not found r111. What might cause the problem?

// GROUP THE DRUGS

label define drug_class 1 "Anti-Thrombotic" 2 "Lipid Lowering"
gen byte drug_class:drug_class = 1 if inlist(drug, ASP, CLO)
CLO not found r(111);

Data looks as below:
clear
input byte ID float(startdate qty drug)
1 16672 90 1
1 16672 28 2
1 18907 28 2
1 18802 28 2
1 19229 28 2
1 19292 28 2
1 18815 28 2
1 16989 30 3
1 17020 30 3
1 17020 30 4
2 17484 28 2
2 15506 90 1
2 18053 90 1
2 15506 30 3
2 17829 30 3
end
format %td startdate
label values drug drug1
label def drug1 1 "ASP", modify
label def drug1 2 "CLO", modify
label def drug1 3 "STA", modify
label def drug1 4 "FIB", modify

Thank you,
Sincerely
Oyun

Last edited by Buyadaa Oyunchimeg; 01 May 2018, 23:25.
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#15

02 May 2018, 07:29

Code:

...if inlist(drug,”ASP”,”CLO”)

If you omit the quotes and the arguments are not numbers, Stata will search for variables instead. You need the quotes if you want Stata to match specific strings. As I recall, inlist can only accommodate a maximum of 9 or 10 string arguments. See this thread for tips on how to use inlist with more string arguments.

Last edited by Weiwen Ng; 02 May 2018, 07:58.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment

Announcement