medication combinations

Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#16

07 May 2018, 19:55

Dear prof.Schechter,

I tried your code , in which that you have assumed that each drug is consumed at a rate of 1 per day in my real data. But there is some problem with my real data; there medications are coded by ATC5 code and some of the medications are combined in a single dosage forms and coded under one ATC5 code. I am having issue with these combinations of medications. I tried the below code from STATA forum which gives name of medication regimens. But in real data it does not work well (it takes more than 24 hours for one extract) and seems still can not deal with combination of medications which coded under one ATC5 code.

I would really appreciate if you may help me solve the problem.
Thank you

clear
input byte ID str9 startdate str9 lastdate str7 drug

2 "24-Aug-95" "14-Nov-97" "A01AD05"
2 "14-Nov-97" "15-Jun-00" "B01AC04"
3 "14-Nov-97" "13-Feb-03" "C10AA05"
3 "15-Jun-00" "13-Feb-03" "B01AC04"
3 "15-Jun-00" "13-Feb-03" "A01AD05"

end

generate int calendar_dt1 = date(startdate, "DMY", 2020)
generate int calendar_dt0 = date(lastdate, "DMY", 2020)
generate long row_nr = _n

quietly reshape long calendar_dt, i(row_nr) j(action)

label define Actions 1 Add 0 Remove

label values action Actions

drop startdate lastdate row_nr

format calendar_dt %tdCCYY-NN-DD

egen long event_nr = group(calendar_dt)

gsort +event_nr -action

order ID event_nr action drug calendar_dt

list, noobs sepby(event_nr) abbreviate(20)
generate str drug_regimen = drug if action == "Add":Actions

replace drug_regimen = trim(itrim(cond(action == "Add":Actions, drug_regimen + " " + drug_regimen[_n-1], subinword(drug_regimen[_n-1], drug, "", 1)))) if _n > 1

list, noobs sepby(event_nr) abbreviate(20)

by event_nr: keep if _n == _N

generate byte drug_tally = wordcount(drug_regimen)

list ID event_nr calendar_dt drug_regimen drug_tally, noobs abbreviate(20)

Last edited by Buyadaa Oyunchimeg; 07 May 2018, 20:34.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#17

08 May 2018, 14:26

Well, your question isn't clear. Your code runs, and in the example data you show it produces sensible results. You imply that this is not the case in your real data, but you don't explain in what sense that is the case, and your example doesn't illustrate any problems. So I think you need to present an example from your data that illustrates the problems you are encountering. Show that example, show the output you get from this code using that data, and then explain why it isn't what you need.

I do see one problem. This code treats the entire data set as a single list of drugs being added and discontinued. This makes a lot of sense in the context of one individual, but your code doesn't do this -by individual-, it does it across the data set. Consequently, when a new patient is reached in the data, it doesn't "start over" and it just keeps adding that patient's regimen to the final regimen for the preceding patient. I suppose your data set might actually be about just one person, but if so I cannot understand how it could possibly take 24 hours to run.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#18

08 May 2018, 17:59

My apologies for unclear post and thank you for your response. Unfortunately, I couldn't produce any result using the above code in my real data (it runs more than 24 hours then suddenly stops and disappears). The data is big ~25-50 million observations in one extract.

As I mentioned before, currently I am trying to deal with combination of medication which coded under one ATC5 code. Your code runs perfectly in real data but it counts the combination of medications as one medication thus the result is not what I was expected. Studies or reports which used the same data showed that combination therapy increasing over time (but my current result shows the opposite).

What would you suggest for dealing with combination of medication coded under one ATC5 code? And is it possible to correct or change the above code's error?

Thank you
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#19

08 May 2018, 19:05

As I mentioned before, currently I am trying to deal with combination of medication which coded under one ATC5 code.

But I don't understand how this becomes a problem. Just because you are not getting results matching somebody else's analysis does not mean that you are getting wrong results for your data. What difference does it make whether a code represents a combination of medications or just a single one? Your code still correctly tracks what drugs are being added and which ones are being discontinued. I don't get it.

As for the long time, the only command I see in there that would take a long time to execute is the -gsort-, but that seems integral to the calculations, so I can't recommend you skip it. The other thing that will be slow, and also, frankly, pointless, are the -list- commands. That's fine when you're just testing the code on a small subset of the data, but -list-ing 25-50 million observations will take a very long time, and most of it will go out of the screen buffer anyway, and even if you kept it all how could you possibly read it?

And I don't understand how you get 25-50 million observations of medication orders on a single patient.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#20

08 May 2018, 19:42

Thank you for feedback and sorry again for unclear explanation. I've skipped the list command in real data and but when it comes to the below code it's getting very slow.

replace drug_regimen = trim(itrim(cond(action == "Add":Actions, drug_regimen + " " + drug_regimen[_n-1], subinword(drug_regimen[_n-1], drug, "", 1)))) if _n > 1

This code treats the entire data set as a single list of drugs being added and discontinued. This makes a lot of sense in the context of one individual, but your code doesn't do this -by individual-, it does it across the data set. Consequently, when a new patient is reached in the data, it doesn't "start over" and it just keeps adding that patient's regimen to the final regimen for the preceding patient. I suppose your data set might actually be about just one person

.

I think I gave you wring explanation from the beginning. 25-50 million is overall observation number in one extract and one patient on an average has thousand observations. This is maybe the reason why the codes do not work in real data. Is it possible to modify them for data which has more than 1 million individuals?

Thank you

Last edited by Buyadaa Oyunchimeg; 08 May 2018, 20:14.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#21

09 May 2018, 07:14

I think this will do it by ID (I assume ID is your patient identifier).

Code:

clear input byte ID str9 startdate str9 lastdate str7 drug 2 "24-Aug-95" "14-Nov-97" "A01AD05" 2 "14-Nov-97" "15-Jun-00" "B01AC04" 3 "14-Nov-97" "13-Feb-03" "C10AA05" 3 "15-Jun-00" "13-Feb-03" "B01AC04" 3 "15-Jun-00" "13-Feb-03" "A01AD05" end generate int calendar_dt1 = date(startdate, "DMY", 2020) generate int calendar_dt0 = date(lastdate, "DMY", 2020) generate long row_nr = _n quietly reshape long calendar_dt, i(row_nr) j(action) label define Actions 1 Add 0 Remove label values action Actions drop startdate lastdate row_nr format calendar_dt %tdCCYY-NN-DD capture program drop one_id program define one_id gsort calendar_dt -action order ID calendar_dt action drug generate str drug_regimen = drug if action == "Add":Actions replace drug_regimen = trim(itrim(cond(action == "Add":Actions, /// drug_regimen + " " + drug_regimen[_n-1], /// subinword(drug_regimen[_n-1], drug, "", 1)))) if _n > 1 by calendar_dt: keep if _n == _N generate byte drug_tally = wordcount(drug_regimen) exit end runby one_id, by(ID) status list, noobs clean

Note: To do this you need the -runby- command, written by Robert Picard and me, and available from SSC.

I have eliminated the variable event_nr. It serves no real purpose in the code: it is just a recoding in consecutive integers starting at 1 of calendar_dt. But you can just use calendar_dt for everything you did with event_nr. Since there are probably a lot of calendar dates in the data, eliminating the creation of event_nr should make a noticeable, though not enormous, improvement in the executioni time. But you are still stuck with having to sort the data (-runby- does this internally) so it is going to be slow. Because of the -status- option, -runby- will update you periodically about its progress and how much longer it thinks it will take to finish.

The -list- command at the end is just to enable you to see what happens with this example. With your real data, involving millions of observations, you should not attempt to -list- everything. As noted in an earlier post in this thread, it would be pointless and would waste a lot of time. You can -browse- the data set if you wish to visually check some of your results.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#22

10 May 2018, 06:39

Dear prof. Schechter,

Thank you so much. I will try it in my real data and let you know how it went.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#23

14 May 2018, 23:36

Dear prof. Schechter,

I've tried the codes in real data. Generally, it works well but I got a problem with drug_regimen and drug_tally. For example, ID 2 had a drug regimen "A01AD05" in 1997 (single medication), but there are duplicates in drug regimen (A01AD05 A01AD05=> drug_tally=2). I've tried to remove duplicates and generate new variable "number of medications" (single, dual, >= triple therapy) but I couldn't.

Is it possible to remove duplicates from drug_regimen and generate new variable which shows the number of medications for each individuals and what would you suggest?

Thank you

PHP Code:

list, noobs clean ID calendar~t action drug drug_regimen drug_t~y 2 1995-08-24 Add A01AD05 A01AD05 1 2 1997-08-24 Add A01AD05 A01AD05 A01AD05 2 2 1997-11-14 Remove A01AD05 B01AC04 A01AD05 2 2 1998-11-14 Remove A01AD05 B01AC04 1 2 1999-11-14 Add B01AC04 B01AC04 B01AC04 2 2 2000-06-15 Remove B01AC04 0 2 2001-11-14 Add B01AC04 B01AC04 1 2 2002-08-24 Add A01AD05 A01AD05 B01AC04 2 2 2006-06-15 Remove B01AC04 A01AD05 1 2 2007-11-14 Remove A01AD05 0 3 1997-11-14 Add C10AA05 C10AA05 1 3 2000-06-15 Add B01AC04 B01AC04 C10AA05 2 3 2001-06-15 Add A01AD05 A01AD05 B01AC04 C10AA05 3 3 2003-02-13 Remove C10AA05 0 3 2004-02-13 Remove A01AD05 0 3 2004-06-15 Add B01AC04 B01AC04 A01AD05 2 3 2007-02-13 Remove B01AC04 A01AD05 1 -dataex clear input byte ID str9 startdate str9 lastdate str7 drug 2 "24-Aug-95" "14-Nov-97" "A01AD05" 2 "14-Nov-97" "15-Jun-00" "B01AC04" 2 "24-Aug-97" "14-Nov-98" "A01AD05" 2 "14-Nov-99" "15-Jun-00" "B01AC04" 2 "24-Aug-02" "14-Nov-07" "A01AD05" 2 "14-Nov-01" "15-Jun-06" "B01AC04" 3 "14-Nov-97" "13-Feb-03" "C10AA05" 3 "15-Jun-00" "13-Feb-03" "B01AC04" 3 "15-Jun-01" "13-Feb-03" "A01AD05" 3 "15-Jun-04" "13-Feb-07" "B01AC04" 3 "15-Jun-04" "13-Feb-04" "A01AD05" end

Last edited by Buyadaa Oyunchimeg; 15 May 2018, 00:13.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30065

#24

16 May 2018, 10:41

So, it's a relatively minor modification to the earlier code, although it looks a bit complex syntactically because it entails embedding one -cond()- expression inside another. But the gist of it is simply to only add the drug to the regiment if it does not already appear in the regimen.

Code:

generate int calendar_dt1 = date(startdate, "DMY", 2020)
generate int calendar_dt0 = date(lastdate, "DMY", 2020)
generate long row_nr = _n

quietly reshape long calendar_dt, i(row_nr) j(action)

label define Actions 1 Add 0 Remove

label values action Actions

drop startdate lastdate row_nr

format calendar_dt %tdCCYY-NN-DD

capture program drop one_id
program define one_id
    gsort calendar_dt -action
    order ID  calendar_dt action drug
    generate str drug_regimen = drug if action == "Add":Actions
    replace drug_regimen = cond(strpos(drug_regimen[_n-1], drug_regimen) == 0 & action == "Add":Actions, ///
        drug_regimen + " " + drug_regimen[_n-1], ///
        cond(action == "Remove":Actions, subinword(drug_regimen[_n-1], drug, "", .), drug_regimen[_n-1]))
    replace drug_regimen = trim(itrim(drug_regimen))
    by calendar_dt: keep if _n == _N
    generate byte drug_tally = wordcount(drug_regimen)
    keep ID calendar_dt drug_regimen drug_tally
    exit
end

runby one_id, by(ID) status 

sort ID calendar_dt
list, noobs clean

Notes: Again, don't attempt the final -list- command in your real data. Just -browse- or -save- the results. The -keep- command added at the end was added because on reviewing the output I realized that there were some variables that had become irrelevant once we reduced to a single observation per date and they looked confusing.

Comment

Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#25

17 May 2018, 17:41

Thank you so much prof. Schechter.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#26

21 May 2018, 18:00

Dear prof. Schechter,

Thank you again for your help, codes worked well in real data. Now I am trying to calculate the proportions of single, dual and triple therapy by year and again having difficulty due to lack of knowledge. Is there anyway to calculate the proportions by year using loops? I've tried to do this by generating new variables, but it seems not accurate.

Sincerely,
Oyun
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#27

21 May 2018, 22:24

So, again, please show an example of the data you will be using for this. And also define what you mean by single, dual, and triple therapy in terms of information that is available in that data.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#28

21 May 2018, 23:42

My data looks as below. I am running the analysis by separate medication classes, for example the current one includes only anti-thrombotic medications. I would like to calculate the proportions of individuals on single anti-thrombotic medication or dual or triple anti-thrombotic medications by year using drug_tally variable.

Because one individual has many duplicates in terms of ID it seems difficult to accurately calculate the the proportions by year. What would you suggest?

PHP Code:

clear input byte ID int calendar_dt str19 drug_regimen byte drug_tally float year 2 13019 "A01AD05" 1 1995 2 13819 "B01AC05 A01AD05" 2 1997 2 13832 "B01AC05" 1 1997 2 13863 "" 0 1997 2 17037 "B01AC05" 1 2006 2 17117 "B01AC05" 1 2006 2 17119 "" 0 2006 2 17150 "" 0 2006 3 13832 "C10AA05" 1 1997 3 13923 "" 0 1998 3 14776 "B01AC04" 1 2000 3 14835 "" 0 2000 3 15141 "B01AC04" 1 2001 3 15200 "" 0 2001 3 16237 "B01AC04" 1 2004 3 16262 "C01AC04 B01AC04" 2 2004 3 16265 "C01AC04" 1 2004 3 16296 "" 0 2004 end format %tdCCYY-NN-DD calendar_dt
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30065
#29

22 May 2018, 08:55

This would ordinarily do it:

Code:

gen multiplicity = wordcount(drug_regimen) tab multiplicity year, col

Because one individual has many duplicates in terms of ID it seems difficult to accurately calculate the the proportions by year. What would you suggest?

I suggest you fix the data so that each individual has a unique ID. The fact that you know about this problem implies that there is some other source that enables you to identify individuals and their associated IDs in this data set. Use that same other source to fix the data you are working with.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#30

23 May 2018, 20:17

Thank you so much prof. Schechter.
Comment

Announcement

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment