incidence rate & medication trends

Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#1

09 Mar 2018, 01:58

Dear all,

I hope everything is going well with you.

I am currently working on medication data and am having trouble presenting the incidence rate of a disease (MI) and medication trends in one graph.
Is it possible?

I would really appreciate it if you could help me solve the problem.

Thank you in advance.

The data look like this:

input byte(id aspirin clopidogrel heparin lipid_lowering fibrate) int age str6 supply_date date_at_diagnosis date_of_death
1 1 0 1 0 0 65 2002 2005 2006
1 0 0 0 1 1 55 2003 2005 0
1 0 0 1 0 1 55 2003 0 0
1 0 0 0 0 1 75 2002 0 0
2 1 1 0 0 1 65 2003 2008 2008
2 0 1 0 0 0 66 2007 0 0
2 0 1 0 1 0 76 2006 0 0
3 0 0 0 1 1 76 2009 0 2010
3 1 0 0 0 0 46 2010 0 0
3 1 1 0 0 0 56 2011 2009 0
3 1 1 1 1 0 46 2008 0 0
3 0 0 0 1 0 56 2008 2009 0
4 0 0 1 0 1 55 2011 0 2012
4 0 0 0 0 1 75 2011 0 0
5 1 1 0 0 0 65 2003 2008 2008
5 0 1 0 0 1 66 2007 0 0
5 0 1 0 1 0 76 2006 2006 0
end
Clyde Schechter

Join Date: Apr 2014

Posts: 30115
#2

09 Mar 2018, 08:22

Please explain in greater detail what you mean by "trends in medication usage." There are many things that could be.

Added: And please explain what the variable "supply_date" means.

Last edited by Clyde Schechter; 09 Mar 2018, 08:25.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#3

10 Mar 2018, 02:55

I didn’t understand the data display. It seems to be a sort of survival analysis design, but id 1, for example, both had and did not have an MI, which seems preposterous; ids without events have zero in the time variables instead of the time of right censoring; and so on. I think putting the data into a correct survival analysis format would be the first step in this scenario.

Best regards,

Marcos
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#4

12 Mar 2018, 02:54

I would like to know how the prescription of cardiovascular medications changes over time and to map these changes against the incidence rate of a certain disease (e.g., myocardial infarction).

supply_date means the date of medication supply.
Clyde Schechter

Join Date: Apr 2014

Posts: 30115
#5

12 Mar 2018, 08:07

I still do not understand how this data represents the world you are seeking to describe. Let's look at patient id 1. That person apparently has two different observations with supply_date 2003 recorded. Both of them record a fibrate prescription, but otherwise they report different medications. Worse, one of these records says that this person has a diagnosis in 2005, but the other doesn't show any diagnosis at all. And this same person's record in 2002 says he (she?) died in 2006, whereas other records on this patient don't mention death at all. By way of contrast, id 2's records are all in different years, and list both a date of diagnosis and a date of death just once, in the first year. So why is information sometimes repeated within the same person, and sometimes not? If the person has two observations for the same year with different medications shown, which one should we believe?
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#6

12 Mar 2018, 20:35

Dear professor Schechter,

I will try to display the data in a more understandable way.

Sincerely,
Oyun
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#7

12 Mar 2018, 23:32

date_of_DS = "date of diagnosis of hypertension"
date_of_MI = "date of event (myocardial infarction)"
date_of_death = "date of death"
clear
input id aspirin2006 clopidogrel2006 statin2006 fibrate2006 aspirin2007 clopidogrel2007 statin2007 fibrate2007 aspirin2008 clopidogrel2008 statin2008 fibrate2008 aspirin2009 clopidogrel2009 statin2009 fibrate2009 aspirin2010 clopidogrel2010 statin2010 fibrate2010 sex date_of_DS date_of_MI date_of_death first_date last_date
1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 0 1 2007 . . 2007 2010
2 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 1 2 2005 . . 2006 2010
3 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 2007 . . 2007 2009
4 0 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1 1999 2009 2009 2006 2009
5 0 0 0 1 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 1 2 2000 . . 2007 2009
6 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 2 1998 2010 . 2006 2010
7 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 1 2001 . . 2006 2009
8 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 2002 . . 2007 2010
9 0 0 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 0 0 2 1996 2008 2008 2006 2008

end

I would like to know how the prescription of cardiovascular medications changes over time and to map these changes against the incidence rate of a certain disease (e.g., myocardial infarction).

I would really appreciate it if you could find time to help me solve the problem.

Thank you in advance.
Oyun

Clyde Schechter

Join Date: Apr 2014

Posts: 30115
#8

13 Mar 2018, 09:05

This is a very awkward data layout. I think whoever created it for you was looking to sabotage your work :-). To make this functional, it has to be reorganized into a data set with each observation corresponding to one id in one year, and variables indicating the use or non-use of each drug and the occurrence or non-occurrence of each event (hypertension diagnosis, MI diagnosis, or death). It is further complicated by the first and last dates of observation, so that years outside that range have to be excluded.

Once that is done, we can aggregate up (-collapse-) to one observation per year containing the counts of people, drugs, and events. Then, I assume that what you want to graph are rates of drug prevalence and event incidence (as opposed to counts), so we calculate rates and then graph them.

Code:

clear
input id aspirin2006 clopidogrel2006 statin2006 fibrate2006 aspirin2007 clopidogrel2007 statin2007 fibrate2007 aspirin2008 clopidogrel2008 statin2008 fibrate2008 aspirin2009 clopidogrel2009 statin2009 fibrate2009 aspirin2010 clopidogrel2010 statin2010 fibrate2010 sex date_of_DS date_of_MI date_of_death first_date last_date
1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 0 1 2007 . . 2007 2010
2 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 1 2 2005 . . 2006 2010
3 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 2007 . . 2007 2009
4 0 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1 1999 2009 2009 2006 2009
5 0 0 0 1 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 1 2 2000 . . 2007 2009
6 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 2 1998 2010 . 2006 2010
7 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 1 2001 . . 2006 2009
8 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 2002 . . 2007 2010
9 0 0 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 0 0 2 1996 2008 2008 2006 2008

end

//    RE-ORGANIZE DATA INTO A PERSON # YEAR LAYOUT
//    DRUGS AND EVENTS START OUT DIFFERENTLY, SO MUST
//    BE DEALT WITH SEPARATELY AND THEN RE-COMBINED

//        START WITH THE EVENTS
preserve
keep id sex date_of_DS date_of_MI date_of_death first_date last_date
expand last_date-first_date + 1
by id, sort: gen year = first_date + _n - 1
by id (year), sort: assert year[_N] == last_date
drop first_date last_date
gen hypertension_dx = (year == date_of_DS)
gen MI_dx = (year == date_of_MI)
gen death = (year == date_of_death)
drop date_of_*
tempfile events
save `events'

//    NOW RE-ORGANIZE THE DRUG DATA
restore
keep id aspirin* clopidogrel* statin* fibrate*
reshape long aspirin clopidogrel statin fibrate, i(id) j(date)
rename date year

//    PUT THEM TOGETHER
merge 1:1 id year using `events', keep(match using) nogenerate
order sex, after(id)

//    THE DATA ARE NOW ORGANIZED AS ONE OBSERVATION
//    PER PERSON PER YEAR.  STRONGLY RECOMMEND SAVING
//    THIS AS A PERMANENT STATA DATA SET, AS ALMOST
//    ANY ANALYSIS OF THIS DATA WILL REQUIRE GETTING
//    THE DATA INTO THIS SHAPE.
save new_reference_file, replace

//    COLLAPSE TO YEARLY COUNTS
collapse (count) people = id (sum) aspirin-death, by(year)

//    CALCULATE RATES
foreach v of varlist aspirin-death {
    gen `v'_rate = `v'/people
}

//    GRAPH THE EVOLUTION OVER TIME
graph twoway line aspirin_rate-fibrate_rate year, name(drugs, replace) sort
graph twoway line hypertension_dx_rate MI_dx_rate death_rate year, sort name(events, replace)
graph combine drugs events

In the example data there are only a few people and the proportions of drug usage and events are high. In your real data there may be many people and relatively low rates. So you might want to calculate rates per 1,000, or rates per 10,000 instead of the rates per capita that I have done here. Also, I've given bare-bones graphs; use -graph twoway- options to customize their appearance to your taste.

I really do think you should save that data set that arises before the -collapse- command. It is a good layout for this data, and almost any analysis of this data you will want to do will require getting to this layout again. It is much more useful than the organization you are starting from now. If you save this data set, it will be a good starting point for future analysis.
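
As a rough illustration of the per-1,000 rescaling and of putting drugs and MI in a single graph, here is a minimal sketch. It assumes the collapsed yearly data from the code above are still in memory (year, people, and the yearly counts). The `_per1000` variable names and the graph name `drugs_and_MI` are made up for this example, and the denominator is simply everyone under observation in that year, not person-time at risk. The `///` line continuations require running it from a do-file.

```stata
* Assumes the collapsed yearly data from the code above are in memory:
* year, people, and the yearly counts aspirin-death.
foreach v of varlist aspirin clopidogrel statin fibrate MI_dx {
    gen `v'_per1000 = 1000 * `v' / people
}

* Drug prevalence on the left axis, MI incidence on the right axis
graph twoway ///
    (line aspirin_per1000 clopidogrel_per1000 statin_per1000 fibrate_per1000 year, sort yaxis(1)) ///
    (line MI_dx_per1000 year, sort yaxis(2) lpattern(dash)), ///
    ytitle("Drug users per 1,000 people", axis(1)) ///
    ytitle("MI diagnoses per 1,000 people", axis(2)) ///
    name(drugs_and_MI, replace)
```

The second y-axis is a design choice for the case where event rates are much smaller than drug prevalence; if the two are on a similar scale, a single axis is easier to read.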


Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#9

14 Mar 2018, 01:06

Dear prof. Schechter,

Thank you so much for your continued help and valuable guidance.

Could you please also look at my other post?

https://www.statalist.org/forums/for...s-of-adherence

Before doing this analysis, I would like to calculate PDC (proportion of days covered) by counting the number of antithrombotic-drug days in a year, in order to assess adherence.
There is some code for PDC analysis written in SAS and Python. Unfortunately, I have never used either SAS or Python.
https://blog.algorexhealth.com/2017/...on-management/
http://support.sas.com/resources/pap...3/168-2013.pdf

I would really appreciate it if you could help me again.

Thank you so much.
Sincerely,
Oyun
Clyde Schechter

Join Date: Apr 2014

Posts: 30115
#10

14 Mar 2018, 11:15

Sorry, but I'm afraid I can't help with this one. I stopped using SAS back in the late 1980's. I never liked it and I don't remember much about SAS code, and, in any case, it has evolved since then. I have only dabbled very lightly in Python and don't know it well enough for the purpose. I'm not aware of any existing Stata routines for doing it. So for me to involve myself in this one would mean inventing a Stata solution from scratch. If this were a light period for my regular responsibilities, I might dive into it, but for now I don't have the time for an undertaking like that.
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#11

27 Mar 2018, 20:03

Dear prof. Schechter,

May I ask a question regarding first_date and last_date, which were used in the following code?
In my real dataset they were generated from the medication supply date (date dispensed). But it seems I have to generate new first and last dates using another variable.
What would you suggest?

// START WITH THE EVENTS
preserve
keep id sex date_of_DS date_of_MI date_of_death first_date last_date
expand last_date-first_date + 1
by id, sort: gen year = first_date + _n - 1
by id (year), sort: assert year[_N] == last_date
drop first_date last_date
gen hypertension_dx = (year == date_of_DS)
gen MI_dx = (year == date_of_MI)
gen death = (year == date_of_death)
drop date_of_*
tempfile events
save `events'

clear
input id aspirin2006 clopidogrel2006 statin2006 fibrate2006 aspirin2007 clopidogrel2007 statin2007 fibrate2007 aspirin2008 clopidogrel2008 statin2008 fibrate2008 aspirin2009 clopidogrel2009 statin2009 fibrate2009 aspirin2010 clopidogrel2010 statin2010 fibrate2010 sex date_of_DS date_of_MI date_of_death first_date last_date
1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 0 1 2007 . . 2006 2010
2 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 1 2 2005 . . 2006 2010
3 0 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 2007 . . 2007 2009
4 0 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 0 0 0 0 1999 2009 2009 2006 2009
5 0 0 0 0 1 0 0 1 0 0 0 1 0 1 1 1 0 0 0 0 2 2000 . . 2007 2009
6 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 2 1998 2010 . 2006 2010
7 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 1 2001 . . 2006 2009
8 0 0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 1 0 1 0 1 2002 . . 2007 2010
9 0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 2 1996 2008 2008 2006 2008

end

Thank you.
Sincerely,
Oyun
Clyde Schechter

Join Date: Apr 2014

Posts: 30115
#12

27 Mar 2018, 23:06

But it seems I have to generate new first and last dates using another variable.
What would you suggest?

I have no idea what you mean by this. Why do you have to generate new first and last dates? What is wrong with the ones you have? What other variable(s) do you need to use to create them and why?
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#13

28 Mar 2018, 00:56

Dear prof. Schechter,

Sorry for the misunderstanding, and thank you for your help. I've solved the problem in my question above. But I am currently facing another difficulty with a calculation.

I am trying to count the number of days covered by each drug (antithrombotic and lipid-lowering agents) in a year. I have some commands that do the calculation for a single agent, but I'm working with a big dataset, so it would take a lot of time to repeat these commands for each drug.

I would really appreciate it if you could help me create loops for this calculation.

clear
input id aspirin2006 clopidogrel2006 statin2006 fibrate2006 aspirin2007 clopidogrel2007 statin2007 fibrate2007 aspirin2008 clopidogrel2008 statin2008 fibrate2008 aspirin2009 clopidogrel2009 statin2009 fibrate2009 aspirin2010 clopidogrel2010 statin2010 fibrate2010 sex date_of_DS date_of_MI date_of_death index_date last_date
1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 0 1 2007 . . 2007 2010
2 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 1 2 2005 . . 2006 2010
3 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 2007 . . 2007 2009
4 0 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1 1999 2009 2009 2006 2009
5 0 0 0 1 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 1 2 2000 . . 2007 2009
6 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 2 1998 2010 . 2006 2010
7 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 1 2001 . . 2006 2009
8 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 2002 . . 2007 2010
9 0 0 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 0 0 2 1996 2008 2008 2006 2008

end

Commands (for one drug only):

*Calculate cumulative days, based on index_date and qty
sort id supp_date
bysort id: gen pat_index = _n
gen date_covered = index_date
format date_covered %td
replace date_covered = date_covered[_n-1] + qty[_n-1] if pat_index>1 & id==id[_n-1]
gen supply_gap = supp_date - date_covered

*Censored date
gen last_follow_date = mdy(12, 31, 2010)
format last_follow_date %td

*6-month assessment
sort id pat_index
gen date_6mth = index_date+ 183
format date_6mth %td
replace date_6mth = date_of_death if date_of_death<date_6mth

gen valid_6mth = .
replace valid_6mth = 1 if supp_date<(index_date+ 183) & date_of_death>(index_date+183)
gen patid_6mth = id if valid_6mth==1

sort id pat_index
gen gap_6mth = .
replace gap_6mth = supply_gap-supply_gap[_n-1] if supply_gap>0 & valid_6mth==1 & supply_gap>supply_gap[_n-1]
replace gap_6mth = supply_gap-0 if supply_gap>0 & valid_6mth==1 & supply_gap>supply_gap[_n-1] & supply_gap[_n-1]<0

bysort patid_6mth: egen days_6mth = sum(gap_6mth)
replace days_6mth = . if patid_6mth==.

bysort patid_6mth: egen qty_6mth = sum(qty)
replace qty_6mth = . if patid_6mth==.

gen overall_gap_6mth = 183-qty_6mth
replace days_6mth = overall_gap_6mth if overall_gap_6mth>days_6mth

gen pdc_6mth = .
replace pdc_6mth = (183-days_6mth)/183
replace pdc_6mth = 0 if pdc_6mth<0

gen pdc_6mth_80 = .
replace pdc_6mth_80 = 1 if pdc_6mth<0.80
replace pdc_6mth_80 = 2 if pdc_6mth>=0.80 & !missing(pdc_6mth)

Thank you so much.

Oyun
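
One possible way to avoid repeating the commands for each drug is to keep the dispensing records in long form, with a variable identifying the drug, so that `by id drug` does the work of a loop. The sketch below is only an illustration under stated assumptions: the toy data, the `drug`, `fill`, `day`, `days_covered`, and `adherent` variables, and the choice of 2008 as the assessment year are all hypothetical, and `supp_date` and `qty` are taken to mean the dispensing date and the days supplied. It is a simplified PDC: overlapping supplies are counted once rather than shifted forward as in the `date_covered` approach above, the denominator is the whole calendar year regardless of entry or death, and people with no covered days in the year drop out of the result.

```stata
* Hypothetical dispensing-level data, one row per supply:
*   id, drug, supp_date (daily date), qty (days supplied)
clear
input id str11 drug str9 sdate qty
1 "aspirin"     "01jan2008" 30
1 "aspirin"     "20jan2008" 30
1 "aspirin"     "01apr2008" 30
1 "statin"      "01jan2008" 90
2 "clopidogrel" "15dec2008" 60
end
gen supp_date = daily(sdate, "DMY")
format supp_date %td
drop sdate

* Turn each supply into one row per covered day
gen long fill = _n
expand qty
bysort fill: gen day = supp_date + _n - 1
format day %td

* Keep days in the assessment year and count each id-drug-day once
local y 2008
keep if year(day) == `y'
bysort id drug day: keep if _n == 1

* Days covered and PDC per person and drug
collapse (count) days_covered = day, by(id drug)
gen pdc = days_covered / (mdy(12, 31, `y') - mdy(1, 1, `y') + 1)
gen byte adherent = pdc >= 0.80
list, sepby(id)
```

Expanding to one row per covered day is simple to check by eye but multiplies the number of observations by the typical supply length, so on a large dataset it may need to be run one year or one drug class at a time.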
