loops for calculation

Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#1

loops for calculation

28 Mar 2018, 00:47

[INDENT=4] Dear all,

Currently I am trying to count the number of each-drug (anti-thrombotic and lipid lowering agents) days in a year. I have some commands for calculation for single agent but I'm working on big dataset thus it seems it will take a lot of time if I use these commands.

I would really appreciate if anyone may help me to create loops for this calculation.

clear
input id aspirin2006 clopidogrel2006 statin2006 fibrate2006 aspirin2007 clopidogrel2007 statin2007 fibrate2007 aspirin2008 clopidogrel2008 statin2008 fibrate2008 aspirin2009 clopidogrel2009 statin2009 fibrate2009 aspirin2010 clopidogrel2010 statin2010 fibrate2010 sex date_of_DS date_of_MI date_of_death index_date last_date
1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 1 1 0 1 2007 . . 2007 2010
2 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 1 0 0 1 2 2005 . . 2006 2010
3 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 2007 . . 2007 2009
4 0 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1 1999 2009 2009 2006 2009
5 0 0 0 1 0 0 0 1 0 0 0 1 0 1 1 1 1 1 1 1 2 2000 . . 2007 2009
6 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 0 1 0 0 1 2 1998 2010 . 2006 2010
7 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 1 2001 . . 2006 2009
8 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 2002 . . 2007 2010
9 0 0 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 0 0 2 1996 2008 2008 2006 2008

end

Commands (only for 1-drug):

*Calculate cumulative days, based on index_date and qty
sort id supp_date
bysort id: gen pat_index = _n
gen date_covered = index_date
format date_covered %td
replace date_covered = date_covered[_n-1] + qty[_n-1] if pat_index>1 & pat_id==pat_id[_n-1]
gen supply_gap = supp_date - date_covered

*Censored date
gen last_follow_date = mdy(12, 31, 2010)
format last_follow_date %td

*6-month assessment
sort id pat_index
gen date_6mth = index_date+ 183
format date_6mth %td
replace date_6mth = date_of_death if date_of_death<date_6mth

gen valid_6mth = .
replace valid_6mth = 1 if supp_date<(index_date+ 183) & date_of_death>(index_date+183)
gen patid_6mth = pat_id if valid_6mth==1

sort id pat_index
gen gap_6mth = .
replace gap_6mth = supply_gap-supply_gap[_n-1] if supply_gap>0 & valid_6mth==1 & supply_gap>supply_gap[_n-1]
replace gap_6mth = supply_gap-0 if supply_gap>0 & valid_6mth==1 & supply_gap>supply_gap[_n-1] & supply_gap[_n-1]<0

bysort patid_6mth: egen days_6mth = sum(gap_6mth)
replace days_6mth = . if patid_6mth==.

bysort patid_6mth: egen qty_6mth = sum(qty)
replace qty_6mth = . if patid_6mth==.

gen overall_gap_6mth = 183-qty_6mth
replace days_6mth = overall_gap_6mth if overall_gap_6mth>days_6mth

gen pdc_6mth = .
replace ndc_6mth = (183-days_6mth)/183
replace ndc_6mth = 0 if pdc<0

gen pdc_6mth_80 =.
replace pdc_6mth_80 =1 if pdc_6mth<=0.79
replace pdc_6mth_80 =2 if pdc_6mth>=0.20 & pdc_6mth>=0.80

Thank you so much.

Oyun

Quote

Flag

Like
0

[/INDENT]
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

31 Mar 2018, 12:31

Your question is unclear in a couple of respects. I guess you want to do this separately for each drug. Do you also want to do it separately for each calendar year?

The sample code you show refers to some variables that do not exist in your sample data, so it isn't possible to proceed from here. We need the sample code to work with the sample data.

Your data layout is not really suitable for this task. In earlier threads involving data similar to this (perhaps even the same) you have been shown how to reorganize it in a useful way, as a long layout. Before reposting, I suggest you do that much so that whoever helps you can at least start from there. The more you do to help those who want to help you, the greater your chances of getting the help you want in a timely manner.

Happy Easter to you as well!
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#3

01 Apr 2018, 01:35

Dear prof. Schechter,

I would like to calculate number of days covered for each-drug and each year. But If I will use the above commands first I have to reorganize my data by keeping only one-drug (for example: keep statin only), and then repeat the analysis more than 10 times (for each drug and for each file). It will take a lot of time and might create bias in analysis. Thus I thought maybe loops may help me to do these analysis at one time for all drugs.

I tried to reorganize example data as suggested and added variable "qty-quantity (number of dispensed drug). If this example is not appropriate for analysis I am ready to correct them again at any time.

Thank you so much.
Sincerely,

input byte(id aspirin clopidogrel heparin lipid_lowering fibrate qty) str6(supply_date index_date death_year)
1 1 0 0 0 0 28 "Apr-02" "Apr-02" .
1 1 0 0 0 0 30 "Jul-02" "Apr-02" .
1 1 0 1 0 0 56 "May-03" "Apr-02" .
1 1 0 0 0 0 60 "Sep-06" "Apr-02" .
2 1 1 0 0 0 28 "Feb-03" "Feb-03" .
2 0 1 0 0 0 30 "Feb-06" "Feb-03" "Dec-09"
2 0 1 0 1 0 90 "Mar-03" "Feb-03"
3 0 0 0 0 1 60 "Jan-06" "Jan-06" .
3 1 0 0 0 0 56 "Feb-06" "Jan-06" .
3 1 1 0 0 0 56 "Nov-08" "Jan-06" "Sep-10"
3 1 1 0 0 0 30 "Sep-08" "Jan-06" .
3 0 0 0 1 0 28 "Feb-09" "Jan-06" .
4 0 1 0 0 1 28 "Jun-04" "Jun-04" .
4 0 0 0 0 1 30 "Jun-06" "Jun-04" "May-09"
5 0 1 0 0 0 90 "May-03" "May-03" .
5 0 1 0 0 1 100 "Sep-03" "May-03" .
5 0 1 0 1 0 56 "Aug-04" "May-03" "Jun-10"
end

Last edited by Buyadaa Oyunchimeg; 01 Apr 2018, 01:43.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#4

01 Apr 2018, 09:42

The data you supply is not compatible with the code you show.

It refers to a variable supp_date, but the closest thing to that in you data is something called supply_date. Also your date variables are strings, but they need to be actual Stata internal format date variables. The code revers to a variable id, and also to a variable pat_id. I don't know if those are intended to be the same thing or not, but only id exists in your data. Your code refers to a variable date_of_death, but there is no such variable in your data: just something called date_year, which may or may not be the same thing. Your code refers to a variable ndc_6mth: I don't see anything even close to that in the data.

So you need to either change the code so it conforms to the data, or change the data so it conforms to the code. I don't grasp the logic of the code, so I'm not going to attempt to do that for you. If you fix it up so that it runs correctly for a single drug in a single patient, I will be happy to show you how to get it working for the full range of drugs and all the patients.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#5

01 Apr 2018, 18:37

Dear prof.Schechter,

Thank you for correcting me.
As suggested I did some changes in my example data.

If it is still not appropriate for analysis, I am ready to correct them. Hope this time it will work.

Thank you so much.
Sincerely

clear
input byte(pat_id aspirin clopidogrel heparin statin fibrate qty) float(supp_date index_date date_of_death)
1 1 0 0 0 0 28 507 507 .
1 1 0 0 0 0 30 510 507 .
1 1 0 1 0 0 56 520 507 .
1 1 0 0 0 0 60 560 507 .
2 1 1 0 0 0 28 517 517 .
2 0 1 0 1 0 90 518 517 .
2 0 1 0 0 0 30 553 517 599
3 0 0 0 0 1 60 552 552 .
3 1 0 0 0 0 56 553 552 .
3 1 1 0 0 0 30 584 552 .
3 1 1 0 0 0 56 586 552 608
3 0 0 0 1 0 28 589 552 .
4 0 1 0 0 1 28 533 533 .
4 0 0 0 0 1 30 557 533 592
5 0 1 0 0 0 90 520 520 .
5 0 1 0 0 1 100 524 520 .
5 0 1 0 1 0 56 535 520 605
end
format %tm supp_date
format %tm index_date
format %tm date_of_death

Commands (calculation of proportion of days covered (PDC)):

*Calculate cumulative days, based on index_date and qty
sort pat_id supp_date
bysort pat_id: gen pat_index = _n
gen date_covered = index_date
format date_covered %td
replace date_covered = date_covered[_n-1] + qty[_n-1] if pat_index>1 & pat_id==pat_id[_n-1]
gen supply_gap = supp_date - date_covered

*Censored date
gen last_follow_date = mdy(12, 31, 2010)
format last_follow_date %td

*12-month assessment
sort ppn pat_index
gen date_12mth = index_date+ 365
format date_12mth %td
replace date_12mth = date_of_death if date_of_death<date_12mth

gen valid_12mth = .
replace valid_12mth = 1 if supp_date<(index_date+ 365) & date_of_death>(index_date+365)
gen patid_12mth = pat_id if valid_12mth==1

sort pat_id pat_index
gen gap_12mth = .
replace gap_12mth = supply_gap-supply_gap[_n-1] if supply_gap>0 & valid_12mth==1 & supply_gap>supply_gap[_n-1]
replace gap_12mth = supply_gap-0 if supply_gap>0 & valid_12mth==1 & supply_gap>supply_gap[_n-1] & supply_gap[_n-1]<0

bysort patid_12mth: egen days_12mth = sum(gap_12mth)
replace days_12mth = . if patid_12mth==.

bysort patid_12mth: egen qty_12mth = sum(qty)
replace qty_12mth = . if patid_12mth==.

gen overall_gap_12mth = 365-qty_12mth
replace days_12mth = overall_gap_12mth if overall_gap_12mth>days_12mth

gen pdc_12mth = .
replace pdc_12mth = (365-days_12mth)/365
replace pdc_12mth = 0 if pdc<0

gen pdc_12mth_80 =.
replace pdc_12mth_80 =1 if pdc_12mth<=0.79
replace pdc_12mth_80 =2 if pdc_12mth>=0.20 & pdc_12mth>=0.80

Last edited by Buyadaa Oyunchimeg; 01 Apr 2018, 18:42.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#6

01 Apr 2018, 19:45

Still not right. Your code refers to a variable ppn, but there is no variable like that in the data.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#7

01 Apr 2018, 19:52

I am really sorry professor. ppn refers to pat_id.

I've corrected it.

*Calculate cumulative days, based on index_date and qty
sort pat_id supp_date
bysort pat_id: gen pat_index = _n
gen date_covered = index_date
format date_covered %td
replace date_covered = date_covered[_n-1] + qty[_n-1] if pat_index>1 & pat_id==pat_id[_n-1]
gen supply_gap = supp_date - date_covered

*Censored date
gen last_follow_date = mdy(12, 31, 2010)
format last_follow_date %td

*12-month assessment
sort pat_id pat_index
gen date_12mth = index_date+ 365
format date_12mth %td
replace date_12mth = date_of_death if date_of_death<date_12mth

gen valid_12mth = .
replace valid_12mth = 1 if supp_date<(index_date+ 365) & date_of_death>(index_date+365)
gen patid_12mth = pat_id if valid_12mth==1

sort pat_id pat_index
gen gap_12mth = .
replace gap_12mth = supply_gap-supply_gap[_n-1] if supply_gap>0 & valid_12mth==1 & supply_gap>supply_gap[_n-1]
replace gap_12mth = supply_gap-0 if supply_gap>0 & valid_12mth==1 & supply_gap>supply_gap[_n-1] & supply_gap[_n-1]<0

bysort patid_12mth: egen days_12mth = sum(gap_12mth)
replace days_12mth = . if patid_12mth==.

bysort patid_12mth: egen qty_12mth = sum(qty)
replace qty_12mth = . if patid_12mth==.

gen overall_gap_12mth = 365-qty_12mth
replace days_12mth = overall_gap_12mth if overall_gap_12mth>days_12mth

gen pdc_12mth = .
replace pdc_12mth = (365-days_12mth)/365
replace pdc_12mth = 0 if pdc<0

gen pdc_12mth_80 =.
replace pdc_12mth_80 =1 if pdc_12mth<=0.79
replace pdc_12mth_80 =2 if pdc_12mth>=0.20 & pdc_12mth>=0.80
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30117

01 Apr 2018, 20:33

OK. This is better. Now, I don't really grasp what the code is doing, but I'm making the following assumptions:

1. The variables aspirin through fibrate are coded 0/1 meaning that in the time period referred to by the dates in that observation the patient in that observation does not (0) or does (1) have a prescription for that drug.

2. The observations are uniquely identified by the combination of pat_id and supp_date. (This is true in your example data.)

On that understanding, all that is needed is to transform the data into a fully long layout where drug is a separate variable, and then eliminate the resulting observations where there is no such prescription, and then loop over the patients and drugs. Instead of explicit looping, it is easier in this case to wrap the code inside a program and then iterate it with -runby-.

Code:

clear
input byte(pat_id aspirin clopidogrel heparin statin fibrate qty) float(supp_date index_date date_of_death)
1 1 0 0 0 0 28 507 507 .
1 1 0 0 0 0 30 510 507 .
1 1 0 1 0 0 56 520 507 .
1 1 0 0 0 0 60 560 507 .
2 1 1 0 0 0 28 517 517 .
2 0 1 0 1 0 90 518 517 .
2 0 1 0 0 0 30 553 517 599
3 0 0 0 0 1 60 552 552 .
3 1 0 0 0 0 56 553 552 .
3 1 1 0 0 0 30 584 552 .
3 1 1 0 0 0 56 586 552 608
3 0 0 0 1 0 28 589 552 .
4 0 1 0 0 1 28 533 533 .
4 0 0 0 0 1 30 557 533 592
5 0 1 0 0 0 90 520 520 .
5 0 1 0 0 1 100 524 520 .
5 0 1 0 1 0 56 535 520 605
end
format %tm supp_date
format %tm index_date
format %tm date_of_death

capture program drop once
program define once
    *Calculate cumulative days, based on index_date and qty
    sort pat_id supp_date
    bysort pat_id: gen pat_index = _n
    gen date_covered = index_date
    format date_covered %td
    replace date_covered = date_covered[_n-1] + qty[_n-1] if pat_index>1 & pat_id==pat_id[_n-1]
    gen supply_gap = supp_date - date_covered


    *Censored date
    gen last_follow_date = mdy(12, 31, 2010)
    format last_follow_date %td

    *12-month assessment
    sort pat_id pat_index
    gen date_12mth = index_date+ 365
    format date_12mth %td
    replace date_12mth = date_of_death if date_of_death<date_12mth

    gen valid_12mth = .
    replace valid_12mth = 1 if supp_date<(index_date+ 365) & date_of_death>(index_date+365)
    gen patid_12mth = pat_id if valid_12mth==1

    sort pat_id pat_index
    gen gap_12mth = .
    replace gap_12mth = supply_gap-supply_gap[_n-1] if supply_gap>0 & valid_12mth==1 & supply_gap>supply_gap[_n-1]
    replace gap_12mth = supply_gap-0 if supply_gap>0 & valid_12mth==1 & supply_gap>supply_gap[_n-1] & supply_gap[_n-1]<0

    bysort patid_12mth: egen days_12mth = sum(gap_12mth)
    replace days_12mth = . if patid_12mth==.

    bysort patid_12mth: egen qty_12mth = sum(qty)
    replace qty_12mth = . if patid_12mth==.

    gen overall_gap_12mth = 365-qty_12mth
    replace days_12mth = overall_gap_12mth if overall_gap_12mth>days_12mth

    gen pdc_12mth = .
    replace pdc_12mth = (365-days_12mth)/365
    replace pdc_12mth = 0 if pdc<0

    gen pdc_12mth_80 =.
    replace pdc_12mth_80 =1 if pdc_12mth<=0.79
    replace pdc_12mth_80 =2 if pdc_12mth>=0.20 & pdc_12mth>=0.80    
    exit
end

rename (aspirin-fibrate) yesno=
reshape long yesno, i(pat_id supp_date) j(drug) string
keep if yesno
drop yesno

runby once, by(pat_id drug)

-runby- is written by Robert Picard and me, and it is available from SSC.

Notice that in this instance, the code inside program once is exactly the code you provided in #7, put there by copy/paste with no alterations whatsoever.

Comment

Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#9

01 Apr 2018, 20:45

Thank you so much professor.
I really appreciate your help.

Sincerely,
Oyun
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#10

03 Apr 2018, 02:52

Dear prof. Schechter,

I’ve tried the loops in real dataset and got a problem with reshaping. I tried it in many different ways but STATA saying “id is not uniquely identify the observations”. I am wondering what might cause the problem.

Is there anyway to solve the problem?

Thank you
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#11

03 Apr 2018, 09:48

Please post an example of the data set as it looks just before the -reshape- command, using -dataex, of course. And also show the exact -reshape- command you issued.

If you were using the code that I gave you in #8, you could not possibly get that error message, because in that code the -i()- option is not just id. So you have modified the code in some way that is, apparently, not consistent with the data you have.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#12

03 Apr 2018, 19:49

This is maybe because of duplicates in terms of id, supply date.
Please see below.

clear
input byte(pat_id yesnoaspirin yesnoclopidogrel yesnoheparin yesnostatin yesnofibrate qty) float(supp_date index_date date_of_death)
1 1 0 0 0 0 28 507 507 .
1 1 0 0 0 0 30 507 507 .
1 1 0 0 0 0 56 507 507 .
1 1 0 0 0 0 56 507 507 .
1 1 0 0 0 0 56 507 507 .
1 0 1 0 0 0 60 518 507 .
2 0 1 0 0 0 28 517 517 .
2 0 1 0 0 0 28 517 517 .
2 0 1 0 0 0 28 517 517 .
2 0 1 0 0 0 28 517 517 .
2 0 0 0 1 0 90 517 517 .
2 0 1 0 0 0 30 517 517 599
3 0 0 0 0 1 60 552 552 .
3 1 0 0 0 0 56 553 552 .
3 1 0 0 0 0 56 553 552 .
3 1 0 0 0 0 56 553 552 .
3 1 0 0 0 0 56 553 552 .
3 1 1 0 0 0 30 584 552 .
3 1 1 0 0 0 56 586 552 608
3 0 0 0 1 0 28 589 552 .
4 0 1 0 0 1 28 533 533 .
4 0 0 0 0 1 30 557 533 592
5 0 1 0 0 0 90 520 520 .
5 0 1 0 0 1 100 524 520 .
5 0 1 0 1 0 56 535 520 605
end
format %tm supp_date
format %tm index_date
format %tm date_of_death
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#13

03 Apr 2018, 20:06

OK, this data does not make any sense to me. First of all, there are many observations that are complete duplicates of each other. Why is that? Are there other variables you do not show for which these observations differ?

Even after dropping observations that are complete duplicates of each other from your data, there remain observations that are duplicates for pat_id and supp_date that I cannot make sense of. FOr example, pat_id1 has three observations all with supp_date 2002m4 and index_date 2002m4, and all of them are just for aspirin. They differ in quantity. So that seems to say that this patient was dispensed three different bottles of aspirin all on 2002m4, one with 28 tablets, 1 with 30, and another with 56. Is that right? Why would that happen? What does it mean. Why is there not just one observatoin for this showing 28+30+56 = 114 tablets? Similarly pat_id 2 with supply and index dates 2003m2 has 3 observations, and 2 of them are for the same drug, clopidigrel, one having qty 28 and the other qty 30. Again, I don't understand why this isn't just a single record with qty = 58. I know how to reduce these redundancies to single records, but I don't want to just blindly do it because I really don't understand why these are there. There may be some reason for them to be there that I am missing. Or they may represent data management errors preceding this point that you need to fix.
Comment
Buyadaa Oyunchimeg

Join Date: Jan 2018

Posts: 195
#14

03 Apr 2018, 20:42

There are other variables such as item code and prescription date but they do not differ in these observations. It maybe because for example GP gives prescriptions for 6 months for one medication (for example Aspirin) and then patient buys medication from different pharmacies, this may create these duplicates in terms of prescription date, supply date, patient ID and item code. I tried to drop them but the numbers are too big so maybe it is better to make them as one observation. What would suggest and how to reduce these duplicates to single records?

Last edited by Buyadaa Oyunchimeg; 03 Apr 2018, 20:47.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#15

03 Apr 2018, 22:22

I'm not really comfortable with my understanding of this data, and I fear I will lead you down the wrong path. But my hunch is that you want to do:

Code:

gen obs_no = _n reshape long yesno, i(obs_no) j(drug) string drop if yesno == 0 collapse (sum) qty (max) date_of_death, by(pat_id drug supp_date index_date) runby once, by(pat_id drug)

from the point where the data look like what you show in #12.
Comment

Announcement