syntax for total duration of drug use only during pregnancy by using dates describing drug use in total

Lisanne Denneman

Join Date: Jun 2014

Posts: 17
#1

syntax for total duration of drug use only during pregnancy by using dates describing drug use in total

03 Oct 2014, 02:12

Dear all,

I have a dataset which contains data about women using medication during pregnancy. For every drug I have the total duration of use and the exact start and stop dates of the drug (there can be 1, 2 or 3 episodes in time for the same drug as the drugs can be stopped and restarted). These dates often include periods of time before and after the pregnancy. Now I want to make a variable that describes the duration of the use ONLY during pregnancy; and I started writing a syntax as below but this doesn't always give me the right result when I check the total number of days manually.

I have the start date of the pregnancy (start_preg) and the date of birth (partus) as variables.
azt is the name of the drug, so azt_start1 is the date on which the drug was started first, an azt_stop1 is the date on which it was stopped for the first time. Then azt_stop3 for example is the 3rd time the drug was started (if this was the case). These different episodes are represented by a, b and c.

Furthermore, option 1 is the option that the drug was started before the pregnancy and it was stopped during pregnancy.
Option 2 is the option that the drug was started in pregnancy (so before the partus) and stopped after pregnancy.
Option 3 is the option that the drug was started in pregnancy and was also stopped in pregnancy.

Code:

gen duration_azt_preg_1a=(azt_stop1-start_preg) if (azt_start1<=start_preg&azt_stop1>start_preg&azt_stop1<=datepartus) gen duration_azt_preg_1b=(azt_stop2-start_preg) if (azt_start2<=start_preg&azt_stop2>start_preg&azt_stop2<=datepartus) gen duration_azt_preg_1c=(azt_stop3-start_preg) if (azt_start3<=start_preg&azt_stop3>start_preg&azt_stop3<=datepartus) gen duration_azt_preg_2a=(datepartus-azt_start1) if (azt_start1<datepartus&azt_start1>=start_preg&azt_stop1>datepartus) gen duration_azt_preg_2b=(datepartus-azt_start2) if (azt_start2<datepartus&azt_start2>=start_preg&azt_stop2>datepartus) gen duration_azt_preg_2c=(datepartus-azt_start3) if (azt_start3<datepartus&azt_start3>=start_preg&azt_stop3>datepartus) gen duration_azt_preg_3a=(azt_stop1-azt_start1) if (azt_start1>=start_preg&azt_stop1<=datepartus) gen duration_azt_preg_3b=(azt_stop1-azt_start2) if (azt_start1>=start_preg&azt_stop2<=datepartus) gen duration_azt_preg_3c=(azt_stop1-azt_start3) if (azt_start1>=start_preg&azt_stop3<=datepartus) egen duration_preg_final=rowtotal(duration_azt_preg_1a duration_azt_preg_1b duration_azt_preg_1c duration_azt_preg_2a

I was wondering if you see a mistake here or if you have ideas about other ways to create a variable that gives me the total duration of drug use only during pregnancy? I am not sure if this question is too specific for this forum but I thought I'd ask your consideration. Please tell me if I have forgotten to give you certain information.

Thanks for your consideration,

Lisanne
Tags: None
Aspen Chen

Join Date: Apr 2014

Posts: 114
#2

03 Oct 2014, 03:02

There seem like quite a few moving pieces in your description. It would help if you can show a few lines of the data so the variables are obvious to read.

Before checking everything else, I spotted a problem with your if conditions. In Stata, multiple relational operators need to be separated with logical operators such as & or |. For example,

Code:

if (a<b<c)

should be separated into

Code:

if (a<b & b<c)
Comment
Lisanne Denneman

Join Date: Jun 2014

Posts: 17
#3

03 Oct 2014, 03:44

Ok, I took a sample of the database with a few examples (I deleted all the variables that were not relevant). It's Excel, is that OK?
Attached Files

example database excel.xls (16.0 KB, 1 view)
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35703
#4

03 Oct 2014, 03:48

Lisanne: For advice on Excel, please see http://www.statalist.org/forums/help Section 12.

Many members here can't or won't open MS Word or MS Excel files.

You're using Stata; we're using Stata; that's the basis of the forum, so please show data in a form that everyone can read. Data extracts are easiest to read when copied and pasted within CODE delimiters.
Comment

Lisanne Denneman

Join Date: Jun 2014
Posts: 17

03 Oct 2014, 03:51

Ok,sorry!
Here you go:

Code:

 child_id
datepartus
mother_id
start_preg
azt
azt_start1
azt_stop1
azt_start2
azt_stop2
azt_start3
azt_stop3
dur_azt

M1
15may2006
M27463
21-aug-05
1
10-nov-00
24-sep-05
26-sep-05
15mar2006
15mar2006
16may2006
2011

M2
20-jun-06
M20346
28-sep-05
1
15oct2004
30-dec-05
11-feb-06
15-dec-08


1479

M3
11-sep-07
M34148
22-dec-06
1
4-jun-07
8-jun-07
8-jun-07
15-sep-07


103

M4
3-sep-07
M34099
19-dec-06
1
18-jun-07
4-sep-07




78

M5
20-jul-02
M25543
19oct2001
1
12-jan-01
15-feb-01
22-apr-02
20-jul-02
28-nov-05
16-feb-06
203

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30108
#6

03 Oct 2014, 10:56

First, I think this is easier done in long format, so I'll start with:

Code:

reshape long azt_start azt_stop, i(mother_id child_id) j(seq)

By the way, I'm assuming all the things that look like dates are actually Stata date variables. If not, you'll need to convert them to that.
Also, your sample data don't include a date for the end of the pregnancy. I assume you have that, and it's called end_preg.

Next step is to identify the overlap of the pregnancy and the medication interval

Code:

gen lower = max(azt_start, start_preg) gen upper = min(azt_stop , end_preg) gen in_pregancy_duration = upper-lower replace in_pregnancy_duration = 0 if in_pregnancy_duration < 0

So now you have the in-pregnancy duration for each of the medication episodes. If you now want to sum this up for each pregnancy:

Code:

by mother_id child_id, sort: replace in_pregnancy_duration = sum(in_pregancy_duration) by mother_id child_id, sort: replace in_pregnancy_duration = in_pregnancy_duration[_N]

And if you want your wide format back

Code:

drop lower upper reshape wide azt_start azt_stop, i(mother_id child_id) j(seq)

Not tested, but I think this is right.

One more thought: things will go haywire if all the start dates don't precede their corresponding stop dates. The code assumes that in data cleaning you have already verified this.
1 like
Comment
Aspen Chen

Join Date: Apr 2014

Posts: 114
#7

03 Oct 2014, 14:56

Clyde's solution of using reshape is excellent and should be the standard treatment. One minor problem episodes with missing values are not accounted for. Adding this line right after the line starting with -replace- would address the issue.

Code:

replace in_pregnancy_duration = 0 if missing(azt_start,azt_stop)

@Lisanne Sorry I overlooked on the "&" in your conditions. As for your original design, the option under which the drug started before pregnancy and ended after pregnancy was not considered. Also, Stata treats missing values as greater than any real numbers, so something like

Code:

azt_start2>start_preg

would be considered true even if azt_start2 was missing.

If you prefer to avoid reshaping, here's what I came up with.

Code:

forval i=1/3 { gen duration_azt_preg_`i'=0 replace duration_azt_preg_`i'=datepartus-azt_start`i' if (start_preg<=azt_start`i' & azt_start`i'<=datepartus & datepartus<=azt_stop`i') replace duration_azt_preg_`i'=azt_stop`i'-azt_start`i' if (start_preg<=azt_start`i' & azt_start`i'<=azt_stop`i' & azt_stop`i'<=datepartus) replace duration_azt_preg_`i'=datepartus-start_preg if (azt_start`i'<=start_preg & start_preg<=datepartus & datepartus<=azt_stop`i') replace duration_azt_preg_`i'=azt_stop`i'-start_preg if (azt_start`i'<=start_preg & start_preg<=azt_stop`i' & azt_stop`i'<=datepartus) replace duration_azt_preg_`i'=0 if missing(azt_start`i', azt_stop`i') } egen duration_preg_final=rowtotal(duration_azt_preg_1 duration_azt_preg_2 duration_azt_preg_3)

An additional remark:
I outlined all six scenarios when writing the code. Say a0 and a1 represent the start and end of pregnancy, and b0 and b1 represents the start and end of drug use, here are the possible combinations of different orders as well as the difference representing the overlapping period.

A) a0 b0 a1 b1=>a1-b0
B) a0 b0 b1 a1=>b1-b0
C) b0 a0 a1 b1=>a1-a0
D) b0 a0 b1 a1=>b1-a0
E) a0 a1 b0 b1=>no overlap
F) b0 b1 a0 a1=>no overlap

Note that in scenarios A-D, the starting dates are always the first two values, the ending dates are always the last two values, and the difference is consistently the third minus the second. This stable pattern is why Clyde's smart code works.

Last edited by Aspen Chen; 03 Oct 2014, 15:04.
Comment
Lisanne Denneman

Join Date: Jun 2014

Posts: 17
#8

04 Oct 2014, 04:06

Thank you Clyde and Aspen! I've tried both solutions, Aspens solution works perfectly. Thanks for the explanation!

I tried to run Clydes solution; however I have a few questions:
- First when I reshape the data Stata tells me this: i=mother_id mkindstr does not uniquely identify the observations; there are multiple observations with the same value of mother_id child_id. Type "reshape error" for a listing of the problem observations."
This is not a problem right? I have multiple observations with the same value of mother_id en & child_id because the children visited the clinic multiple times (and some children have the same mother).
- Then when i try to run:
gen lower = max(azt_start, start_preg) this doesn't work because azt_start is an ambiguous abbreviation. So I appended a `*' to the end of the abbreviation (azt_start*) but unfortunately this gives me the error "azt_start* invalid name".
Should I change it to azt_start'i' for example? I haven't used that kind of coding before.

Cheers, Lisanne
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30108
#9

04 Oct 2014, 08:08

So, it seems that the data you posted were not representative of the structure of your data. Within that example, mother_id (and certainly the dyad of mother_id and child_id) did uniquely identify records. It was not clear that there were multiple visits on top of this structure. It is a problem in the sense that the reshape won't run, and without the reshape, the rest of the code won't work (as you discovered). To make it work, you need a visit number variable. With that, you just change the reshape long command to:

Code:

reshape long azt_start azt_stop, i(mother_id child_id visit_num) j(seq)

Then the rest of the code will run. And if you need to get back to wide format at the end, that command also changes:

Code:

reshape wide reshape wide azt_start azt_stop, i(mother_id child_id visit_num) j(seq)
Comment

Announcement

syntax for total duration of drug use only during pregnancy by using dates describing drug use in total

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment