Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • syntax for total duration of drug use only during pregnancy by using dates describing drug use in total

    Dear all,

    I have a dataset which contains data about women using medication during pregnancy. For every drug I have the total duration of use and the exact start and stop dates of the drug (there can be 1, 2 or 3 episodes in time for the same drug as the drugs can be stopped and restarted). These dates often include periods of time before and after the pregnancy. Now I want to make a variable that describes the duration of the use ONLY during pregnancy; and I started writing a syntax as below but this doesn't always give me the right result when I check the total number of days manually.

    I have the start date of the pregnancy (start_preg) and the date of birth (partus) as variables.
    azt is the name of the drug, so azt_start1 is the date on which the drug was started first, an azt_stop1 is the date on which it was stopped for the first time. Then azt_stop3 for example is the 3rd time the drug was started (if this was the case). These different episodes are represented by a, b and c.

    Furthermore, option 1 is the option that the drug was started before the pregnancy and it was stopped during pregnancy.
    Option 2 is the option that the drug was started in pregnancy (so before the partus) and stopped after pregnancy.
    Option 3 is the option that the drug was started in pregnancy and was also stopped in pregnancy.

    Code:
    gen duration_azt_preg_1a=(azt_stop1-start_preg) if (azt_start1<=start_preg&azt_stop1>start_preg&azt_stop1<=datepartus)
    gen duration_azt_preg_1b=(azt_stop2-start_preg) if (azt_start2<=start_preg&azt_stop2>start_preg&azt_stop2<=datepartus)
    gen duration_azt_preg_1c=(azt_stop3-start_preg) if (azt_start3<=start_preg&azt_stop3>start_preg&azt_stop3<=datepartus)
     
    gen duration_azt_preg_2a=(datepartus-azt_start1) if (azt_start1<datepartus&azt_start1>=start_preg&azt_stop1>datepartus)
    gen duration_azt_preg_2b=(datepartus-azt_start2) if (azt_start2<datepartus&azt_start2>=start_preg&azt_stop2>datepartus)
    gen duration_azt_preg_2c=(datepartus-azt_start3) if (azt_start3<datepartus&azt_start3>=start_preg&azt_stop3>datepartus)
     
    gen duration_azt_preg_3a=(azt_stop1-azt_start1) if (azt_start1>=start_preg&azt_stop1<=datepartus)
    gen duration_azt_preg_3b=(azt_stop1-azt_start2) if (azt_start1>=start_preg&azt_stop2<=datepartus)
    gen duration_azt_preg_3c=(azt_stop1-azt_start3) if (azt_start1>=start_preg&azt_stop3<=datepartus)
     
    egen duration_preg_final=rowtotal(duration_azt_preg_1a duration_azt_preg_1b duration_azt_preg_1c duration_azt_preg_2a
    I was wondering if you see a mistake here or if you have ideas about other ways to create a variable that gives me the total duration of drug use only during pregnancy? I am not sure if this question is too specific for this forum but I thought I'd ask your consideration. Please tell me if I have forgotten to give you certain information.

    Thanks for your consideration,

    Lisanne

  • #2
    There seem like quite a few moving pieces in your description. It would help if you can show a few lines of the data so the variables are obvious to read.

    Before checking everything else, I spotted a problem with your if conditions. In Stata, multiple relational operators need to be separated with logical operators such as & or |. For example,
    Code:
    if (a<b<c)
    should be separated into
    Code:
    if (a<b & b<c)

    Comment


    • #3
      Ok, I took a sample of the database with a few examples (I deleted all the variables that were not relevant). It's Excel, is that OK?
      Attached Files

      Comment


      • #4
        Lisanne: For advice on Excel, please see http://www.statalist.org/forums/help Section 12.

        Many members here can't or won't open MS Word or MS Excel files.

        You're using Stata; we're using Stata; that's the basis of the forum, so please show data in a form that everyone can read. Data extracts are easiest to read when copied and pasted within CODE delimiters.

        Comment


        • #5
          Ok,sorry!
          Here you go:

          Code:
           
          child_id datepartus mother_id start_preg azt azt_start1 azt_stop1 azt_start2 azt_stop2 azt_start3 azt_stop3 dur_azt
          M1 15may2006 M27463 21-aug-05 1 10-nov-00 24-sep-05 26-sep-05 15mar2006 15mar2006 16may2006 2011
          M2 20-jun-06 M20346 28-sep-05 1 15oct2004 30-dec-05 11-feb-06 15-dec-08 1479
          M3 11-sep-07 M34148 22-dec-06 1 4-jun-07 8-jun-07 8-jun-07 15-sep-07 103
          M4 3-sep-07 M34099 19-dec-06 1 18-jun-07 4-sep-07 78
          M5 20-jul-02 M25543 19oct2001 1 12-jan-01 15-feb-01 22-apr-02 20-jul-02 28-nov-05 16-feb-06 203

          Comment


          • #6
            First, I think this is easier done in long format, so I'll start with:

            Code:
            reshape long azt_start azt_stop, i(mother_id child_id) j(seq)
            By the way, I'm assuming all the things that look like dates are actually Stata date variables. If not, you'll need to convert them to that.
            Also, your sample data don't include a date for the end of the pregnancy. I assume you have that, and it's called end_preg.

            Next step is to identify the overlap of the pregnancy and the medication interval
            Code:
            gen lower = max(azt_start, start_preg)
            gen upper = min(azt_stop , end_preg)
            gen in_pregancy_duration = upper-lower
            replace in_pregnancy_duration = 0 if in_pregnancy_duration < 0
            So now you have the in-pregnancy duration for each of the medication episodes. If you now want to sum this up for each pregnancy:

            Code:
            by mother_id child_id, sort: replace in_pregnancy_duration = sum(in_pregancy_duration)
            by mother_id child_id, sort: replace in_pregnancy_duration = in_pregnancy_duration[_N]
            And if you want your wide format back

            Code:
            drop lower upper
            reshape wide azt_start azt_stop, i(mother_id child_id) j(seq)
            Not tested, but I think this is right.

            One more thought: things will go haywire if all the start dates don't precede their corresponding stop dates. The code assumes that in data cleaning you have already verified this.

            Comment


            • #7
              Clyde's solution of using reshape is excellent and should be the standard treatment. One minor problem episodes with missing values are not accounted for. Adding this line right after the line starting with -replace- would address the issue.
              Code:
              replace in_pregnancy_duration = 0 if missing(azt_start,azt_stop)
              @Lisanne Sorry I overlooked on the "&" in your conditions. As for your original design, the option under which the drug started before pregnancy and ended after pregnancy was not considered. Also, Stata treats missing values as greater than any real numbers, so something like
              Code:
              azt_start2>start_preg
              would be considered true even if azt_start2 was missing.

              If you prefer to avoid reshaping, here's what I came up with.
              Code:
              forval i=1/3    {
                  gen duration_azt_preg_`i'=0
                  replace duration_azt_preg_`i'=datepartus-azt_start`i' if (start_preg<=azt_start`i' & azt_start`i'<=datepartus & datepartus<=azt_stop`i')
                  replace duration_azt_preg_`i'=azt_stop`i'-azt_start`i' if (start_preg<=azt_start`i' & azt_start`i'<=azt_stop`i' & azt_stop`i'<=datepartus)
                  replace duration_azt_preg_`i'=datepartus-start_preg if (azt_start`i'<=start_preg & start_preg<=datepartus & datepartus<=azt_stop`i')
                  replace duration_azt_preg_`i'=azt_stop`i'-start_preg if (azt_start`i'<=start_preg & start_preg<=azt_stop`i' & azt_stop`i'<=datepartus)
                  replace duration_azt_preg_`i'=0 if missing(azt_start`i', azt_stop`i')
              }
              egen duration_preg_final=rowtotal(duration_azt_preg_1 duration_azt_preg_2 duration_azt_preg_3)
              An additional remark:
              I outlined all six scenarios when writing the code. Say a0 and a1 represent the start and end of pregnancy, and b0 and b1 represents the start and end of drug use, here are the possible combinations of different orders as well as the difference representing the overlapping period.

              A) a0 b0 a1 b1=>a1-b0
              B) a0 b0 b1 a1=>b1-b0
              C) b0 a0 a1 b1=>a1-a0
              D) b0 a0 b1 a1=>b1-a0
              E) a0 a1 b0 b1=>no overlap
              F) b0 b1 a0 a1=>no overlap

              Note that in scenarios A-D, the starting dates are always the first two values, the ending dates are always the last two values, and the difference is consistently the third minus the second. This stable pattern is why Clyde's smart code works.
              Last edited by Aspen Chen; 03 Oct 2014, 15:04.

              Comment


              • #8
                Thank you Clyde and Aspen! I've tried both solutions, Aspens solution works perfectly. Thanks for the explanation!

                I tried to run Clydes solution; however I have a few questions:
                - First when I reshape the data Stata tells me this: i=mother_id mkindstr does not uniquely identify the observations; there are multiple observations with the same value of mother_id child_id. Type "reshape error" for a listing of the problem observations."
                This is not a problem right? I have multiple observations with the same value of mother_id en & child_id because the children visited the clinic multiple times (and some children have the same mother).
                - Then when i try to run:
                gen lower = max(azt_start, start_preg) this doesn't work because azt_start is an ambiguous abbreviation. So I appended a `*' to the end of the abbreviation (azt_start*) but unfortunately this gives me the error "azt_start* invalid name".
                Should I change it to azt_start'i' for example? I haven't used that kind of coding before.

                Cheers, Lisanne

                Comment


                • #9
                  So, it seems that the data you posted were not representative of the structure of your data. Within that example, mother_id (and certainly the dyad of mother_id and child_id) did uniquely identify records. It was not clear that there were multiple visits on top of this structure. It is a problem in the sense that the reshape won't run, and without the reshape, the rest of the code won't work (as you discovered). To make it work, you need a visit number variable. With that, you just change the reshape long command to:

                  Code:
                  reshape long azt_start azt_stop, i(mother_id child_id visit_num) j(seq)
                  Then the rest of the code will run. And if you need to get back to wide format at the end, that command also changes:

                  Code:
                  reshape wide
                  reshape wide azt_start azt_stop, i(mother_id child_id visit_num) j(seq)

                  Comment

                  Working...
                  X