  • working with dates - keeping observations between dates and grouping accordingly

    Dear Listers

    I have a dataset with the following variables: id, date_of_vaccination, date_of_infections.

    In a self-controlled case series, I would like to count the number of infections in the 1 year prior to vaccination and the 1 year after vaccination.
    So my question is twofold.

    1) How do I best keep observations in the window of interest?
    2) Is there a way that I can, per id, generate a pre/post grouping variable based on the date of vaccination?

    I guess I need to stress that the date_of_vaccination is different for each id.

    I hope I made myself somewhat understandable.

    Best,
    Lars

  • #2
    The description of your problem seems clear enough. But I can imagine several different ways your data might be organized, and they would require different programs to accomplish your task. I think that as a very general rule, with only exquisitely rare exceptions, when people want help with code or algorithms you have to show an example of the data you are working with. If you post an example of your data, using the -dataex- command to assure that it can be faithfully replicated by anybody who wants to help you, I predict that a solution would be rapidly forthcoming. Without example data, the hardest part of solving your problem is reading your mind.


    • #3
      Dear Clyde
      I know, I know. I was trying to cut corners.
      I will generate a sample dataset asap.
      L


      • #4
        Dear Listers

        The following is a mock-up of my data. The real data are a bit more complex, but the example covers most of my current hassles.
        All dates are fictitious.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(id date_of_vaccination date_of_infections)
        1     . 13700
        1 13789     .
        1     . 14050
        2     . 19001
        2 19204     .
        2     . 19300
        3     . 19500
        3 20101     .
        3     . 20102
        3     . 20305
        4     . 12500
        4     . 12606
        4     . 12900
        4 13002     .
        4     . 13050
        5 20100     .
        5     . 20300
        5     . 20345
        5     . 20399
        5     . 21000
        end
        format %d date_of_vaccination
        format %d date_of_infections

        I would like to count the number of infections in the 1 year prior to vaccination and the 1 year after vaccination.
        So my question is threefold.

        1) How do I best keep observations in the window of interest?
        2) Is there a way that I can, per id, generate a pre/post grouping variable based on the date of vaccination?
        3) If there are no pre events, then I would like there to be a pre grouping variable nonetheless, but with a missing date_of_infections, so that I can count this as zero events.

        I guess I need to stress that the date_of_vaccination is different for each id.

        I hope I made myself somewhat understandable.

        Thanks again.

        Lars


        • #5
          The main obstacle is that your data layout is difficult to work with. The whole thing becomes very simple if we rearrange the data so that you have a single date variable, and another variable that identifies each observation as being either a vaccination or an infection. Once we have that, the commands are very straightforward.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(id date_of_vaccination date_of_infections)
          1 . 13700
          1 13789 .
          1 . 14050
          2 . 19001
          2 19204 .
          2 . 19300
          3 . 19500
          3 20101 .
          3 . 20102
          3 . 20305
          4 . 12500
          4 . 12606
          4 . 12900
          4 13002 .
          4 . 13050
          5 20100 .
          5 . 20300
          5 . 20345
          5 . 20399
          5 . 21000
          end
          format %d date*
          
          // VERIFY ANY GIVEN OBS. IS FOR EITHER A VACCINATION
          // OR AN INFECTION, BUT NOT BOTH
          assert missing(date_of_vaccination, date_of_infections)
          
          // VERIFY AT MOST ONE VACCINATION DATE PER ID
          by id, sort: egen vacc_count = total(!missing(date_of_vaccination))
          assert vacc_count <= 1
          
          // COMBINE DATES INTO A SINGLE SEQUENCE
          gen date = cond(missing(date_of_vaccination), date_of_infections, date_of_vaccination)
          format date %td
          gen event = cond(missing(date_of_vaccination), "infection", "vaccination")
          
          // KEEP OBSERVATIONS THAT ARE WITHIN A YEAR OF VACCINATION
          by id (date_of_vaccination), sort: replace date_of_vaccination = date_of_vaccination[1]
          keep if inrange(date - date_of_vaccination, -365, 365)
          
          // GENERATE PREPOST VARIABLE
          gen byte pre_post = (date > date_of_vaccination)
          label define pre_post 0 "Pre" 1 "Post"
          label values pre_post pre_post
          
          // GENERATE COUNTS OF PRE- AND POST VACCINATION INFECTIONS
          by id, sort: egen pre_infections = total(event == "infection" & pre_post == 0)
          by id, sort: egen post_infections = total(event == "infection" & pre_post == 1)
          
          isid id date, sort
          You do not need to create an artificial "pre" group when there are no pre-vaccination infections: the command that counts events knows that the count of the empty set is zero.
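
          As a minimal sketch of that last point, the toy data below (made-up ids, day numbers, and a hypothetical vacc_date variable, not the data from this thread) contain one id with no infection before its vaccination. The same -egen, total()- approach returns 0 for that id without any placeholder observation.

          ```stata
          * Toy data: all values are invented for illustration only
          clear
          input float(id date vacc_date) str11 event
          1 100 200 "infection"
          1 200 200 "vaccination"
          1 300 200 "infection"
          2 200 200 "vaccination"
          2 250 200 "infection"
          end

          gen byte pre_post = (date > vacc_date)
          by id, sort: egen pre_infections = total(event == "infection" & pre_post == 0)
          by id, sort: egen post_infections = total(event == "infection" & pre_post == 1)

          // id 2 has no infection before its vaccination, so its pre count is 0
          assert pre_infections == 0 if id == 2
          assert pre_infections == 1 if id == 1

          // show one row per id
          egen byte first = tag(id)
          list id pre_infections post_infections if first, noobs
          ```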






          • #6
            Dear Clyde Schechter
            Firstly, thank you for this brilliant code.
            I have two quick questions that I hope you can help me with.
            1) What if vaccinations were repeated events?
            Here is a primitive example.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input float(id date_of_vaccination date_of_infections)
            1 . 13700
            1 13789 .
            1 . 14050
            2 . 19001
            2 19204 .
            2 . 19300
            3 . 19500
            3 20101 .
            3 20200 . 
            3 . 20102
            3 . 20305
            4 . 12500
            4 . 12606
            4 . 12900
            4 13002 .
            4 . 13050
            5 20100 .
            5 . 20300
            5 . 20345
            5 . 20399
            5 . 21000
            end
            format %d date*
            2) I want to calculate the IRR of pre-vaccination infections vs. post-vaccination events, knowing that there are repeated exposures here.
            I have to stress that this example is completely made up. I should have said travel abroad or something, as that would have been a better example.

            I thought of doing Poisson regression, but I struggle to generate an exposure variable, because that will be 365.25 days prior to each vaccination and 365 days after, per id.
            I was wondering if you could help with the next steps in my analysis here.

            Thank you.
            Lars
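
            A rough sketch of how such an exposure variable might be set up, under stated assumptions: the counts below are hypothetical, each id contributes one pre and one post window, and every window is taken to be exactly 365 days. The variable names infections0, infections1, post, and days are illustrative only. With repeated vaccinations, the same layout would need one pair of rows per vaccination rather than per id.

            ```stata
            * Hypothetical per-id counts; replace with the counts from your own data
            clear
            input float(id infections0 infections1)
            1 1 1
            2 1 1
            3 0 2
            4 1 1
            5 0 3
            end

            // one row per id and period: post = 0 (pre window) or 1 (post window)
            reshape long infections, i(id) j(post)

            // ASSUMPTION: every window is exactly 365 days long
            gen days = 365

            // IRR of post vs. pre, with standard errors clustered on id
            poisson infections i.post, exposure(days) irr vce(cluster id)

            // conditional (fixed-effects) Poisson, which compares each id with itself
            xtset id
            xtpoisson infections i.post, fe exposure(days) irr
            ```

            When all windows have the same length, the exposure term does not change the estimated IRR, but it becomes necessary as soon as some windows are truncated.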


            • #7
              While it doesn't occur in your example data, it is unclear what to do in the event that a person has an infection during the time between two vaccinations. Would that be "pre" the second vaccination or "post" the first one? And if an infection occurred more than a year after the first of two vaccinations, but within a year of the second, how do we handle that? Similarly, what if an infection occurs more than a year before the first of two vaccinations, but within a year of the second? There may be other possibilities that I have not enumerated here that make it unclear what is wanted.

              You need to clarify how you want to handle all these situations in order to proceed.


              • #8
                Dear Clyde
                These are very important questions that need to be clarified.
                Firstly, in my real dataset, I have dropped the later observations if there is not a minimum of 365 days after the first event and at least 365 days prior to the second event. Thus there will need to be 731 days between events for me to include the second event in the analysis.

                Now for your questions.
                1) If a person has an infection between two vaccinations, it will be a "post" event if it is within the first 365 days after a vaccination and a "pre" event if it is within the 365 days prior to the next vaccination.

                I hope that makes it easier to see what I am trying to do.
                I have updated the data example again. The interesting case is id==3.

                Thank you for taking your time.
                Lars

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input float(id date_of_vaccination date_of_infections)
                1     . 13700
                1 13789     .
                1     . 14050
                2     . 19001
                2 19204     .
                2     . 19300
                3     . 19500
                3 20101     .
                3 20700     .
                3     . 20102
                3     . 20805
                4     . 12500
                4     . 12606
                4     . 12900
                4 13002     .
                4     . 13050
                5 20100     .
                5     . 20300
                5     . 20345
                5     . 20399
                5     . 21000
                end
                format %d date_of_vaccination
                format %d date_of_infections


                • #9
                  Either I have misunderstood your response, or there is a mistake in your example here. The two vaccination dates for id 3 are 13jan2015 and 03sep2016, and these dates are only 599 days apart.

                  I'm going to assume that the dates for id 3 in the example are an error. So I've modified your example data and written code that will work on the condition that any vaccinations for the same person are at least 731 days apart. The code also verifies that assumption and will break with an error message if that is not the case in your real data.
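
                  The date arithmetic above can be checked directly in Stata. This is only a quick verification of the two values from the #8 example, which are stored as daily dates (days since 01jan1960):

                  ```stata
                  display %td 20101          // 13jan2015
                  display %td 20700          // 03sep2016
                  display 20700 - 20101      // 599, i.e. fewer than the 731 days required
                  ```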

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input float(id date_of_vaccination date_of_infections)
                  1     . 13700
                  1 13789     .
                  1     . 14050
                  2     . 19001
                  2 19204     .
                  2     . 19300
                  3     . 19500
                  3 20101     .
                  3 21065     .
                  3     . 20102
                  3     . 21095
                  4     . 12500
                  4     . 12606
                  4     . 12900
                  4 13002     .
                  4     . 13050
                  5 20100     .
                  5     . 20300
                  5     . 20345
                  5     . 20399
                  5     . 21000
                  end
                  format %d date_of_vaccination
                  format %d date_of_infections
                  
                  // VERIFY ANY GIVEN OBS. IS FOR EITHER A VACCINATION
                  // OR AN INFECTION, BUT NOT BOTH
                  assert missing(date_of_vaccination, date_of_infections)
                  
                  // COMBINE DATES INTO A SINGLE SEQUENCE
                  gen date = cond(missing(date_of_vaccination), date_of_infections, date_of_vaccination)
                  format date %td
                  gen event = cond(missing(date_of_vaccination), "infection", "vaccination")
                  
                  //    TRIM THE DATA AND SAVE A COPY
                  keep id date event
                  tempfile copy
                  save `copy'
                  
                  //    CREATE AN INFECTIONS ONLY DATA SET
                  preserve
                  keep if event == "infection"
                  tempfile infections
                  save `infections'
                  
                  //    AND A VACCINATIONS ONLY SET
                  restore
                  keep if event == "vaccination"
                  
                  //    VERIFY ALL VACCINATIONS ARE AT LEAST 731 DAYS APART
                  by id (date), sort: assert date - date[_n-1] >= 731 if _n > 1
                  
                  //    JOIN EACH VACCINATION WITH ALL INFECTION EVENTS THAT OCCUR WITHIN
                  //    365 DAYS BEFORE OR AFTER
                  rangejoin date -365 365 using `infections', by(id)
                  
                  //    MARK EACH INFECTION AS PRE- OR POST- ITS CORRESPONDING VACCINATION
                  label define pre_post    0    "Pre"    1    "Post"
                  //    (LEFT MISSING WHEN A VACCINATION HAS NO INFECTION IN ITS WINDOW)
                  gen byte pre_post:pre_post = date_U > date if !missing(date_U)
                  
                  //    COUNT PRE AND POST VACCINATION INFECTIONS FOR EACH PERSON
                  by id, sort: egen post_infections = total(pre_post)
                  by id, sort: egen pre_infections = total(!pre_post)
                  
                  rename date vaccinationdate
                  rename date_U infectiondate
                  drop event*
                  
                  //    RESTORE LAYOUT TO A CHRONOLOGICALLY ORDERED EVENT LIST
                  gen long obs_no = _n
                  reshape long @date, i(obs_no) j(event) string
                  replace pre_post = . if event == "vaccination"
                  drop obs_no
                  //    DROP EMPTY INFECTION ROWS LEFT BY VACCINATIONS WITH NO MATCH
                  drop if missing(date)
                  duplicates drop
                  isid id date, sort
                  Note: This code requires the -rangejoin- program, written by Robert Picard, and available from SSC.
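
                  For reference, a community-contributed command on SSC is normally installed as shown below. As far as I know, -rangejoin- relies on -rangestat- (also on SSC) being installed, so this sketch installs both; treat that dependency as an assumption and check -help rangejoin- after installing.

                  ```stata
                  ssc install rangestat
                  ssc install rangejoin
                  ```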


                  • #10
                    Absolutely brilliant!
                    I will try to translate this to my real data asap.

                    I will try to calculate some IRRs.

                    I'll be back when my abilities run short.

                    See you soon, in other words.

                    Lars
