Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Checking if Date is Between Two Dates

    I have two large datasets, df1 and df2. The first dataset, df1, contains the columns 'ID' and 'actualdata'.


    The second dataset, df2, contains two date fields, 'beforedate' and 'afterdate', which represent a start and end date, respectively.


    My goal is to create a new column in df1 named 'match' that indicates whether the 'actualdate' for each row in df1 is between the 'beforedate’ and ‘afterdate’ of any of the observations of the same 'ID' in df2. If it is between, I want to give the 'match' column a value of 1, otherwise 0. I want to include observations of both df1 and df2 where there is no 'ID' match).

  • #2
    help joinby - and then drop any lines not meeting your condition (or keep only those that do meet your condition)

    Comment


    • #3
      Can the intervals overlap?

      Comment


      • #4
        One or two merges can do it:
        20150717_1302806.do

        Comment


        • #5
          For the benefit of Statalist users in case Sergiy's website becomes unavailable in the future:

          Code:
          // Sergiy Radyakin 2015
          // Statalist request: http://www.statalist.org/forums/forum/general-stata-discussion/general/1302806-checking-if-date-is-between-two-dates 
          clear all
          input id str9 acts
          101 "20mar1999"
          102 "17may2002"
          103 "19nov2009"
          104 "07feb2012"
          end
          generate act=date(acts,"DMY")
          format act %td
          list
          drop acts
          
          sort id
          tempfile data1
          save "`data1'"
          
          clear
          input id str9 df1s str9 df2s
          101 "10jan1999" "03feb1999"
          102 "07may2002" "07may2015"
          103 "19dec2000" "31dec2009"
          104 "07feb2012" "07feb2012"
          end
          
          generate df1=date(df1s,"DMY")
          generate df2=date(df2s,"DMY")
          format df1 %td
          format df2 %td
          drop df1s df2s
          
          // all of the above constitutes data preparation and should have been provided by the topic starter
          
          // assuming intervals are non-overlapping episodes:
          sort id
          merge id using "`data1'"
          assert _merge!=2
          drop _merge
          
          generate match=inrange(act,df1,df2)
          keep if match
          sort id
          merge id using "`data1'"
          assert _merge!=1
          drop _merge
          replace match=!missing(match)
          
          list

          Comment


          • #6
            Sergiy Radyakin The start and end invervals can overlap sometimes for each ID. I was trying to get something like Clyde's solution in http://www.statalist.org/forums/foru...d-a-date-range except for looking for an actual date (as opposed to just looking for the year) within two dates (inclusive) . I'd like to keep both observations from using and master that do not match. Basically, if there is any instance where the ID has a date between the start and end date, I wish this to be coded as a 1, 0 otherwise.
            Last edited by michael joe; 17 Jul 2015, 15:07.

            Comment


            • #7
              I am having problems understanding the compatibility of these two statements:
              Originally posted by michael joe View Post
              Sergiy Radyakin I'd like to keep both observations from using and master that do not match.
              The above implies that there is only one observation in master and one observation in using and they may or may not match. Furthermore you want to trim the data to leave fewer observations in each of the data files.

              Originally posted by michael joe View Post
              Sergiy Radyakin Basically, if there is any instance where the ID has a date between the start and end date, I wish this to be coded as a 1, 0 otherwise.
              I understand this as there is a single file where different observations will be classified with 1s and 0s and all original observations retained.

              Given this confusion, the program I posted is compatible with the following scenario:

              File 1: contains combinations item-code, date -- a certain item was purchased on a certain date in a supermarket.
              File 2: contains the spells of certain product promotions: item-code, begin of promotion, end of promotion.
              The program will identify whether an item was purchased during a promo time, or during a regular time.
              Small font: There was no more than one purchase of each type of item by this customer (ever) and there is only two states of the world Promotion or NoPromotion of the certain item.



              michael joe , perhaps you could work out a small example. Post two small datasets (inputs) and the result what you want to be produced.

              Best, Sergiy Radyakin

              Comment


              • #8
                Steps involving are
                1. Format the dates with respect to stata dates
                2. then use syntax
                gen DateWithinRange = inrange(date3, date1, date2)
                date3: date you wanted to check
                date1-2 , the ranges of the dates

                Comment

                Working...
                X