Checking if Date is Between Two Dates

michael joe

Join Date: May 2015

Posts: 50
#1

Checking if Date is Between Two Dates

17 Jul 2015, 06:55

I have two large datasets, df1 and df2. The first dataset, df1, contains the columns 'ID' and 'actualdata'.

The second dataset, df2, contains two date fields, 'beforedate' and 'afterdate', which represent a start and end date, respectively.

My goal is to create a new column in df1 named 'match' that indicates whether the 'actualdate' for each row in df1 is between the 'beforedate’ and ‘afterdate’ of any of the observations of the same 'ID' in df2. If it is between, I want to give the 'match' column a value of 1, otherwise 0. I want to include observations of both df1 and df2 where there is no 'ID' match).
Tags: None
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#2

17 Jul 2015, 12:09

help joinby - and then drop any lines not meeting your condition (or keep only those that do meet your condition)
Comment
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#3

17 Jul 2015, 12:16

Can the intervals overlap?
Comment
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#4

17 Jul 2015, 12:27

One or two merges can do it:
20150717_1302806.do
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

17 Jul 2015, 14:38

For the benefit of Statalist users in case Sergiy's website becomes unavailable in the future:

Code:

// Sergiy Radyakin 2015
// Statalist request: http://www.statalist.org/forums/forum/general-stata-discussion/general/1302806-checking-if-date-is-between-two-dates 
clear all
input id str9 acts
101 "20mar1999"
102 "17may2002"
103 "19nov2009"
104 "07feb2012"
end
generate act=date(acts,"DMY")
format act %td
list
drop acts

sort id
tempfile data1
save "`data1'"

clear
input id str9 df1s str9 df2s
101 "10jan1999" "03feb1999"
102 "07may2002" "07may2015"
103 "19dec2000" "31dec2009"
104 "07feb2012" "07feb2012"
end

generate df1=date(df1s,"DMY")
generate df2=date(df2s,"DMY")
format df1 %td
format df2 %td
drop df1s df2s

// all of the above constitutes data preparation and should have been provided by the topic starter

// assuming intervals are non-overlapping episodes:
sort id
merge id using "`data1'"
assert _merge!=2
drop _merge

generate match=inrange(act,df1,df2)
keep if match
sort id
merge id using "`data1'"
assert _merge!=1
drop _merge
replace match=!missing(match)

list

Comment

michael joe

Join Date: May 2015

Posts: 50
#6

17 Jul 2015, 14:46

Sergiy Radyakin The start and end invervals can overlap sometimes for each ID. I was trying to get something like Clyde's solution in http://www.statalist.org/forums/foru...d-a-date-range except for looking for an actual date (as opposed to just looking for the year) within two dates (inclusive) . I'd like to keep both observations from using and master that do not match. Basically, if there is any instance where the ID has a date between the start and end date, I wish this to be coded as a 1, 0 otherwise.

Last edited by michael joe; 17 Jul 2015, 15:07.
Comment
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#7

17 Jul 2015, 20:45

I am having problems understanding the compatibility of these two statements:

Originally posted by michael joe View Post

Sergiy Radyakin I'd like to keep both observations from using and master that do not match.

The above implies that there is only one observation in master and one observation in using and they may or may not match. Furthermore you want to trim the data to leave fewer observations in each of the data files.

Originally posted by michael joe View Post

Sergiy Radyakin Basically, if there is any instance where the ID has a date between the start and end date, I wish this to be coded as a 1, 0 otherwise.

I understand this as there is a single file where different observations will be classified with 1s and 0s and all original observations retained.

Given this confusion, the program I posted is compatible with the following scenario:

File 1: contains combinations item-code, date -- a certain item was purchased on a certain date in a supermarket.
File 2: contains the spells of certain product promotions: item-code, begin of promotion, end of promotion.
The program will identify whether an item was purchased during a promo time, or during a regular time.
Small font: There was no more than one purchase of each type of item by this customer (ever) and there is only two states of the world Promotion or NoPromotion of the certain item.

michael joe , perhaps you could work out a small example. Post two small datasets (inputs) and the result what you want to be produced.

Best, Sergiy Radyakin
Comment
Priyanka Pandhi

Join Date: Jun 2018

Posts: 25
#8

03 Oct 2019, 10:08

Steps involving are
1. Format the dates with respect to stata dates
2. then use syntax
gen DateWithinRange = inrange(date3, date1, date2)
date3: date you wanted to check
date1-2 , the ranges of the dates
Comment

Announcement

Checking if Date is Between Two Dates

Comment

Comment

Comment

Comment

Comment

Comment

Comment