Days between two dates, separated by year.

Albin Palmqvist

Join Date: Jun 2019

Posts: 1
#1

Days between two dates, separated by year.

26 Jun 2019, 05:48

Hello statalist!

I have a dataset in long format which measure enrollment under a period of time. Even if you are enrolled more than 1 year, they are sometimes appearing under the same row (id 1) or separated in different rows (id 3)

id year start_date end_date days_between
1 2011 01/01/2011 01/03/2013
2 2007 01/06/2007 01/10/2007
3 2003 01/01/2003 31/12/2003
3 2003 01/01/2004 31/12/2004

I've figured out how to convert to date format and calculate days between. My question is how I can separate days between by years? Take id 1 as an example. The individual have start_date in 2011 and end_date in 2013. Calculate days between would give all days between 01/01/2011 and 01/03/2013, but I want days between 01/01/2011 - 31/12/2011, 01/01/2012 - 31/12/2012 and 01/01/2013 - 01/03/2013. Ideally, it would look something like this:

id start_date end_date days_between
1 01/01/2011 31/12/2011
1 01/01/2012 31/12/2012
1 01/01/2013 01/03/2013

Is this possible to achieve?

Best regards
Tags: None

Jean-Claude Arbaut

Join Date: Jul 2017
Posts: 209

26 Jun 2019, 06:23

First convert your dates to Stata numeric (%td) dates. You can use -expand- to duplicate year that span more than one year, then compute new dates.

Sample dataset:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte id str10(start_date end_date)
1 "01/01/2011" "01/03/2013"
1 "01/01/2011" "01/03/2013"
1 "01/01/2011" "01/03/2013"
2 "01/06/2007" "01/10/2007"
3 "01/01/2003" "31/12/2003"
3 "01/01/2004" "31/12/2004"
end

Computations:

Code:

gen d1=date(start_date,"DMY")
gen d2=date(end_date,"DMY")
format %td d1 d2
gen count=year(d2)-year(d1)+1
* group is needed in case there are several rows for a given id that span several years.
* A group is simply made of the expanded rows of a single row in the original dataset,
* even if there were other rows with the same id.
* Note that for the same id, two rows MUST NOT overlap.
bysort id: gen group=_n
expand count
sort id group
by id group: gen d1corr=cond(_n==1,d1,mdy(1,1,year(d1)+_n-1))
by id group: gen d2corr=cond(_N==1 | _n==_N,d2,mdy(12,31,year(d1)+_n-1))
format %td d1corr d2corr
gen days=d2corr-d1corr+1

The final dates are d1corr and d2corr. Note that I leave every variable so you can see what happened in the final dataset. You would probably want the following, but you might also want to keep some of them.

Code:

drop start_date end_date d1 d2 count group
rename (d1corr d2corr) (start_date end_date)

Last edited by Jean-Claude Arbaut; 26 Jun 2019, 06:27.

Announcement

Days between two dates, separated by year.

Comment