Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Days between two dates, separated by year.

    Hello statalist!

    I have a dataset in long format which measure enrollment under a period of time. Even if you are enrolled more than 1 year, they are sometimes appearing under the same row (id 1) or separated in different rows (id 3)


    id year start_date end_date days_between
    1 2011 01/01/2011 01/03/2013
    2 2007 01/06/2007 01/10/2007
    3 2003 01/01/2003 31/12/2003
    3 2003 01/01/2004 31/12/2004

    I've figured out how to convert to date format and calculate days between. My question is how I can separate days between by years? Take id 1 as an example. The individual have start_date in 2011 and end_date in 2013. Calculate days between would give all days between 01/01/2011 and 01/03/2013, but I want days between 01/01/2011 - 31/12/2011, 01/01/2012 - 31/12/2012 and 01/01/2013 - 01/03/2013. Ideally, it would look something like this:

    id start_date end_date days_between
    1 01/01/2011 31/12/2011
    1 01/01/2012 31/12/2012
    1 01/01/2013 01/03/2013


    Is this possible to achieve?

    Best regards


  • #2
    First convert your dates to Stata numeric (%td) dates. You can use -expand- to duplicate year that span more than one year, then compute new dates.

    Sample dataset:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte id str10(start_date end_date)
    1 "01/01/2011" "01/03/2013"
    1 "01/01/2011" "01/03/2013"
    1 "01/01/2011" "01/03/2013"
    2 "01/06/2007" "01/10/2007"
    3 "01/01/2003" "31/12/2003"
    3 "01/01/2004" "31/12/2004"
    end
    Computations:
    Code:
    gen d1=date(start_date,"DMY")
    gen d2=date(end_date,"DMY")
    format %td d1 d2
    gen count=year(d2)-year(d1)+1
    * group is needed in case there are several rows for a given id that span several years.
    * A group is simply made of the expanded rows of a single row in the original dataset,
    * even if there were other rows with the same id.
    * Note that for the same id, two rows MUST NOT overlap.
    bysort id: gen group=_n
    expand count
    sort id group
    by id group: gen d1corr=cond(_n==1,d1,mdy(1,1,year(d1)+_n-1))
    by id group: gen d2corr=cond(_N==1 | _n==_N,d2,mdy(12,31,year(d1)+_n-1))
    format %td d1corr d2corr
    gen days=d2corr-d1corr+1

    The final dates are d1corr and d2corr. Note that I leave every variable so you can see what happened in the final dataset. You would probably want the following, but you might also want to keep some of them.

    Code:
    drop start_date end_date d1 d2 count group
    rename (d1corr d2corr) (start_date end_date)
    Last edited by Jean-Claude Arbaut; 26 Jun 2019, 06:27.

    Comment

    Working...
    X