Creating a new variable for overlapping time intervals

Melih Artar

Join Date: Feb 2017

Posts: 13
#1

Creating a new variable for overlapping time intervals

25 Mar 2017, 04:24

Hi Statalist,

I have been lost in coding since I started to deal with this issue and here is my problem. In the example data below, I give you the IDs of individuals and the code of action they take during the day(irrelevant ones are removed). Action's being equal to 111 means that that person is at work. Time is basically the amount of time devoted to that activity. Start and End in fact represents the hours on which a particular activity is initiated and finished, yet they are stated in terms of minutes; for example, 240 means 4:00 AM.

Here is what I would like to do. I want to calculate the how much of work time of these two individuals overlap. For this particular example, the answer is 230. Therefore, I want a create a column where there are 230s as long as FamilyID, which shows that those individuals belong to the same family, equals 1.

(The order of the variables: FamilyID, ID, Action,Time, Start, End)

Code:

FamilyID ID Action Time Start End 1 159 . 180 240 420 1 159 . 10 420 430 1 159 . 20 430 450 1 159 . 30 450 480 1 159 . 60 480 540 1 159 111 210 540 750 1 159 . 30 750 780 1 159 111 180 780 960 1 159 . 60 960 1020 1 159 111 180 1020 1200 1 159 . 60 1200 1260 1 159 . 30 1260 1290 1 159 . 20 1290 1310 1 159 . 10 1310 1320 1 159 . 60 1320 1380 1 159 . 60 1380 1440 1 160 . 180 240 420 1 160 . 120 420 540 1 160 . 10 540 550 1 160 . 10 550 560 1 160 . 40 560 600 1 160 . 60 600 660 1 160 . 20 660 680 1 160 . 100 680 780 1 160 . 120 780 900 1 160 111 60 900 960 1 160 . 30 960 990 1 160 111 140 990 1130 1 160 . 10 1130 1140 1 160 111 60 1140 1200 1 160 . 60 1200 1260 1 160 . 30 1260 1290 1 160 . 10 1290 1300 1 160 . 20 1300 1320 1 160 . 60 1320 1380 1 160 . 20 1380 1400 1 160 . 40 1400 1440
Tags: None

Clyde Schechter

Join Date: Apr 2014
Posts: 29967

25 Mar 2017, 09:37

Well, the following code, I believe works. But it produces different results from what you ask, but I believe your count is incorrect. For example, minutes 900 through 960 are minutes of overlap. I believe you are counting them as 60 minutes, but in fact they are 61 minutes. There are three such intervals in your example, so my count comes up 3 minutes larger than yours. In fact, your Start and End values double count all of the boundary minutes. You might want to first revise your data so that, for example, 240 to 420 is replaced by 240 to 419, etc. If you do that and apply my code, your count and mine will agree.

Code:

clear
input FamilyID ID Action Time Start End
1 159 . 180 240 420
1 159 . 10 420 430
1 159 . 20 430 450
1 159 . 30 450 480
1 159 . 60 480 540
1 159 111 210 540 750
1 159 . 30 750 780
1 159 111 180 780 960
1 159 . 60 960 1020
1 159 111 180 1020 1200
1 159 . 60 1200 1260
1 159 . 30 1260 1290
1 159 . 20 1290 1310
1 159 . 10 1310 1320
1 159 . 60 1320 1380
1 159 . 60 1380 1440
1 160 . 180 240 420
1 160 . 120 420 540
1 160 . 10 540 550
1 160 . 10 550 560
1 160 . 40 560 600
1 160 . 60 600 660
1 160 . 20 660 680
1 160 . 100 680 780
1 160 . 120 780 900
1 160 111 60 900 960
1 160 . 30 960 990
1 160 111 140 990 1130
1 160 . 10 1130 1140
1 160 111 60 1140 1200
1 160 . 60 1200 1260
1 160 . 30 1260 1290
1 160 . 10 1290 1300
1 160 . 20 1300 1320
1 160 . 60 1320 1380
1 160 . 20 1380 1400
1 160 . 40 1400 1440
end

// SAVE A COPY OF THE ORIGINAL DATA
tempfile copy
save `copy'

// ELIMINATE NON WORK OBSERVATIONS
keep if Action == 111

// VERIFY ONLY TWO PERSONS PER FAMILY
by FamilyID (ID), sort: assert inlist(ID, ID[1], ID[_N])

// VERIFY NO OVERLAP OF INTERVALS WITHIN PERSON
by FamilyID ID (Start), sort: assert Start >= End[_n-1] if _n > 1

// EXPAND EACH OBSERVATION TO ITS NUMBER OF MINUTES
expand End - Start + 1
by FamilyID ID Start, sort: gen minute = Start + _n - 1

// IDENTIFY MINUTES THAT ARE OVERLAPPED
by FamilyID minute, sort: gen overlap = _N - 1

// AND ADD THEM UP
by FamilyID: replace overlap = sum(overlap)
by FamilyID: keep if _n == _N
// REVERSE DOUBLE-COUNTING
replace overlap = overlap/2

// MERGE BACK TO ORIGINAL DATA
merge 1:m FamilyID using `copy', keep(match using) nogenerate
drop minute

Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

25 Mar 2017, 10:44

Here's another approach based on a technique described in

SJ-13-1 dm0068 . . . . . Stata tip 114: Expand paired dates to pairs of dates
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
Q1/13 SJ 13(1):217--219 (no commands)
tip on using expand to deal with paired dates

http://www.stata-journal.com/article...article=dm0068

that does not require expansion by the minute. The following example includes only work observations, a date variable just in case, and a unique identifier for each action. I also added a couple of observation of a third individual.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long actionid float(date FamilyID ID Start End)
1 20903 1 159  540  750
2 20903 1 159  780  960
3 20903 1 159 1020 1200
4 20903 1 160  900  960
5 20903 1 160  990 1130
6 20903 1 160 1140 1200
7 20903 1 170  749  751
8 20903 1 170 1150 1160
end
format %td date

* convert time interval to long form and create action toggle switch
expand 2
bysort actionid: gen atime = cond(_n==1, Start, End)
by actionid: gen onoff = cond(_n==1, 1, -1)

* order by time within a family on a given day;
* put on events before off events;
* use a running sum to track events through time
* note the start of the next time period
gsort FamilyID date atime -onoff actionid
gen runsum = sum(onoff)
by FamilyID date: gen nextp = atime[_n+1]

* the status at the end of the period reflects all on/off events
by FamilyID date atime: gen status = runsum[_N]

* identify spells of overlap
gen overlap = status > 1
gen spell = sum(overlap != overlap[_n-1])

* for each spell with an overlap, calculate length of spell for one observation
bysort FamilyID date spell (atime actionid): gen spell_overlap = nextp[_N] - atime[1] + 1 if overlap & _n == 1

* add overlaps within a group of FamilyID date
by FamilyID date: egen wanted = total(spell_overlap)

* reduce to one observation per FamilyID date
by FamilyID date: keep if _n == 1
keep FamilyID date wanted

Comment

Melih Artar

Join Date: Feb 2017

Posts: 13
#4

25 Mar 2017, 12:49

Firstly sorry for the late response and thank you so much, Clyde and Robert. You guys are amazing. Clyde's method has worked well for me, yet I still tried Robert's too. However, I could not manage to apply it; I think the additional variables he used confused me a little. Thanks a lot.
Comment

Announcement