Counting number of events within a specific time range.

Richard Stallman

Join Date: Feb 2015

Posts: 8
#1

Counting number of events within a specific time range.

18 Aug 2015, 08:47

Hi StataList,

I have another question I was hoping you friendly folks can help me with.

I have a dataset that includes variable id, day, and time. Each id represents an individual event that occurred on a specific day at a specific time. Many events occur every hour of each day. Day is coded as a number 1 through 12. Time is coded as a number 0.00 through 23.00.

I want to create a variable that contains the number of events that occurred over the next 1 hour period after the time of the current event.

So for example, id = 1 starts on day = 1 and the first event is at time = 17.25. I would like to create a new variable 1hr which would equal the number of events that occur between 17.25 and 18.25. Then if id = 2 starts on day = 1 and time = 17.30, 1hr would equal the number of events from 17.30 and 18.30.

I hope this is clear. I really appreciate your help.

Best,
Rich
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35589
#2

18 Aug 2015, 09:18

Without regular spacing of times, it's messy and slow.

I'd advise that you set up a proper date-time variable first of all. Dealing with separate day and time variables would be a bad idea.

Then it's a loop over observations: http://www.stata-journal.com/sjpdf.h...iclenum=pr0033
Comment

Robert Picard

Join Date: Mar 2014
Posts: 1536

18 Aug 2015, 09:26

Here's one approach to these types of problems. There's no need to loop over observations, you instead order observations by date and time (within id) and loop over lags/leads until there are no more cases that satisfy the condition. Something like

Code:

* create fake data
clear
input id str20 (day time)
1 18aug2015 "10:00am"
1 18aug2015 "10:01am"
1 18aug2015 "10:11am"
1 18aug2015 "11:02am"
1 18aug2015 "11:22am"
1 18aug2015 "11:59am"
1 18aug2015 "1:30pm"
2 18aug2015 "10:00am"
2 18aug2015 "10:01am"
end

* convert day and time to numeric form
gen double eventtime = clock(day + " " + time, "DMYhm")
format %tc eventtime

* for each id, order events by increasing date and time
sort id eventtime

* loop over increasing forward leads and add to the count if the
* time difference is within 1 hour. Stop when there are no more cases
* within 1 hour for any observation
gen mycount = 0
local more = 1
local i = 0
while `more' {
    local i = `i' + 1
    by id: gen doit = (eventtime[_n + `i'] - eventtime) <= msofhours(1)
    replace mycount = mycount + doit
    count if doit
    local more = r(N)
    drop doit
}

list, sepby(id) noobs

Last edited by Robert Picard; 18 Aug 2015, 09:37.

Comment

Richard Stallman

Join Date: Feb 2015

Posts: 8
#4

18 Aug 2015, 12:24

Thanks once again for your help Nick.
I have little experience using dates in stata but took your advice to use a properly formatted date instead of the clunky two variable system I had set up.
The article you provided was also very helpful.
My solution ended up being:

gen n60 = .
local N = _N
forval i = 1/`N'{
count if inrange(date_obs[`i']-date_obs,mdyhms(1,1,1960,0,0,0),mdyhms(1,1,1960,1, 0,0))
replace n60 = r(N) in `i'
}
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35589
#5

18 Aug 2015, 12:30

Robert Picard has what looks likely to be a faster approach in general.

In respect of your code: you are evaluating two constants again and again for every observation, so this would be faster in future:

Code:

gen n60 = . local N = _N quietly forval i = 1/`N' { count if inrange(date_obs[`i']-date_obs, 0, 36e5) replace n60 = r(N) in `i' }

Your code pools events regardless of identifier. I take it that is what you want.
Comment

Announcement

Counting number of events within a specific time range.

Comment

Comment

Comment

Comment