I am using Stata 15.1 with Mac OS Monterey 12.6
My goal is to produce a heat map with one axis "date" (from 1Oct2022 to 31Oct2022), the second axis "hour" (ranging from 0 to 23), and the intensity of each grid square based on the number of events that occurred on that date and hour.
My dataset has nearly 600,000 observations. Variables include id, startdate (the date in October 2022 on which the event occurred), and starthour (the hour the event started, using a 24 hour clock). Here is a random sample of 25 observations:
In order to produce the heat map, I need to generate a variable named events: the number of events that occurred on that date and at an hour. I was able to visualize the concept using multiple histograms:
The X axis is the start hour (0-23) and the Y axis the frequency of events. As you can see, the distribution of events varies from day to day. I could stop here, but I think a heat map makes for a more interesting visualization.
I next generated individual tables that show the number of events that occurred at a specified hour for every day in October.
An excerpt for starthour 0:
----------------------------------------------------------------------------------------------------------------
-> startdate = 01oct2022
variable | N
-------------+----------
starthour | 540
------------------------
----------------------------------------------------------------------------------------------------------------
-> startdate = 02oct2022
variable | N
-------------+----------
starthour | 626
------------------------
----------------------------------------------------------------------------------------------------------------
-> startdate = 03oct2022
variable | N
-------------+----------
starthour | 167
------------------------
----------------------------------------------------------------------------------------------------------------
-> startdate = 04oct2022
variable | N
-------------+----------
starthour | 117
------------------------
----------------------------------------------------------------------------------------------------------------
-> startdate = 05oct2022
variable | N
-------------+----------
starthour | 157
------------------------
I could repeat this 24 times up to:
then add these observations to the variable event "by hand". But this will be tedious as there are 24*31=744 observations.
I then tried a forval loop based on @nickcox: Stata tip 51: Events in intervals
https://journals.sagepub.com/doi/pdf...867X0700700312
This is the result:
quietly forvalues i = 1/`N' {
invalid syntax
r(198);
At this point, I am stumped.
Help/advice will be much appreciated.
Rich
My goal is to produce a heat map with one axis "date" (from 1Oct2022 to 31Oct2022), the second axis "hour" (ranging from 0 to 23), and the intensity of each grid square based on the number of events that occurred on that date and hour.
My dataset has nearly 600,000 observations. Variables include id, startdate (the date in October 2022 on which the event occurred), and starthour (the hour the event started, using a 24 hour clock). Here is a random sample of 25 observations:
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input float(id startdate starthour) 5374 22921 11 12099 22923 16 11817 22925 19 4131 22926 11 5277 22927 10 7038 22927 10 13864 22928 17 1912 22929 7 2539 22929 8 4620 22929 9 10553 22930 18 1228 22933 7 11889 22936 20 9889 22937 17 17155 22939 18 11620 22940 14 20202 22940 17 14980 22941 16 19812 22941 18 4989 22945 11 1925 22946 8 9086 22947 14 14252 22948 22 2298 22949 8 2432 22949 9 end format %td startdate
Code:
histogram starthour, discrete frequency by(startdate)
I next generated individual tables that show the number of events that occurred at a specified hour for every day in October.
Code:
by startdate, sort : tabstat starthour if starthour==0, statistics( count )
----------------------------------------------------------------------------------------------------------------
-> startdate = 01oct2022
variable | N
-------------+----------
starthour | 540
------------------------
----------------------------------------------------------------------------------------------------------------
-> startdate = 02oct2022
variable | N
-------------+----------
starthour | 626
------------------------
----------------------------------------------------------------------------------------------------------------
-> startdate = 03oct2022
variable | N
-------------+----------
starthour | 167
------------------------
----------------------------------------------------------------------------------------------------------------
-> startdate = 04oct2022
variable | N
-------------+----------
starthour | 117
------------------------
----------------------------------------------------------------------------------------------------------------
-> startdate = 05oct2022
variable | N
-------------+----------
starthour | 157
------------------------
I could repeat this 24 times up to:
Code:
by startdate, sort : tabstat starthour if starthour==23, statistics( count )
I then tried a forval loop based on @nickcox: Stata tip 51: Events in intervals
https://journals.sagepub.com/doi/pdf...867X0700700312
Code:
gen events = . local N = _N quietly forvalues i = 1/`N' { count if inrange(starthour, 0, 24) & /// id == id[`i'] & /// inrange(startdate[`i'] - startdate, 1, 31) replace events = r(N) in `i' }
quietly forvalues i = 1/`N' {
invalid syntax
r(198);
At this point, I am stumped.
Help/advice will be much appreciated.
Rich
Comment