Calculate all possible unique ranges of value

Garrett Baker

Join Date: Jun 2024
Posts: 26

Calculate all possible unique ranges of value

09 Aug 2024, 06:56

I'm attempting to build on a previous question (https://www.statalist.org/forums/for...ly-survey-data), where I'm essentially hoping to figure out executing an intermediary step in that process (which would produce some new variables/information of interest).

In my previous question, the resulting code produces a dataset that calculates the unique total hours for each person-dayofwk grouping. What I've now been working on is generating new variables that contain the actual start and stop times for each person-dayofwk grouping. The main challenge in my initial coding attempts has been that it's possible for people to have non-overlapping hours worked across a given day of the week, which necessitates creating second or third sets (and in the real data, even more than that) of unique start-stop variables because we don't want to include any intervening hours. E.g., if a person was available on a given day from 9-12, and then again from 17-21, two new sets of variables would be created: uniquestart1=9, uniquestop1=12, uniquestart2=17, uniquestop2=21. We would not want to capture the intervening 12-17 hour range.

To give a sense of how this would start, here's some initial code that creates the initial unique start and stop time based on the first day_count for each person-dayofwk grouping.

Code:

sort resp_id dayofwk start stop
by resp_id dayofwk: gen uniquestart1 = start[1]
by resp_id dayofwk: gen uniquestop1 = stop[1]

Some example data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str5 resp_id int date byte(dayofwk day_count) double(start stop)
"ID200" 21849 0 1    0     3
"ID200" 21842 0 2    0     3
"ID200" 21856 0 3    0     3
"ID200" 21835 0 4    0   3.5
"ID200" 21856 0 5   18 23.99
"ID200" 21835 0 6   19 23.99
"ID200" 21857 1 1    0     3
"ID200" 21836 1 2    0     3
"ID200" 21837 2 1   19 23.99
"ID200" 21838 3 1    0     3
"ID200" 21852 3 2   18 23.99
"ID200" 21845 3 3   18 23.99
"ID200" 21846 4 1    0     3
"ID200" 21853 4 2    0     3
"ID200" 21832 4 3   17 23.99
"ID200" 21839 4 4   19 23.99
"ID200" 21846 4 5   19 23.99
"ID200" 21847 5 1    0     3
"ID200" 21833 5 2    0     3
"ID200" 21840 5 3    0     3
"ID200" 21833 5 4   18 23.99
"ID200" 21840 5 5 18.5 23.99
"ID200" 21854 5 6   19 23.99
"ID200" 21847 5 7   19 23.99
"ID200" 21848 6 1    0     3
"ID200" 21834 6 2    0     3
"ID200" 21841 6 3    0     3
"ID200" 21855 6 4    0     3
"ID200" 21848 6 5    7 23.99
"ID200" 21834 6 6   17 23.99
"ID200" 21855 6 7   18 23.99
"ID200" 21841 6 8   19 23.99
"ID300" 21793 0 1    6    14
"ID300" 21800 0 2    6    14
"ID300" 21807 0 3    9    15
"ID300" 21822 1 1    6    14
"ID300" 21808 1 2    6    14
"ID300" 21815 1 3    6    14
"ID300" 21809 2 1    6    14
"ID300" 21796 3 1    5    14
"ID300" 21810 3 2    6    14
"ID300" 21803 3 3    6    16
"ID300" 21817 3 4    6    16
"ID300" 21797 4 1    7    17
"ID300" 21804 4 2    9    17
"ID300" 21818 4 3    9    17
"ID300" 21811 4 4    9    21
"ID300" 21798 5 1    6    14
"ID300" 21819 5 2    6    14
"ID300" 21805 5 3    9    17
"ID300" 21799 6 1    6    12
"ID300" 21820 6 2   14    16
"ID300" 21806 6 3   20 23.99
end
format %tddd-Mon-YY date

Here's how I think the final data would look, where essentially the final/highest day_count within each person-dayofwk grouping would contain the final unique start/stop values and you could just keep only those rows.

resp_id	date	dayofwk	day_count	start	stop	uniquestart1	uniquestop1	uniquestart2	uniquestop2	uniquestart3	uniquestop3
ID200	27-Oct-19	0	1	0	3	0	3
ID200	20-Oct-19	0	2	0	3	0	3
ID200	3-Nov-19	0	3	0	3	0	3
ID200	13-Oct-19	0	4	0	3.5	0	3.5
ID200	3-Nov-19	0	5	18	23.99	0	3.5	18	23.99
ID200	13-Oct-19	0	6	19	23.99	0	3	18	23.99
ID200	4-Nov-19	1	1	0	3	0	3
ID200	14-Oct-19	1	2	0	3	0	3
ID200	15-Oct-19	2	1	19	23.99	19	23.99
ID200	16-Oct-19	3	1	0	3	0	3
ID200	30-Oct-19	3	2	18	23.99	0	3	18	23.99
ID200	23-Oct-19	3	3	18	23.99	0	3	18	23.99
ID200	24-Oct-19	4	1	0	3	0	3
ID200	31-Oct-19	4	2	0	3	0	3
ID200	10-Oct-19	4	3	17	23.99	0	3	17	23.99
ID200	17-Oct-19	4	4	19	23.99	0	3	17	23.99
ID200	24-Oct-19	4	5	19	23.99	0	3	17	23.99
ID200	25-Oct-19	5	1	0	3	0	3
ID200	11-Oct-19	5	2	0	3	0	3
ID200	18-Oct-19	5	3	0	3	0	3
ID200	11-Oct-19	5	4	18	23.99	0	3	18	23.99
ID200	18-Oct-19	5	5	18.5	23.99	0	3	18	23.99
ID200	1-Nov-19	5	6	19	23.99	0	3	18	23.99
ID200	25-Oct-19	5	7	19	23.99	0	3	18	23.99
ID200	26-Oct-19	6	1	0	3	0	3
ID200	12-Oct-19	6	2	0	3	0	3
ID200	19-Oct-19	6	3	0	3	0	3
ID200	2-Nov-19	6	4	0	3	0	3
ID200	26-Oct-19	6	5	7	23.99	0	3	7	23.99
ID200	12-Oct-19	6	6	17	23.99	0	3	7	23.99
ID200	2-Nov-19	6	7	18	23.99	0	3	7	23.99
ID200	19-Oct-19	6	8	19	23.99	0	3	7	23.99
ID300	1-Sep-19	0	1	6	14	6	14
ID300	8-Sep-19	0	2	6	14	6	14
ID300	15-Sep-19	0	3	9	15	6	15
ID300	30-Sep-19	1	1	6	14	6	14
ID300	16-Sep-19	1	2	6	14	6	14
ID300	23-Sep-19	1	3	6	14	6	14
ID300	17-Sep-19	2	1	6	14	6	14
ID300	4-Sep-19	3	1	5	14	5	14
ID300	18-Sep-19	3	2	6	14	5	14
ID300	11-Sep-19	3	3	6	16	5	16
ID300	25-Sep-19	3	4	6	16	5	16
ID300	5-Sep-19	4	1	7	17	7	17
ID300	12-Sep-19	4	2	9	17	7	17
ID300	26-Sep-19	4	3	9	17	7	17
ID300	19-Sep-19	4	4	9	21	7	21
ID300	6-Sep-19	5	1	6	14	6	14
ID300	27-Sep-19	5	2	6	14	6	14
ID300	13-Sep-19	5	3	9	17	6	17
ID300	7-Sep-19	6	1	6	12	6	12
ID300	28-Sep-19	6	2	14	16	6	12	14	16
ID300	14-Sep-19	6	3	20	23.99	6	12	14	16	20	23.99

Tags: None

George Ford

Join Date: Aug 2014

Posts: 3136
#2

09 Aug 2024, 16:24

So you want the total number of hours available? That's just the sum of all the differences between stop/start times.
Comment
Garrett Baker

Join Date: Jun 2024

Posts: 26
#3

13 Aug 2024, 16:59

George Ford not the sum, that was the interest in the question I linked to that Clyde Schechter helped with.

I'm now hoping to create new variables that capture those actual unique start/stop times (this would be uniquestart1, uniquestop1, uniquestart2, uniquestop2, etc. in the table which I generated by hand as an example).
Comment

Announcement

Calculate all possible unique ranges of value

Comment

Comment