I'm attempting to build on a previous question (https://www.statalist.org/forums/for...ly-survey-data); essentially, I'm hoping to work out an intermediate step in that process, one that would produce some new variables/information of interest.
In my previous question, the resulting code produces a dataset that calculates the unique total hours for each person-dayofwk grouping. What I've now been working on is generating new variables that contain the actual start and stop times for each person-dayofwk grouping. The main challenge in my initial coding attempts has been that people can have non-overlapping hours worked across a given day of the week, which necessitates creating a second or third set (and, in the real data, even more) of unique start/stop variables, because we don't want to include any intervening hours. For example, if a person was available on a given day from 9-12 and again from 17-21, two sets of variables would be created: uniquestart1=9, uniquestop1=12, uniquestart2=17, uniquestop2=21. We would not want to capture the intervening 12-17 range.
To give a sense of how this would start, here's some code that creates the first unique start and stop times, based on the first day_count within each person-dayofwk grouping.
Code:
sort resp_id dayofwk start stop
by resp_id dayofwk: gen uniquestart1 = start[1]
by resp_id dayofwk: gen uniquestop1 = stop[1]
Some example data:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str5 resp_id int date byte(dayofwk day_count) double(start stop)
"ID200" 21849 0 1  0     3
"ID200" 21842 0 2  0     3
"ID200" 21856 0 3  0     3
"ID200" 21835 0 4  0     3.5
"ID200" 21856 0 5 18    23.99
"ID200" 21835 0 6 19    23.99
"ID200" 21857 1 1  0     3
"ID200" 21836 1 2  0     3
"ID200" 21837 2 1 19    23.99
"ID200" 21838 3 1  0     3
"ID200" 21852 3 2 18    23.99
"ID200" 21845 3 3 18    23.99
"ID200" 21846 4 1  0     3
"ID200" 21853 4 2  0     3
"ID200" 21832 4 3 17    23.99
"ID200" 21839 4 4 19    23.99
"ID200" 21846 4 5 19    23.99
"ID200" 21847 5 1  0     3
"ID200" 21833 5 2  0     3
"ID200" 21840 5 3  0     3
"ID200" 21833 5 4 18    23.99
"ID200" 21840 5 5 18.5  23.99
"ID200" 21854 5 6 19    23.99
"ID200" 21847 5 7 19    23.99
"ID200" 21848 6 1  0     3
"ID200" 21834 6 2  0     3
"ID200" 21841 6 3  0     3
"ID200" 21855 6 4  0     3
"ID200" 21848 6 5  7    23.99
"ID200" 21834 6 6 17    23.99
"ID200" 21855 6 7 18    23.99
"ID200" 21841 6 8 19    23.99
"ID300" 21793 0 1  6    14
"ID300" 21800 0 2  6    14
"ID300" 21807 0 3  9    15
"ID300" 21822 1 1  6    14
"ID300" 21808 1 2  6    14
"ID300" 21815 1 3  6    14
"ID300" 21809 2 1  6    14
"ID300" 21796 3 1  5    14
"ID300" 21810 3 2  6    14
"ID300" 21803 3 3  6    16
"ID300" 21817 3 4  6    16
"ID300" 21797 4 1  7    17
"ID300" 21804 4 2  9    17
"ID300" 21818 4 3  9    17
"ID300" 21811 4 4  9    21
"ID300" 21798 5 1  6    14
"ID300" 21819 5 2  6    14
"ID300" 21805 5 3  9    17
"ID300" 21799 6 1  6    12
"ID300" 21820 6 2 14    16
"ID300" 21806 6 3 20    23.99
end
format %tddd-Mon-YY date
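Extending that, here's a rough, untested sketch of the full looping logic I have in mind. It walks the rows in day_count order, carries each grouping's interval set forward one row at a time, and either merges the current start/stop into an overlapping block or opens a new block in the first empty slot. It creates the unique* variables from scratch, so it would replace the starting code above. The maximum of three blocks (the local K) is just an assumption based on this example; the real data would need a larger value. It also assumes day_count is sorted by start time, so a new disjoint block never opens earlier in the day than an existing one, and that a new interval never bridges two existing blocks.

Code:
sort resp_id dayofwk day_count
local K = 3   // assumed maximum number of disjoint blocks; raise for the real data
forvalues k = 1/`K' {
    gen double uniquestart`k' = .
    gen double uniquestop`k'  = .
}

quietly forvalues i = 1/`=_N' {
    * carry forward the previous row's interval set within the same grouping
    if `i' > 1 & resp_id[`i'] == resp_id[`i'-1] & dayofwk[`i'] == dayofwk[`i'-1] {
        forvalues k = 1/`K' {
            replace uniquestart`k' = uniquestart`k'[`i'-1] in `i'
            replace uniquestop`k'  = uniquestop`k'[`i'-1]  in `i'
        }
    }
    * merge this row's start/stop into an overlapping block, if one exists
    local merged 0
    forvalues k = 1/`K' {
        if !`merged' & uniquestart`k'[`i'] < . & ///
           start[`i'] <= uniquestop`k'[`i'] & stop[`i'] >= uniquestart`k'[`i'] {
            replace uniquestart`k' = min(uniquestart`k'[`i'], start[`i']) in `i'
            replace uniquestop`k'  = max(uniquestop`k'[`i'],  stop[`i'])  in `i'
            local merged 1
        }
    }
    * otherwise open a new block in the first empty slot
    forvalues k = 1/`K' {
        if !`merged' & missing(uniquestart`k'[`i']) {
            replace uniquestart`k' = start[`i'] in `i'
            replace uniquestop`k'  = stop[`i']  in `i'
            local merged 1
        }
    }
}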
Here's how I think the final data would look: the final/highest day_count within each person-dayofwk grouping would contain the final unique start/stop values, and you could then keep only those rows (see the keep step sketched after the table).
resp_id | date | dayofwk | day_count | start | stop | uniquestart1 | uniquestop1 | uniquestart2 | uniquestop2 | uniquestart3 | uniquestop3 |
ID200 | 27-Oct-19 | 0 | 1 | 0 | 3 | 0 | 3 | ||||
ID200 | 20-Oct-19 | 0 | 2 | 0 | 3 | 0 | 3 | ||||
ID200 | 3-Nov-19 | 0 | 3 | 0 | 3 | 0 | 3 | ||||
ID200 | 13-Oct-19 | 0 | 4 | 0 | 3.5 | 0 | 3.5 | ||||
ID200 | 3-Nov-19 | 0 | 5 | 18 | 23.99 | 0 | 3.5 | 18 | 23.99 | ||
ID200 | 13-Oct-19 | 0 | 6 | 19 | 23.99 | 0 | 3.5 | 18 | 23.99 | ||
ID200 | 4-Nov-19 | 1 | 1 | 0 | 3 | 0 | 3 | ||||
ID200 | 14-Oct-19 | 1 | 2 | 0 | 3 | 0 | 3 | ||||
ID200 | 15-Oct-19 | 2 | 1 | 19 | 23.99 | 19 | 23.99 | ||||
ID200 | 16-Oct-19 | 3 | 1 | 0 | 3 | 0 | 3 | ||||
ID200 | 30-Oct-19 | 3 | 2 | 18 | 23.99 | 0 | 3 | 18 | 23.99 | ||
ID200 | 23-Oct-19 | 3 | 3 | 18 | 23.99 | 0 | 3 | 18 | 23.99 | ||
ID200 | 24-Oct-19 | 4 | 1 | 0 | 3 | 0 | 3 | ||||
ID200 | 31-Oct-19 | 4 | 2 | 0 | 3 | 0 | 3 | ||||
ID200 | 10-Oct-19 | 4 | 3 | 17 | 23.99 | 0 | 3 | 17 | 23.99 | ||
ID200 | 17-Oct-19 | 4 | 4 | 19 | 23.99 | 0 | 3 | 17 | 23.99 | ||
ID200 | 24-Oct-19 | 4 | 5 | 19 | 23.99 | 0 | 3 | 17 | 23.99 | ||
ID200 | 25-Oct-19 | 5 | 1 | 0 | 3 | 0 | 3 | ||||
ID200 | 11-Oct-19 | 5 | 2 | 0 | 3 | 0 | 3 | ||||
ID200 | 18-Oct-19 | 5 | 3 | 0 | 3 | 0 | 3 | ||||
ID200 | 11-Oct-19 | 5 | 4 | 18 | 23.99 | 0 | 3 | 18 | 23.99 | ||
ID200 | 18-Oct-19 | 5 | 5 | 18.5 | 23.99 | 0 | 3 | 18 | 23.99 | ||
ID200 | 1-Nov-19 | 5 | 6 | 19 | 23.99 | 0 | 3 | 18 | 23.99 | ||
ID200 | 25-Oct-19 | 5 | 7 | 19 | 23.99 | 0 | 3 | 18 | 23.99 | ||
ID200 | 26-Oct-19 | 6 | 1 | 0 | 3 | 0 | 3 | ||||
ID200 | 12-Oct-19 | 6 | 2 | 0 | 3 | 0 | 3 | ||||
ID200 | 19-Oct-19 | 6 | 3 | 0 | 3 | 0 | 3 | ||||
ID200 | 2-Nov-19 | 6 | 4 | 0 | 3 | 0 | 3 | ||||
ID200 | 26-Oct-19 | 6 | 5 | 7 | 23.99 | 0 | 3 | 7 | 23.99 | ||
ID200 | 12-Oct-19 | 6 | 6 | 17 | 23.99 | 0 | 3 | 7 | 23.99 | ||
ID200 | 2-Nov-19 | 6 | 7 | 18 | 23.99 | 0 | 3 | 7 | 23.99 | ||
ID200 | 19-Oct-19 | 6 | 8 | 19 | 23.99 | 0 | 3 | 7 | 23.99 | ||
ID300 | 1-Sep-19 | 0 | 1 | 6 | 14 | 6 | 14 | ||||
ID300 | 8-Sep-19 | 0 | 2 | 6 | 14 | 6 | 14 | ||||
ID300 | 15-Sep-19 | 0 | 3 | 9 | 15 | 6 | 15 | ||||
ID300 | 30-Sep-19 | 1 | 1 | 6 | 14 | 6 | 14 | ||||
ID300 | 16-Sep-19 | 1 | 2 | 6 | 14 | 6 | 14 | ||||
ID300 | 23-Sep-19 | 1 | 3 | 6 | 14 | 6 | 14 | ||||
ID300 | 17-Sep-19 | 2 | 1 | 6 | 14 | 6 | 14 | ||||
ID300 | 4-Sep-19 | 3 | 1 | 5 | 14 | 5 | 14 | ||||
ID300 | 18-Sep-19 | 3 | 2 | 6 | 14 | 5 | 14 | ||||
ID300 | 11-Sep-19 | 3 | 3 | 6 | 16 | 5 | 16 | ||||
ID300 | 25-Sep-19 | 3 | 4 | 6 | 16 | 5 | 16 | ||||
ID300 | 5-Sep-19 | 4 | 1 | 7 | 17 | 7 | 17 | ||||
ID300 | 12-Sep-19 | 4 | 2 | 9 | 17 | 7 | 17 | ||||
ID300 | 26-Sep-19 | 4 | 3 | 9 | 17 | 7 | 17 | ||||
ID300 | 19-Sep-19 | 4 | 4 | 9 | 21 | 7 | 21 | ||||
ID300 | 6-Sep-19 | 5 | 1 | 6 | 14 | 6 | 14 | ||||
ID300 | 27-Sep-19 | 5 | 2 | 6 | 14 | 6 | 14 | ||||
ID300 | 13-Sep-19 | 5 | 3 | 9 | 17 | 6 | 17 | ||||
ID300 | 7-Sep-19 | 6 | 1 | 6 | 12 | 6 | 12 | ||||
ID300 | 28-Sep-19 | 6 | 2 | 14 | 16 | 6 | 12 | 14 | 16 | ||
ID300 | 14-Sep-19 | 6 | 3 | 20 | 23.99 | 6 | 12 | 14 | 16 | 20 | 23.99 |
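If the sketch above fills the variables row by row like this, then, per the note above the table, the last row within each grouping carries the complete interval set, so I'd expect the final step to be just:

Code:
sort resp_id dayofwk day_count
by resp_id dayofwk: keep if _n == _N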