Hello everyone,
I have a question building on a previous post of mine (here: https://www.statalist.org/forums/for...ly-survey-data) where I have survey data of job shift start-stop times, measured daily for individual respondents.
One nuanced complication is that (occasionally) respondents taking the survey would indicate their shift for "that day" was really an overnight shift that ended earlier that morning but technically began the day prior. For example, Respondent A would indicate on October 10 that they had a shift that began at 9pm and ended at 2am. (Note that these instances are easily detected by subtracting stop - start, which should have a positive value unless the shift crosses over midnight).
In these cases, we'd ideally want to split out these hours worked by the actual day the work took place and create rows according to the actual date worked. In my hypothetical example, Person A's row for October 10 should be start = 0 and stop = 2 (start and stop times are numeric in 24-hour time format), and a new row would be added for for Person A on October 9 where the start = 21 and stop = 23.99.
Some simplified example data:
And here's roughly how I'd hope the resulting dataset looked:
I have a question building on a previous post of mine (here: https://www.statalist.org/forums/for...ly-survey-data) where I have survey data of job shift start-stop times, measured daily for individual respondents.
One nuanced complication is that (occasionally) respondents taking the survey would indicate their shift for "that day" was really an overnight shift that ended earlier that morning but technically began the day prior. For example, Respondent A would indicate on October 10 that they had a shift that began at 9pm and ended at 2am. (Note that these instances are easily detected by subtracting stop - start, which should have a positive value unless the shift crosses over midnight).
In these cases, we'd ideally want to split out these hours worked by the actual day the work took place and create rows according to the actual date worked. In my hypothetical example, Person A's row for October 10 should be start = 0 and stop = 2 (start and stop times are numeric in 24-hour time format), and a new row would be added for for Person A on October 9 where the start = 21 and stop = 23.99.
Some simplified example data:
Code:
clear input float(date start stop) long person_id 22200 22 3 1 21866 8 12 1 22189 20 1 1 22208 20.5 3 1 22213 . . 2 22219 14 0 2 22195 22 4 2 21892 7.5 16 2 22232 8 12 3 21862 8 14 3 21869 23.5 1.25 3 21836 7 12 3 end format %td date sort person_id date
And here's roughly how I'd hope the resulting dataset looked:
date | start | stop | person_id |
11/13/2019 | 8 | 12 | 1 |
9/30/2020 | 20 | 23.99 | 1 |
10/1/2020 | 0 | 1 | 1 |
10/11/2020 | 22 | 22.99 | 1 |
10/12/2020 | 0 | 3 | 1 |
10/19/2020 | 21 | 23.99 | 1 |
10/20/2020 | 0 | 3 | 1 |
12/9/2019 | 7.5 | 16 | 2 |
10/6/2020 | 22 | 23.99 | 2 |
10/7/2020 | 0 | 4 | 2 |
10/25/2020 | 2 | ||
10/31/2020 | 14 | 0 | 2 |
10/14/2019 | 7 | 12 | 3 |
11/9/2019 | 8 | 14 | 3 |
11/15/2019 | 23.5 | 23.99 | 3 |
11/16/2019 | 0 | 1.25 | 3 |
11/13/2020 | 8 | 12 | 3 |
Comment