Adjusting times across survey dates

Garrett Baker

Join Date: Jun 2024
Posts: 28

Adjusting times across survey dates

10 Jul 2024, 09:18

Hello everyone,

I have a question building on a previous post of mine (here: https://www.statalist.org/forums/for...ly-survey-data) where I have survey data of job shift start-stop times, measured daily for individual respondents.

One nuanced complication is that (occasionally) respondents taking the survey would indicate their shift for "that day" was really an overnight shift that ended earlier that morning but technically began the day prior. For example, Respondent A would indicate on October 10 that they had a shift that began at 9pm and ended at 2am. (Note that these instances are easily detected by subtracting stop - start, which should have a positive value unless the shift crosses over midnight).

In these cases, we'd ideally want to split out these hours worked by the actual day the work took place and create rows according to the actual date worked. In my hypothetical example, Person A's row for October 10 should be start = 0 and stop = 2 (start and stop times are numeric in 24-hour time format), and a new row would be added for for Person A on October 9 where the start = 21 and stop = 23.99.

Some simplified example data:

Code:

clear
input float(date start stop) long person_id
22200   22  3  1
21866   8  12  1
22189   20  1  1
22208   20.5  3  1
22213   .  .  2
22219   14 0  2
22195   22  4  2
21892 7.5 16  2
22232   8  12  3
21862   8  14  3
21869   23.5  1.25  3
21836   7  12  3
end
format %td date
sort person_id date

And here's roughly how I'd hope the resulting dataset looked:

date	start	stop	person_id
11/13/2019	8	12	1
9/30/2020	20	23.99	1
10/1/2020	0	1	1
10/11/2020	22	22.99	1
10/12/2020	0	3	1
10/19/2020	21	23.99	1
10/20/2020	0	3	1
12/9/2019	7.5	16	2
10/6/2020	22	23.99	2
10/7/2020	0	4	2
10/25/2020			2
10/31/2020	14	0	2
10/14/2019	7	12	3
11/9/2019	8	14	3
11/15/2019	23.5	23.99	3
11/16/2019	0	1.25	3
11/13/2020	8	12	3

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30147
#2

10 Jul 2024, 09:43

Code:

isid person_id date, sort frame put _all if stop < start, into(double_days) frame double_days { expand 2 by person_id date, sort: replace stop = 23.99 if _n == 1 by person_id date: replace start = 0 if _n == 2 by person_id date: replace date = date -1 if _n == 1 } drop if stop < start frameappend double_days, drop sort person_id date start

Note: -frameappend- is by Jeremy Freese, and is available from SSC.

One little wrinkle here. Suppose I work from 22:00 on July 1 through 6:00 on July 2 and also work from 20:00 to 23:30 on July 2. After applying this method, it will no longer be the case that id and date uniquely identify observations in the data: there will now be two observations for me on July 2, corresponding to the two, separated periods I worked. Thank you for posting the link to the earlier thread involving this (or closely related) data. I scanned the code there and I notice that it begins with -isid person_id survey_date-. That line will now break with an error message if you apply that code to the results from the current code. Further examining the rest of the code, I don't think that that verification of unique identification was actually necessary for the old code to work. So I think if you remove that -isid person_id survey_date- from that code, it will run correctly. But I have not attempted to test that out with data. If you run into a problem, post back with an example that reproduces it and I'll try to troubleshoot.
Comment
Garrett Baker

Join Date: Jun 2024

Posts: 28
#3

10 Jul 2024, 11:13

Thanks very much Clyde Schechter , this is exactly what I was looking for. You're correct about that unique observation wrinkle, I'd considered that before but will have to do some further testing with the real data now to see if there are any implications/problems that I haven't anticipated. I'll post back if so. Thanks again!
Comment

Announcement

Adjusting times across survey dates

Comment

Comment