Generate a unified time based on separate variables for date and time

Asaf Yancu

Join Date: May 2021

Posts: 38
#1

Generate a unified time based on separate variables for date and time

16 Oct 2023, 12:24

Hi all statalists!
In my data, I have the dates of exams (test_date) and the time they started (start_hour).

test_date is in DMY format. While start_hour is in a weird format (integer), for example, start_hour == 900 says that the exam started at 9:00 a.m. or 09:00 in 24-hour format, and start_hour == 1430 indicates that the exam started at 2:30 p.m. or 14:30 in 24-hour format.

I attach a sample of my data below:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input int(test_date start_hour) 16560 900 16579 1415 16589 1445 16589 1700 16602 1100 16615 900 16618 900 15465 1430 15467 900 15467 1100 end format %td test_date

I would like to generate two new variables:
time_hour, which takes the values of start_hour and converts them into 24-hour format. For example, when start_hour == 1430, time_hour == 14:30.

date_time, which consists of the date and the time of the exam. For instance, for the first row (test_date == 04may2005 and start_hour == 900), I want to see something like 04 May 2005, 09:00. I need date_time to be in a format that enables me to calculate the difference between two dates. For example, the difference in days between the first observation and the second observation is 19 and in minutes is 27,675.

Many thanks in advance!
Tags: None

Clyde Schechter

Join Date: Apr 2014
Posts: 30163

16 Oct 2023, 12:35

Code:

tostring start_hour, gen(time_hour) format(%04.0f)
replace time_hour = substr(time_hour, 1, 2) + ":" + substr(time_hour, 3, 4)

gen int h = floor(start_hour/100)
gen int m = mod(start_hour, 100)
gen double date_time = dhms(test_date, h, m, 0)
format date_time %tc

Comment

Asaf Yancu

Join Date: May 2021

Posts: 38
#3

16 Oct 2023, 13:24

Thanks, Mr. Schechter!

But what if I want to include time_hour in a regression? For instance, I want to estimate the effect of the time the exam starts on the score

Code:

reg score time_hour
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30163
#4

16 Oct 2023, 13:39

You can't use that variable for that purpose. You have to use a proper Stata internal format clock variable for that. We can get that by extracting the date from date_time.

Code:

* Example generated by -dataex-. For more info, type help dataex clear input int(test_date start_hour) 16560 900 16579 1415 16589 1445 16589 1700 16602 1100 16615 900 16618 900 15465 1430 15467 900 15467 1100 end format %td test_date gen int h = floor(start_hour/100) gen int m = mod(start_hour, 100) gen double date_time = dhms(test_date, h, m, 0) format date_time %tc gen double test_time = date_time - cofd(test_date) format test_time %tcHH:MM

The variable pure_time reflects the time of day and is suitable for use in regression analyses or other calculations. That said, -reg score test_time- doesn't seem like a plausible model for most tests. That model implies that as the day progresses, the scores increase linearly. Then when we hit midnight and start the new day, they drop down to the "wee hours of the morning" values and begin their rise over the day again. I can't think of anything that really works that way off the top of my head. Usually these things involve the use of smooth periodic functions like sin() and cos() with appropriate parameters.
Comment

Announcement

Generate a unified time based on separate variables for date and time

Comment

Comment

Comment