How to round date variables to the nearest Monday?

Rachel Inafuku

Join Date: Dec 2020
Posts: 3

How to round date variables to the nearest Monday?

18 Dec 2020, 11:44

Hi everyone,

I have a dataset with student test scores and test dates for the years 2008-2019. I have all student test scores but I don't have their test dates for the years 2008 to 2014. The data looks something like this:

student id	test_score	test_date	year	school
1	90	.	2010	100
1	98	.	2011	100
1	95	apr 30 2015	2015	100
1	94	may 1 2016	2016	100
2	88	.	2013	200
2	88	.	2014	200
2	89	may 3 2015	2015	200
2	90	apr 22 2017	2017	200
3	70	.	2011	100
3	75	apr 30 2015	2015	100

I am imputing these missing test dates by taking the average test date of the non-missing test dates by school but some of these average test dates end up being on a Saturday or Sunday. I know that most students take their tests on a Monday. I would like to round the imputed test dates to the nearest Monday. Does anyone know how to do that? Here is what my code looks like so far:

gen day = day(test_date)
gen month = month(test_date)

egen day_mean = mean(day), by(school)
egen month_mean = mean(month), by(school)

gen date_imp = mdy(month_mean, day_mean, year)

***insert code to round to nearest monday here!!!

replace test_date = date_imp if test_date == .

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#2

18 Dec 2020, 12:30

Code:

clear input byte(studentid test_score) int(year school) float test_date 1 90 2010 100 . 1 98 2011 100 . 1 95 2015 100 20208 1 94 2016 100 20575 2 88 2013 200 . 2 88 2014 200 . 2 89 2015 200 20211 2 90 2017 200 20931 3 70 2011 100 . 3 75 2015 100 20208 end format %td test_date // IMPUTE SCHOOL'S MEAN TEST DATE FOR MISSING VALUES by school, sort: egen school_mean_test_day_of_year = mean(doy(test_date)) replace school_mean_test_day_of_year = round(school_mean_test_day_of_year) gen school_mean_test_date = mdy(1, 1, year) + school_mean_test_day_of_year gen imputed_test_date = test_date replace imputed_test_date = school_mean_test_date if missing(test_date) format imputed_test_date %td // NOW SWITCH TO NEAREST MONDAY gen weekday = dow(imputed_test_date) replace imputed_test_date = imputed_test_date - weekday /// if weekday <= 4 replace imputed_test_date = imputed_test_date + 7 - weekday /// if weekday > 4 & !missing(weekday) replace imputed_test_date = imputed_test_date + 1 assert dow(imputed_test_date) == 1 format imputed_test_date %td

The detour through Sunday makes the code slightly easier to write because Stata's dow() function has Sunday = 0.

By the way, your code for calculating the mean date does not actually calculate the mean date. It calculates the mean day of the month and the mean month of the year and then puts those together, but that is not quite the same thing. For practical purposes it may not make much difference, but I've shown code that actually gets the mean date.

Next, there is the question of whether you should do this. I don't know what you are going to use these dates for, but for many purposes, imputing a mean value to replace missing values is not a good idea. This is particularly true if you will be using these dates as a variable in a regression model or the equivalent. Unfortunately, when it comes to dealing with missing data, we are usually in a position of having to choose the least bad solution, rather than a good one. But, again, especially for regression, multiple imputation would generally be better than single mean imputation.

Finally, in the future, when showing data examples, please use the -dataex- command to do so. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Last edited by Clyde Schechter; 18 Dec 2020, 12:44. Reason: For some reason, this post posted itself before I was finished writing it!
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30097

18 Dec 2020, 14:17

I just noticed an error in the code I posted in #2.

Code:

gen school_mean_test_date = mdy(1,1,year) + school_mean_test_day_of_year

// SHOULD BE

gen school_mean_test_date = mdy(12,31,year-1) + school_mean_test_day_of_year

// BECAUSE THE doy() FUNCTION STARTS WITH 1, NOT ZERO

Comment

Rachel Inafuku

Join Date: Dec 2020

Posts: 3
#4

21 Dec 2020, 11:32

Thank you Clyde! Sorry, this was my first post on the stata forum. I'll be sure to use the dataex command next time.
Comment

Announcement

How to round date variables to the nearest Monday?

Comment

Comment

Comment