Check for seasonality in hospital data

Julien Weinstein

02 Jun 2023, 13:36

My task is to check whether seasonal variations exist in this dataset of hospital admissions/discharges. The columns start_datetime and end_datetime are the admission and discharge times for each patient. I am not sure how to check for seasonality in Stata, as I'm fairly new to the language. Perhaps it would make sense to make a time-series line plot of admissions over time? Would I need to convert the dataset to a time series using tsset? Please let me know what you think. Here is a sample of my data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long id double(start_datetime end_datetime nicu_length_of_stay)
19040329 1577817919000 1578062267000  67.87444444444445
21888934 1585386262000 1585572412000 51.708333333333336
21889607 1592299141000 1594518629000  616.5244444444445
21889636 1593534616000 1593536696000  .5777777777777777
21889757 1599879496000 1600265733000 107.28805555555556
21889821 1601829782000 1601998078000  46.74888888888889
21891579 1628549623000 1628708634000  44.16972222222222
21892676 1639685261000 1639771746000 24.023611111111112
21892736 1640042628000 1640088653000 12.784722222222221
42792024 1547400884000 1547475142000 20.627222222222223
end
format %tc start_datetime
format %tc end_datetime

Clyde Schechter

#2

02 Jun 2023, 14:05

Well, seasonality is a generic term, and does not necessarily refer to the four seasons of the year. Which is good, in your case, because that is a fairly coarse way of parsing time for the purpose of studying NICU admissions. I would recommend looking at the month level of seasonality. For that, first you need a data set that includes all the NICU admissions during some period that is a whole number of calendar years. (That is, you don't want a data set that begins in, say, January 2020 and ends in March 2021, because then January through March will be represented twice and the other months of the year only once.)
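The following is a sketch of a quick check on calendar coverage. It assumes start_datetime is the %tc admission time from #1; admit_mdate is a hypothetical new variable name. The earliest and latest admissions only approximate the real data-collection window, so treat a pass as a sanity check, not proof.

```stata
* monthly date (%tm) of each admission
gen admit_mdate = mofd(dofc(start_datetime))
format admit_mdate %tm
* earliest and latest admission months
summarize admit_mdate, meanonly
display "first: " %tm r(min) "   last: " %tm r(max)
* whole calendar years: first admission in January, last in December
assert month(dofm(r(min))) == 1 & month(dofm(r(max))) == 12
```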

Code:

gen admit_month = month(dofc(start_datetime))
tab admit_month

If there is substantial seasonality, it should pop out and hit you between the eyes in the tabulation of admissions by month. While you could, in principle, test the distribution against a uniform distribution across the months (or, actually, in proportion to the number of days in each month), I don't recommend that. In this context, any degree of seasonality that would be large enough to matter for practical purposes should be visually obvious, and a small seasonality only detectable with a statistical test is probably of no importance to anyone.
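A bar chart shows the same monthly tabulation as a picture. This minimal sketch assumes admit_month has been created as above and that id identifies admissions. Raw counts still carry the month-length artefact, because February has fewer days than other months.

```stata
* bar chart of the number of admissions in each calendar month
graph bar (count) id, over(admit_month) ytitle("Admissions")
```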

If I have that wrong, and you really need to detect subtle seasonality, then the simplest way is probably this:

Code:

contract admit_month
gen month_length = 30 if inlist(admit_month, 4, 6, 9, 11)
replace month_length = 31 if inlist(admit_month, 1, 3, 5, 7, 8, 10, 12)
replace month_length = 28 if admit_month == 2
poisson _freq i.admit_month, exposure(month_length)
testparm i.admit_month, equal
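After fitting, the month-specific rates may be easier to read than the coefficients. This is a sketch to run straight after the poisson command above. predict(ir) sets the exposure to 1, so with this saturated model each margin equals _freq / month_length. That is admissions per day of the calendar month, pooled over all years in the data.

```stata
* admissions per day for each calendar month (exposure set to 1 day),
* pooled over all years in the data
margins admit_month, predict(ir)
* plot the estimated rates with confidence intervals
marginsplot
```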
Julien Weinstein

#3

02 Jun 2023, 14:33

Thanks again, Clyde! Really helpful answer, and I will test out your suggestions next week. I'll let you know if I have any further questions/thoughts at that point.

Nick Cox

#4

02 Jun 2023, 16:37

For some more ideas:

Code:

. search seasonality 

Search of official help files, FAQs, Examples, and Stata Journals

[TS]    tsfilter  . . . . . . . . Filter a time series for cyclical components
        (help tsfilter)

[TS]    tsfilter bk . . . . . . . . . . . . . . Baxter-King time-series filter
        (help tsfilter bk)

[TS]    tsfilter bw . . . . . . . . . . . . . . Butterworth time-series filter
        (help tsfilter bw)

[TS]    tsfilter cf . . . . . . . . . Christiano-Fitzgerald time-series filter
        (help tsfilter cf)

[TS]    tsfilter hp . . . . . . . . . . .  Hodrick-Prescott time-series filter
        (help tsfilter hp)

[TS]    ucm . . . . . . . . . . . . . . . . . . .  Unobserved-components model
        (help ucm)

NC461   . . . . . . . . . . . NetCourse 461: Univariate time series with Stata
        http://www.stata.com/netcourse/univariate-time-series-intro-nc461/

SJ-12-2 st0255  . . . . .  Menu-driven X-12-ARIMA seasonal adjustment in Stata
        . . . . . . . . . . . . . . . . . . . . . . . . . .  Q. Wang and N. Wu
        (help sax12del, sax12diag, sax12im, sax12 if installed)
        Q2/12   SJ 12(2):214--241
        provides menu-driven X-12-ARIMA seasonal-adjustment

SJ-9-3  st0172  . . . . . . . . . . . . . A seasonal unit-root test with Stata
        (help sroot if installed) . . . . . . . . . . . . . . . . .  D. Depalo
        Q3/09   SJ 9(3):422--438
        advocates using original data instead of seasonally
        adjusted data; implements a test for unit roots in
        quarterly data

SJ-9-2  gr0037  . . . . . . . .  Stata tip 76: Separating seasonal time series
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q2/09   SJ 9(2):321--326                                 (no commands)
        tip on separating seasonal time series

SJ-6-3  gr0025  . . . . . . . . . . . . Speaking Stata: Graphs for all seasons
        (help cycleplot, sliceplot if installed)  . . . . . . . . .  N. J. Cox
        Q3/06   SJ 6(3):397--419
        illustrates producing graphs showing time-series seasonality

SJ-5-3  sg149_1 . . . . . . . . . . . . . . . . . .  Software update for seast
        (help seast if installed) . . . . . . .  M. S. Pearce and R. Feltbower
        Q3/05   SJ 5(3):470
        bug fix for seast; help file and program also modernized


Rich Goldstein

#5

02 Jun 2023, 19:05

In addition to the above helpful comments and citations, here are two additional references:

Barnett, AG and Dobson, AJ (2010), Analysing Seasonal Health Data, Springer

Madaniyazi, L, et al. (2022), "Assessing seasonality and the role of its potential drivers in environmental epidemiology," International Journal of Epidemiology, 51(5): 1677-1686

Added in edit: while your data example does not contain any leap years, your real data might, and that would call for a minor change to Clyde Schechter's code.
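One way to make the exposure leap-year aware is to contract by year and month, so that each row's exposure is the actual length of that particular month. This is a sketch, not Clyde's code. It assumes you start again from the admission-level data of #1, so admit_year and admit_month do not yet exist. Here testparm tests that all month coefficients are jointly zero.

```stata
* year and calendar month of each admission
gen admit_year  = year(dofc(start_datetime))
gen admit_month = month(dofc(start_datetime))
* one row per year-month, keeping year-month combinations with no admissions
contract admit_year admit_month, zero
* exact month length: first day of next month minus first day of this month
gen month_length = dofm(ym(admit_year, admit_month) + 1) - dofm(ym(admit_year, admit_month))
poisson _freq i.admit_month, exposure(month_length)
testparm i.admit_month
```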

Last edited by Rich Goldstein; 02 Jun 2023, 19:29.

Nick Cox

#6

03 Jun 2023, 01:29

While months are familiar and often convenient, I often work with fraction of the year, as below.

Data like #1 could show a range of effects:

time of day

day of week

major holidays etc.

seasons in the strict sense (direct or indirect effects of climate or weather)

pandemic effects (over varying time lengths, as we know)

longer-term trends
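The shortest time scales in that list can be pulled straight from the admission times. This sketch assumes the data from #1 are in memory with start_datetime as a %tc variable; admit_hour and admit_dow are hypothetical new names.

```stata
* hour of day of admission, 0-23
gen admit_hour = hh(start_datetime)
* day of week of admission, 0 = Sunday ... 6 = Saturday
gen admit_dow  = dow(dofc(start_datetime))
tab admit_hour
tab admit_dow
```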

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long id double(start_datetime end_datetime nicu_length_of_stay)
19040329 1577817919000 1578062267000  67.87444444444445
21888934 1585386262000 1585572412000 51.708333333333336
21889607 1592299141000 1594518629000  616.5244444444445
21889636 1593534616000 1593536696000  .5777777777777777
21889757 1599879496000 1600265733000 107.28805555555556
21889821 1601829782000 1601998078000  46.74888888888889
21891579 1628549623000 1628708634000  44.16972222222222
21892676 1639685261000 1639771746000 24.023611111111112
21892736 1640042628000 1640088653000 12.784722222222221
42792024 1547400884000 1547475142000 20.627222222222223
end
format %tc start_datetime
format %tc end_datetime

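* daily date of admission
gen ddate = dofc(start_datetime)
format ddate %td 
gen ydate = year(ddate)

* fraction of year: (day of year - 0.5) / days in year;
* mdy(2, 29, ydate) is missing unless ydate is a leap year
gen foy = (doy(ddate) - 0.5) / (365 + (mdy(2, 29, ydate) < .))
format foy %5.4f 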

l *date* foy , sep(0)

     +----------------------------------------------------------------------+
     |     start_datetime         end_datetime       ddate   ydate      foy |
     |----------------------------------------------------------------------|
  1. | 30dec2009 18:45:19   02jan2010 14:37:47   30dec2009    2009   0.9959 |
  2. | 28mar2010 09:04:22   30mar2010 12:46:52   28mar2010    2010   0.2370 |
  3. | 16jun2010 09:19:01   12jul2010 01:50:29   16jun2010    2010   0.4562 |
  4. | 30jun2010 16:30:16   30jun2010 17:04:56   30jun2010    2010   0.4945 |
  5. | 12sep2010 02:58:16   16sep2010 14:15:33   12sep2010    2010   0.6973 |
  6. | 04oct2010 16:43:02   06oct2010 15:27:58   04oct2010    2010   0.7575 |
  7. | 09aug2011 22:53:43   11aug2011 19:03:54   09aug2011    2011   0.6041 |
  8. | 16dec2011 20:07:41   17dec2011 20:09:06   16dec2011    2011   0.9575 |
  9. | 20dec2011 23:23:48   21dec2011 12:10:53   20dec2011    2011   0.9685 |
 10. | 12jan2009 17:34:44   13jan2009 14:12:22   12jan2009    2009   0.0315 |
     +----------------------------------------------------------------------+
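A histogram is one simple way to see how foy is distributed over the year. This sketch assumes foy has been created as above. It uses 12 equal-width bins on [0, 1], so the bins only approximate calendar months.

```stata
* histogram of admissions over the year in 12 equal-width bins of foy
histogram foy, start(0) width(`=1/12') frequency xtitle("Fraction of year")
```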


Julien Weinstein

#7

05 Jun 2023, 07:57

Thank you, Nick. Once I have foy, would I be able to run models on that variable to check for seasonality, say an ARIMA, or what would be the next steps?
Nick Cox

#8

05 Jun 2023, 09:30

How would ARIMA apply here? You have individuals entering and exiting at different times. I think you'd need to reduce your data to counts in time bins or at time intervals, and that would be a long stretch from your data.

Your question

"check if seasonal variations exist in this dataset of hospital admissions/discharges"

was rather general and it's hard to do more than make a variety of general suggestions, beyond noting that many different time scales are involved and wondering what is of most interest and importance.
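Reducing admission-level data to counts in monthly bins can be sketched as follows. This assumes the admission-level data from #1 are in memory; mdate is an illustrative name for the monthly date. tsfill adds rows only for empty months between the first and last observed months.

```stata
* monthly date (%tm) of each admission
gen mdate = mofd(dofc(start_datetime))
format mdate %tm
* reduce to one row per month; the count is stored in _freq
contract mdate
* declare the time series and add rows for months with no admissions
tsset mdate
tsfill
replace _freq = 0 if missing(_freq)
* line plot of monthly admissions
tsline _freq, ytitle("Admissions per month")
```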
Julien Weinstein

#9

05 Jun 2023, 12:55

Sorry, I guess my question, put more directly, would be: how can I use the foy variable to help me understand seasonal trends in patient admissions? Additionally, how could I make a line plot that shows the number of admissions month by month throughout the data set (which runs from Jan. 2009 to Dec. 2011)?

Last edited by Julien Weinstein; 05 Jun 2023, 12:58.
Julien Weinstein

#10

05 Jun 2023, 14:23

I have generated a time-series based line plot by summing the monthly admissions and plotting them against the months in the data set. I'd like to check for seasonality in the data from this summed monthly time-series data. Is there a good way to detect trends or seasonality in this type of data in Stata?
Nick Cox

#11

06 Jun 2023, 01:10

That makes your problem clearer -- thanks.

It seems that you have reduced your problem to 36 values for the years 2009 to 2011, so no leap years. You can't use monthly date as a predictor without grossly overfitting.

You may have month of the year (1 ... 12) and year as existing variables; if not, then, with mdate as your monthly date variable, recipes are

Code:

gen month = month(dofm(mdate))
gen year = year(dofm(mdate))

As already hinted by Clyde Schechter in #2, if you use month of year as a predictor, you are mixing a genuine time-of-year effect with an artefact of month length. So Clyde's code is good for non-leap years, or

Code:

gen length = daysinmonth(dofm(mdate))

will get you that predictor. Then it's a Poisson regression:

Code:

poisson _freq i.month length

or

Code:

poisson _freq i.month length i.year

There is still a risk of overfitting. If you really have more data, that's fine. If the reason the data stop in 2011 is that this is some sort of assignment, then your teachers should be aware of the issues.

It's my impression that sine and cosine predictors don't work especially well for this kind of data.
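There is one caveat about entering length as a predictor. In 2009 to 2011 every February has 28 days, so length is fully determined by month. Stata will therefore omit it as collinear once i.month is in the model. Entering month length as an exposure, as in #2, avoids that problem. The sketch below assumes the monthly data hold the count _freq, the monthly date mdate, and month and year as created above. days is a hypothetical new name, chosen so it does not clash with length.

```stata
* days in each month from the %tm date; leap-year aware
gen days = dofm(mdate + 1) - dofm(mdate)
* month length as exposure (offset), so rates are per day; irr shows rate ratios
poisson _freq i.month i.year, exposure(days) irr
* joint test that all month-of-year effects are zero
testparm i.month
```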