  • Check for seasonality in hospital data

    My task is to check whether seasonal variations exist in this dataset of hospital admissions/discharges. The columns start_datetime and end_datetime hold the admission and discharge times for each patient. I am not sure how to check for seasonality in Stata, as I'm fairly new to the language. Perhaps it would make sense to make a time-series line plot of admissions through time? Would I need to convert the dataset to a time series using tsset? Please let me know what you think. Here is a sample of my data:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id double(start_datetime end_datetime nicu_length_of_stay)
    19040329 1577817919000 1578062267000  67.87444444444445
    21888934 1585386262000 1585572412000 51.708333333333336
    21889607 1592299141000 1594518629000  616.5244444444445
    21889636 1593534616000 1593536696000  .5777777777777777
    21889757 1599879496000 1600265733000 107.28805555555556
    21889821 1601829782000 1601998078000  46.74888888888889
    21891579 1628549623000 1628708634000  44.16972222222222
    21892676 1639685261000 1639771746000 24.023611111111112
    21892736 1640042628000 1640088653000 12.784722222222221
    42792024 1547400884000 1547475142000 20.627222222222223
    end
    format %tc start_datetime
    format %tc end_datetime

  • #2
    Well, seasonality is a generic term, and does not necessarily refer to the four seasons of the year. That is good, in your case, because the four seasons are a fairly coarse way of parsing time for the purpose of studying NICU admissions. I would recommend looking at the month level of seasonality. For that, first you need a data set that includes all the NICU admissions during some period that is a whole number of calendar years. (That is, you don't want a data set that begins in, say, January 2020 and ends in March 2021, because then January through March will be represented twice and the other months of the year only once.)

    Code:
    gen admit_month = month(dofc(start_datetime))
    
    tab admit_month
    If there is substantial seasonality, it should pop out and hit you between the eyes in the tabulation of admissions by month. While you could, in principle, test the distribution against a uniform distribution across the months (or, actually, in proportion to the number of days in each month), I don't recommend that. In this context, any degree of seasonality that would be large enough to matter for practical purposes should be visually obvious, and a small seasonality only detectable with a statistical test is probably of no importance to anyone.
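
    A quick visual companion to the tabulation might look like the sketch below. It assumes the variables from #1 (id, start_datetime as %tc); the names adm_year and adm_month are made up for illustration.

    ```stata
    * Sketch only: adm_year and adm_month are hypothetical names
    gen int  adm_year  = year(dofc(start_datetime))
    gen byte adm_month = month(dofc(start_datetime))

    * Check that the data cover whole calendar years before comparing months
    tab adm_year

    * Bar chart of the number of admissions in each month of the year
    graph bar (count) id, over(adm_month) ytitle("Admissions")
    ```

    If the first or last year in the tab adm_year output looks much thinner than the rest, that may be a sign of a partial year that should be trimmed first.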

    If I have that wrong, and you really need to detect subtle seasonality, then the simplest way is probably this:
    Code:
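    * reduce to one observation per month; _freq holds the number of admissions
    contract admit_month
    * days in each month (February taken as 28, i.e. no leap years)
    gen month_length = 30 if inlist(admit_month, 4, 6, 9, 11)
    replace month_length = 31 if inlist(admit_month, 1, 3, 5, 7, 8, 10, 12)
    replace month_length = 28 if admit_month == 2
    * Poisson model of monthly counts, with month length as exposure
    poisson _freq i.admit_month, exposure(month_length)
    * test the month indicator coefficients for equality
    testparm i.admit_month, equal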



    • #3
      Thanks again, Clyde! Really helpful answer, and I will test out your suggestions next week. I'll let you know if I have any further questions/thoughts at that point.



      • #4
        For some more ideas:


        Code:
        . search seasonality 
        
        Search of official help files, FAQs, Examples, and Stata Journals
        
        [TS]    tsfilter  . . . . . . . . Filter a time series for cyclical components
                (help tsfilter)
        
        [TS]    tsfilter bk . . . . . . . . . . . . . . Baxter-King time-series filter
                (help tsfilter bk)
        
        [TS]    tsfilter bw . . . . . . . . . . . . . . Butterworth time-series filter
                (help tsfilter bw)
        
        [TS]    tsfilter cf . . . . . . . . . Christiano-Fitzgerald time-series filter
                (help tsfilter cf)
        
        [TS]    tsfilter hp . . . . . . . . . . .  Hodrick-Prescott time-series filter
                (help tsfilter hp)
        
        [TS]    ucm . . . . . . . . . . . . . . . . . . .  Unobserved-components model
                (help ucm)
        
        NC461   . . . . . . . . . . . NetCourse 461: Univariate time series with Stata
                http://www.stata.com/netcourse/univariate-time-series-intro-nc461/
        
        SJ-12-2 st0255  . . . . .  Menu-driven X-12-ARIMA seasonal adjustment in Stata
                . . . . . . . . . . . . . . . . . . . . . . . . . .  Q. Wang and N. Wu
                (help sax12del, sax12diag, sax12im, sax12 if installed)
                Q2/12   SJ 12(2):214--241
                provides menu-driven X-12-ARIMA seasonal-adjustment
        
        SJ-9-3  st0172  . . . . . . . . . . . . . A seasonal unit-root test with Stata
                (help sroot if installed) . . . . . . . . . . . . . . . . .  D. Depalo
                Q3/09   SJ 9(3):422--438
                advocates using original data instead of seasonally
                adjusted data; implements a test for unit roots in
                quarterly data
        
        SJ-9-2  gr0037  . . . . . . . .  Stata tip 76: Separating seasonal time series
                . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
                Q2/09   SJ 9(2):321--326                                 (no commands)
                tip on separating seasonal time series
        
        SJ-6-3  gr0025  . . . . . . . . . . . . Speaking Stata: Graphs for all seasons
                (help cycleplot, sliceplot if installed)  . . . . . . . . .  N. J. Cox
                Q3/06   SJ 6(3):397--419
                illustrates producing graphs showing time-series seasonality
        
        SJ-5-3  sg149_1 . . . . . . . . . . . . . . . . . .  Software update for seast
                (help seast if installed) . . . . . . .  M. S. Pearce and R. Feltbower
                Q3/05   SJ 5(3):470
                bug fix for seast; help file and program also modernized



        • #5
          In addition to the above helpful comments and cites, here are two additional cites:

          Barnett, AG and Dobson, AJ (2010), Analysing Seasonal Health Data, Springer

          Madaniyazi, L, et al. (2022), "Assessing seasonality and the role of its potential drivers in environmental epidemiology," International Journal of Epidemiology, 51(5): 1677-1686

          added in edit: while your data example does not contain any leap years, your real data might, and that would call for a minor change to Clyde Schechter's code
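
          One possible version of that change, as a sketch rather than a definitive fix: count admissions by calendar month first, compute the actual number of days in each of those months, and only then add up by month of year. It assumes start_datetime is a %tc variable as in #1 and that the data span whole calendar years; admit_mdate is a made-up name.

          ```stata
          * Sketch only: admit_mdate is a hypothetical name
          gen admit_mdate = mofd(dofc(start_datetime))
          format admit_mdate %tm

          * One observation per calendar month; _freq = admissions in that month
          contract admit_mdate

          * Actual days in each calendar month (29 for a leap-year February)
          gen month_length = dofm(admit_mdate + 1) - dofm(admit_mdate)
          gen admit_month  = month(dofm(admit_mdate))

          * Back to 12 observations: total admissions and total days at risk
          collapse (sum) _freq month_length, by(admit_month)

          poisson _freq i.admit_month, exposure(month_length)
          ```

          One caveat: contract only keeps calendar months that have at least one admission, so a month with zero admissions would be left out of the exposure total.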
          Last edited by Rich Goldstein; 02 Jun 2023, 19:29.



          • #6
            While months are familiar and often convenient, I often work with fraction of the year, as in the example below.

            Data like #1 could show a range of effects

            time of day

            day of week

            major holidays etc.

            seasons in the strict sense (direct or indirect effects of climate or weather)

            pandemic effects (over varying time lengths, as we know)

            longer-term trends

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input long id double(start_datetime end_datetime nicu_length_of_stay)
            19040329 1577817919000 1578062267000  67.87444444444445
            21888934 1585386262000 1585572412000 51.708333333333336
            21889607 1592299141000 1594518629000  616.5244444444445
            21889636 1593534616000 1593536696000  .5777777777777777
            21889757 1599879496000 1600265733000 107.28805555555556
            21889821 1601829782000 1601998078000  46.74888888888889
            21891579 1628549623000 1628708634000  44.16972222222222
            21892676 1639685261000 1639771746000 24.023611111111112
            21892736 1640042628000 1640088653000 12.784722222222221
            42792024 1547400884000 1547475142000 20.627222222222223
            end
            format %tc start_datetime
            format %tc end_datetime
            
            gen ddate = dofc(start_datetime)
            format ddate %td 
            gen ydate = year(ddate)
            
            gen foy = (doy(ddate) - 0.5) / (365 + (mdy(2, 29, ydate) < .))
            format foy %5.4f 
            
            l *date* foy , sep(0)
            
                 +----------------------------------------------------------------------+
                 |     start_datetime         end_datetime       ddate   ydate      foy |
                 |----------------------------------------------------------------------|
              1. | 30dec2009 18:45:19   02jan2010 14:37:47   30dec2009    2009   0.9959 |
              2. | 28mar2010 09:04:22   30mar2010 12:46:52   28mar2010    2010   0.2370 |
              3. | 16jun2010 09:19:01   12jul2010 01:50:29   16jun2010    2010   0.4562 |
              4. | 30jun2010 16:30:16   30jun2010 17:04:56   30jun2010    2010   0.4945 |
              5. | 12sep2010 02:58:16   16sep2010 14:15:33   12sep2010    2010   0.6973 |
              6. | 04oct2010 16:43:02   06oct2010 15:27:58   04oct2010    2010   0.7575 |
              7. | 09aug2011 22:53:43   11aug2011 19:03:54   09aug2011    2011   0.6041 |
              8. | 16dec2011 20:07:41   17dec2011 20:09:06   16dec2011    2011   0.9575 |
              9. | 20dec2011 23:23:48   21dec2011 12:10:53   20dec2011    2011   0.9685 |
             10. | 12jan2009 17:34:44   13jan2009 14:12:22   12jan2009    2009   0.0315 |
                 +----------------------------------------------------------------------+
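
            One simple way to look at foy directly, offered only as a sketch: a histogram over the unit interval, where clustering in some part of the year would show up as taller bars. It assumes foy has been created as above; the bin width of 0.05 is an arbitrary choice.

            ```stata
            * Sketch only: 20 equal-width bins across the year
            histogram foy, start(0) width(0.05) frequency ///
                xtitle("Fraction of year") ytitle("Admissions")
            ```

            Because the year is circular, the bins nearest 0 and 1 are neighbours in time, which is worth keeping in mind when reading the plot.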



            • #7
              Thank you, Nick. Once I have foy, would I be able to run models on that column to check for seasonality, like say an ARIMA, or what would be the next steps?



              • #8
                How would ARIMA apply here? You have individuals entering and exiting at different times. I think you'd need to reduce your data to counts in time bins or at time intervals, and that would be a long stretch from your data.
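
                A reduction to monthly counts could be sketched as follows. It assumes start_datetime is a %tc variable as in #1; mdate and admissions are made-up names.

                ```stata
                * Sketch only: mdate and admissions are hypothetical names
                gen mdate = mofd(dofc(start_datetime))
                format mdate %tm

                * One observation per calendar month, with the number of admissions
                contract mdate, freq(admissions)

                * Declare a monthly time series and fill in months with no admissions
                tsset mdate
                tsfill
                replace admissions = 0 if missing(admissions)

                * Line plot of admissions per month
                tsline admissions
                ```

                tsfill only fills gaps between the first and last observed months, so empty months at either end of the study period would still be absent.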

                Your question, "check if seasonal variations exist in this dataset of hospital admissions/discharges", was rather general, and it's hard to do more than make a variety of general suggestions, beyond noting that many different time scales are involved and wondering what is of most interest and importance.



                • #9
                  Sorry, I guess my question more directly would be: how can I use the foy variable to help me understand seasonal trends in patient admissions? Additionally, how could I make a line plot that shows the number of admissions month by month throughout the data set (which runs from Jan. 2009 to Dec. 2011)?
                  Last edited by Julien Weinstein; 05 Jun 2023, 12:58.



                  • #10
                    I have generated a time-series based line plot by summing the monthly admissions and plotting them against the months in the data set. I'd like to check for seasonality in the data from this summed monthly time-series data. Is there a good way to detect trends or seasonality in this type of data in Stata?
                    [Attached image: admissions.png (line plot of monthly admissions)]



                    • #11
                      That makes your problem clearer -- thanks.

                      It seems that you have reduced your problem to 36 values for the years 2009 to 2011, so no leap years. You can't use monthly date as a predictor without grossly overfitting.

                      You may have month of the year 1 ... 12 as an existing variable and year as an existing variable; if not then recipes are

                      Code:
                      gen month = month(dofm(mdate)) 
                      gen year = year(dofm(mdate))
                      As already hinted by Clyde Schechter in #2, if you use month of year as a predictor, you are mixing a genuine time-of-year effect with an artefact of length of month. So Clyde's code is good for non-leap years, or

                      Code:
                      gen length = daysinmonth(dofm(mdate))
                      will get you that predictor. Then it's a Poisson regression


                      Code:
                      poisson _freq i.month length 
                      or

                      Code:
                      poisson _freq i.month length i.year
                      There is still a risk of overfitting. If you really have more data, that's fine. If the reason the data stop in 2011 is that this is some sort of assignment, then your teachers should be aware of the issues.
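
                      A variant worth considering, following the exposure() approach in #2, is sketched below. With no leap years in the data, the length of a month is fully determined by month of year, so a model with both i.month and length as covariates cannot estimate them separately and Stata will omit a term; entering length as exposure avoids that. The sketch assumes mdate, month, year and _freq exist as described above; days is a made-up name.

                      ```stata
                      * Sketch only: days is a hypothetical name for days in each calendar month
                      gen days = dofm(mdate + 1) - dofm(mdate)

                      * Month-of-year and year effects on the daily admission rate,
                      * reported as incidence-rate ratios
                      poisson _freq i.month i.year, exposure(days) irr

                      * Joint test that all month-of-year effects are zero
                      testparm i.month
                      ```

                      With only 36 observations and 13 indicator terms, the overfitting caveat still applies.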

                      It's my impression that sine and cosine predictors don't work especially well for this kind of data.

