Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to find monthly variation in number of cases per weekday over a period of one year

    Hello:
    I work in health care field and I am trying to assess the variation in number of cases during the weekdays, to adjust the staffing needed on a particular week day(Excluding Saturday and Sunday).

    I have the patient's identifiers, and the date when the procedure was performed for one complete year (2023).

    Using the date and time function, and working with string variables, I was able to tabulate number of cases performed on each day of the week (Monday to Friday, excluding Saturday and Sunday).

    Is there a way to find:
    1) If there is any monthly variation in number of cases performed during each week day.
    2) Is there a way to assess if the distribution of cases (during the week days) for the entire year reflective of distribution of cases for each month

    I would really appreciate any suggestions/recommendations.

    Thank you again for your time
    Sincerely
    Mike

  • #2
    1) Yes.
    2) Not exactly, but you can contrast each month's distribution with that for the rest of the year. (You can also trick Stata into contrasting each month with the full year, but the results you get will not be statistically valid.)

    I realize that you weren't really asking yes-no questions: you would like advice on how to do these things. For that, we need to know what your data is and how it is organized. You have made a valiant effort to describe it. But I can think of several different organizations that are all consistent with what you have said, and the approach I would take to those differs. It's not a matter of trying harder to be clear: words are not really up to the task of describing data sets, at least not at the level of detail required for developing code.

    Please post back with example data. And to make it feasible for others to try to develop a solution to your problem using your example, use the -dataex- command to do that. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Maybe something like this:

      Code:
      clear all
      set obs 12
      g month = _n
      expand 4
      bys month: g week = _n
      expand 5
      bys month week: g day = _n
      bys day: g cases = rpoisson(30)
      sort month week day
      replace cases = cases + rpoisson(5)
      replace cases = cases + rpoisson(10) if inlist(day,5)
      replace cases = cases + rpoisson(15) if (inlist(month,6) & inlist(day,5))
      
      poisson cases i.day , r
      poisson cases i.month , r
      
      ** adjust for month to get day coefficient
      poisson cases i.day i.month , r

      Comment


      • #4
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float(MRN date) byte day_of_the_month float month_numeric str9(day_name month)
          79.6401 23371 27 3 "WEDNESDAY" "DECEMBER" 
         91.88837 23022 12 4 "THURSDAY"  "JANUARY"  
        22.673725 23138  8 1 "MONDAY"    "MAY"      
        25.070234 23180 19 1 "MONDAY"    "JUNE"     
        33.577644 23102  2 0 "SUNDAY"    "APRIL"    
         80.57185 23089 20 1 "MONDAY"    "MARCH"    
         80.93733 23308 25 3 "WEDNESDAY" "OCTOBER"  
        14.918642 23076  7 2 "TUESDAY"   "MARCH"    
        14.918642 23073  4 6 "SATURDAY"  "MARCH"    
         72.22513 23180 19 1 "MONDAY"    "JUNE"     
         90.45408 23074  5 0 "SUNDAY"    "MARCH"    
         90.45408 23076  7 2 "TUESDAY"   "MARCH"    
         54.88587 23357 13 3 "WEDNESDAY" "DECEMBER" 
         54.88587 23299 16 1 "MONDAY"    "OCTOBER"  
         85.23929 23353  9 6 "SATURDAY"  "DECEMBER" 
         93.73267 23254  1 5 "FRIDAY"    "SEPTEMBER"
         44.25417 23208 17 1 "MONDAY"    "JULY"     
         59.85971 23053 12 0 "SUNDAY"    "FEBRUARY" 
         93.14678 23113 13 4 "THURSDAY"  "APRIL"    
         77.97056 23233 11 5 "FRIDAY"    "AUGUST"   
         97.22269 23186 25 0 "SUNDAY"    "JUNE"     
         97.22269 23096 27 1 "MONDAY"    "MARCH"    
        18.691105 23373 29 5 "FRIDAY"    "DECEMBER" 
         7.164299 23075  6 1 "MONDAY"    "MARCH"    
         7.164299 23077  8 3 "WEDNESDAY" "MARCH"    
         45.69116 23241 19 6 "SATURDAY"  "AUGUST"   
         22.20176 23243 21 1 "MONDAY"    "AUGUST"   
         53.65619 23090 21 2 "TUESDAY"   "MARCH"    
        14.491908 23181 20 2 "TUESDAY"   "JUNE"     
         32.90934 23334 20 1 "MONDAY"    "NOVEMBER" 
        34.283268 23220 29 6 "SATURDAY"  "JULY"     
         79.17989 23299 16 1 "MONDAY"    "OCTOBER"  
          57.8358 23281 28 4 "THURSDAY"  "SEPTEMBER"
         82.57027 23196  5 3 "WEDNESDAY" "JULY"     
         77.97834 23285  2 1 "MONDAY"    "OCTOBER"  
         53.87986 23304 21 6 "SATURDAY"  "OCTOBER"  
         24.30388 23268 15 5 "FRIDAY"    "SEPTEMBER"
         95.46659 23366 22 5 "FRIDAY"    "DECEMBER" 
         72.55294 23110 10 1 "MONDAY"    "APRIL"    
         44.21449 23192  1 6 "SATURDAY"  "JULY"     
         11.54296 23055 14 2 "TUESDAY"   "FEBRUARY" 
         39.48013 23073  4 6 "SATURDAY"  "MARCH"    
         50.70106 23316  2 4 "THURSDAY"  "NOVEMBER" 
        15.809456 23011  1 0 "SUNDAY"    "JANUARY"  
        15.809456 23013  3 2 "TUESDAY"   "JANUARY"  
         41.45922 23031 21 6 "SATURDAY"  "JANUARY"  
          96.4711 23145 15 1 "MONDAY"    "MAY"      
        71.700775 23152 22 1 "MONDAY"    "MAY"      
         45.37658 23304 21 6 "SATURDAY"  "OCTOBER"  
         50.77076 23304 21 6 "SATURDAY"  "OCTOBER"  
         81.46998 23343 29 3 "WEDNESDAY" "NOVEMBER" 
         73.93321 23237 15 2 "TUESDAY"   "AUGUST"   
         73.93321 23209 18 2 "TUESDAY"   "JULY"     
         73.93321 23228  6 0 "SUNDAY"    "AUGUST"   
         43.45839 23276 23 6 "SATURDAY"  "SEPTEMBER"
         43.45839 23278 25 1 "MONDAY"    "SEPTEMBER"
         57.77544 23190 29 4 "THURSDAY"  "JUNE"     
         87.79718 23181 20 2 "TUESDAY"   "JUNE"     
           50.924 23091 22 3 "WEDNESDAY" "MARCH"    
        24.498253 23040 30 1 "MONDAY"    "JANUARY"  
        21.397655 23212 21 5 "FRIDAY"    "JULY"     
         26.63715 23280 27 3 "WEDNESDAY" "SEPTEMBER"
         27.30096 23274 21 4 "THURSDAY"  "SEPTEMBER"
          38.3834 23050  9 4 "THURSDAY"  "FEBRUARY" 
         59.47316 23025 15 0 "SUNDAY"    "JANUARY"  
         68.43225 23097 28 2 "TUESDAY"   "MARCH"    
         83.58498 23277 24 0 "SUNDAY"    "SEPTEMBER"
         9.311325 23232 10 4 "THURSDAY"  "AUGUST"   
         28.73538 23262  9 6 "SATURDAY"  "SEPTEMBER"
        26.435064 23265 12 2 "TUESDAY"   "SEPTEMBER"
         72.83545 23226  4 5 "FRIDAY"    "AUGUST"   
         68.47608 23034 24 2 "TUESDAY"   "JANUARY"  
         12.03723 23094 25 6 "SATURDAY"  "MARCH"    
         25.20479 23131  1 1 "MONDAY"    "MAY"      
         29.61858 23090 21 2 "TUESDAY"   "MARCH"    
         29.61858 23123 23 0 "SUNDAY"    "APRIL"    
         66.06802 23117 17 1 "MONDAY"    "APRIL"    
         16.03548 23162  1 4 "THURSDAY"  "JUNE"     
        22.225714 23275 22 5 "FRIDAY"    "SEPTEMBER"
        36.273273 23205 14 5 "FRIDAY"    "JULY"     
         83.88499 23351  7 4 "THURSDAY"  "DECEMBER" 
         89.34456 23163  2 5 "FRIDAY"    "JUNE"     
         99.71127 23011  1 0 "SUNDAY"    "JANUARY"  
        73.778915 23174 13 2 "TUESDAY"   "JUNE"     
         63.92961 23191 30 5 "FRIDAY"    "JUNE"     
         97.60345 23363 19 2 "TUESDAY"   "DECEMBER" 
         60.02395 23033 23 1 "MONDAY"    "JANUARY"  
           85.925 23034 24 2 "TUESDAY"   "JANUARY"  
         74.58229 23086 17 5 "FRIDAY"    "MARCH"    
         70.47499 23060 19 0 "SUNDAY"    "FEBRUARY" 
         5.057697 23321  7 2 "TUESDAY"   "NOVEMBER" 
        1.9059875 23274 21 4 "THURSDAY"  "SEPTEMBER"
         49.42278 23089 20 1 "MONDAY"    "MARCH"    
         5.152604 23120 20 4 "THURSDAY"  "APRIL"    
         39.92551 23236 14 1 "MONDAY"    "AUGUST"   
         96.68467 23238 16 3 "WEDNESDAY" "AUGUST"   
         88.48962 23054 13 1 "MONDAY"    "FEBRUARY" 
        33.315968 23094 25 6 "SATURDAY"  "MARCH"    
        1.2098415 23251 29 2 "TUESDAY"   "AUGUST"   
         83.32545 23282 29 5 "FRIDAY"    "SEPTEMBER"
        end
        format %td date

        Comment


        • #5
          Hello all:

          Apologies as I did not post using the dataex
          Here is the random dataset that I generated with random numbers and dates
          I used dataex to assist with posting the data in this forum.

          Just to clarify: I have intentionally repeated some random numbers as some of the patients can come back for repeat procedures

          Is there a way to find:
          1) If there is any monthly variation in number of cases performed during each week day.
          2) Is there a way to assess if the distribution of cases (during the week days) for the entire year reflective of distribution of cases for each month
          3) Is there a graphical way to represent this data

          Thank you again for your time

          Comment


          • #6
            OK. It will go a bit better if we -encode- the day_name and month variables. Also, in order for the graphs to be readable, it is best if the names of the weekdays are shortened to their three letter abbreviations, lest the axis labels overwrite each other.

            There are many ways in which one might compare distributions across months. I have chosen to use a cross tab of months and weekdays and calculate the chi2 statistic. As for comparing each month with the entire year, that is not something that one does with statistics: you compare the month with all the other months. An attempt to compare the month with the entire year would produce invalid results because the month is part of the year, so there would be a hidden dependency between the two groups being contrasted, violating the assumption of independence that underlies the tests.

            Code:
            //    CLEAN UP THE DATA A BIT
            drop if inlist(day_name, "SATURDAY", "SUNDAY")
            replace day_name = substr(proper(day_name), 1, 3)
            label define day_name    1    "Mon"    ///
                                    2    "Tue"    ///
                                    3    "Wed"    ///
                                    4    "Thu"    ///
                                    5    "Fri"
            encode day_name, gen(_day_name) label(day_name)
            drop day_name
            rename _day_name day_name
            
            replace month = substr(proper(month), 1, 3)
            forvalues i = 1/12 {
                label define month    `i'    "`:word `i' of `c(Mons)''", add
            }
            encode month, gen(_month) label(month)
            drop month
            rename _month month
            
            //    AGGREGATE NUMBER OF CASES EACH WEEKDAY IN EACH MONTH
            by month day_name, sort: egen total_cases = total(month_numeric)
            egen tag = tag(month day_name)
            
            //    QUESTION 1
            tab month day_name if tag [fweight=total_cases], chi2 row
            
            //    QUESTION 2
            levelsof month, local(months)
            foreach m of local months {
                gen byte is_`m' = (month == `m')
                tab is_`m' day_name if tag [fweight = total_cases], chi2 row
            }
            
            //    QUESTION 3
            graph bar total_cases if tag, over(day_name) by(month)
            Added: I should add the comparison of the various months with each other or with the rest of the year has an inherent flaw if a single year's data is all that will be used. That's because in any given year, certain months have 5 Mondays whereas most have only 4. (Monday is just an example--any day of the week could be involved in this.) So in a month with an extra Monday, the probability of a case being on a Monday will be greater in those months that have more Mondays. And if your sample has a large number of cases, the tests will be sensitive to this difference, which may not be what you want. If you have many years of data, then the shifts in the yearly calendars due to a year not being a whole number of weeks will smooth this out and mitigate this problem.

            Last edited by Clyde Schechter; 17 Aug 2024, 20:57.

            Comment

            Working...
            X