Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Grouping by days and time

    Hi. I have data on hospital visits for the period of Oct 1, 2022 - Feb 28, 2023. The visits have the date of visit (checkin_date) and time of visit. Time of visit is captured by two variables - appt_time and day_time. appt_time follows a 12 hour clock and day_time uses AM/PM - to together indicate the exact time of visit

    I want to do two things:
    1. Generate a week variable where Oct 1-7 is week 1, Oct 8-14 is week 2 and so on. However, there may be missing days in the checkin_date variable and I don't want days skipped in the week variable, i.e. Oct 1-7 has to be week 1 even if there is Oct 5 missing in the data.
    2. Generate a time of day variable where appointments are categorized as "morning" if before 12 pm, "afternoon" if between 12-4 pm, and "evening" if after 4 pm.

    I've been struggling on these two and would very much appreciate any guidance here.

    Code:
    input str13 Hospital int checkin_date double appt_time str2 day_time
    "Main Campus" 22919      -1.8934524e+12 "PM"
    "Main Campus" 22919      -1893452340000 "PM"
    "Main Campus" 22919      -1893452340000 "PM"
    "Main Campus" 22919      -1893452340000 "PM"
    "Main Campus" 22919      -1893452160000 "PM"
    "Main Campus" 22919      -1893452160000 "PM"
    "Main Campus" 22919      -1893452040000 "PM"
    "Main Campus" 22919      -1893452040000 "PM"
    "Main Campus" 22919      -1893452040000 "PM"
    "Main Campus" 22919      -1893451980000 "PM"
    "Main Campus" 22919      -1893451980000 "PM"
    "Main Campus" 22919      -1893451920000 "PM"
    "Main Campus" 22919      -1893451920000 "PM"
    "Main Campus" 22919      -1893451860000 "PM"
    "Main Campus" 22919      -1893451740000 "PM"
    "Main Campus" 22919      -1893451680000 "PM"
    "Main Campus" 22919      -1893451680000 "PM"
    "Main Campus" 22919      -1893451440000 "PM"
    "Main Campus" 22919      -1893451380000 "PM"
    "Main Campus" 22919      -1893451380000 "PM"
    "Main Campus" 22919      -1893451380000 "PM"
    "Main Campus" 22919 -1893451320000.0002 "PM"
    "Main Campus" 22919      -1.8934512e+12 "PM"
    "Main Campus" 22919      -1893450960000 "PM"
    "Main Campus" 22919      -1893450960000 "PM"
    "Main Campus" 22919      -1893450960000 "PM"
    "Main Campus" 22919      -1893450960000 "PM"

  • #2
    #1 Consider this

    Code:
    . clear
    
    . set obs 25
    Number of observations (_N) was 0, now 25.
    
    . gen date = mdy(9,30,2022) + _n
    
    . format date %td
    
    . gen week = ceil((date - mdy(9, 30, 2022))/7)
    
    . list , sep(7)
    
         +------------------+
         |      date   week |
         |------------------|
      1. | 01oct2022      1 |
      2. | 02oct2022      1 |
      3. | 03oct2022      1 |
      4. | 04oct2022      1 |
      5. | 05oct2022      1 |
      6. | 06oct2022      1 |
      7. | 07oct2022      1 |
         |------------------|
      8. | 08oct2022      2 |
      9. | 09oct2022      2 |
     10. | 10oct2022      2 |
     11. | 11oct2022      2 |
     12. | 12oct2022      2 |
     13. | 13oct2022      2 |
     14. | 14oct2022      2 |
         |------------------|
     15. | 15oct2022      3 |
     16. | 16oct2022      3 |
     17. | 17oct2022      3 |
     18. | 18oct2022      3 |
     19. | 19oct2022      3 |
     20. | 20oct2022      3 |
     21. | 21oct2022      3 |
         |------------------|
     22. | 22oct2022      4 |
     23. | 23oct2022      4 |
     24. | 24oct2022      4 |
     25. | 25oct2022      4 |
         +------------------+
    More discussion at https://www.stata-journal.com/articl...article=dm0095

    #2 I note that your date-times appear to be imported incorrectly from somewhere using MS Excel conventions.

    Code:
    . di %tc  -1893452340000
    31dec1899 01:01:00
    For more on that see https://www.stata.com/manuals/ddatet...ersoftware.pdf
    Last edited by Nick Cox; 19 Mar 2023, 06:22.

    Comment


    • #3
      Hi Scott, certainly working with date and time variables can be tough, particularly if you are more used to the formatting conventions of another program. As the Stata documentation notes, a time variable should be in a format somewhat like hhmmss (in your case it seems like the appointment times would likely have 0's in the seconds slot). Nick's second point is that the time variable is being misread during the import, and Statalisters have little recourse to see what's happening with this.

      Once you do figure out the correct import, you can split the time variable if it is a string, which will create numeric variables for the hour, minute, and second. You should be able to take it from there:
      Code:
      split appt_time, parse("(the string parser here)") limit(3) destring
      rename appt_time2 hour
      *** the next line assumes the time is presented in military time
      gen day_pm=(hour>11)

      Comment


      • #4
        Nick Cox and Eric Makela Thank you both for your responses.

        On #1: Unfortunately, the code does not appear to work. date_n below is the original date. As seen, there can be multiple observations on the same day. Since the code creates a sequential day number for each observation (date), not all observations between Oct 1-7 get labelled as week 1 (week). Is there anyway to have all observations between Oct 1-7 be labelled week 1, Oct 8-14 be week 2....

        Code:
        input float(date_n date week)
        22919 22919  1
        22919 22920  1
        22919 22921  1
        22919 22922  1
        22919 22923  1
        22919 22924  1
        22919 22925  1
        22919 22926  2
        22919 22927  2
        22919 22928  2
        22919 22929  2
        22919 22930  2
        22919 22931  2
        22919 22932  2
        22919 22933  3
        22919 22934  3
        22919 22935  3
        22919 22936  3
        22919 22937  3
        22919 22938  3
        22919 22939  3
        22919 22940  4
        22919 22941  4

        On #2: That worked perfectly Eric Makela. Thanks a lot!

        Comment


        • #5
          You're now showing two daily date variables. The relevance of the "original date" variable date_n to your question in #1 isn't clear to me. But in your data example all date_n are 1 October 2022, so all values fall in week 1.

          Otherwise I can't make sense of your doubt. I confirm that the variable you label week was created by my code and here's a plot showing that it does what I intended. If you really want something different, then sorry, but I can't follow what that is.

          Code:
          clear 
          input float(date_n date week)
          22919 22919  1
          22919 22920  1
          22919 22921  1
          22919 22922  1
          22919 22923  1
          22919 22924  1
          22919 22925  1
          22919 22926  2
          22919 22927  2
          22919 22928  2
          22919 22929  2
          22919 22930  2
          22919 22931  2
          22919 22932  2
          22919 22933  3
          22919 22934  3
          22919 22935  3
          22919 22936  3
          22919 22937  3
          22919 22938  3
          22919 22939  3
          22919 22940  4
          22919 22941  4
          end 
          
          format date* %td 
          gen njc = ceil((date - mdy(9, 30, 2022))/7) 
          assert njc == week 
          
          scatter week date , mla(week) mlabsize(medlarge) mlabpos(0) ms(none) scheme(s1color) xla(22919/22941, format(%tdd)) yla(,ang(h)) xtitle(date in October)

          Click image for larger version

Name:	weeks.png
Views:	1
Size:	15.7 KB
ID:	1706242

          Comment


          • #6
            Nick Cox Apologies, the mistake was all mine. I misread your code and created a second date variable based on your code not realizing that when you did "gen date" in your original response it was only an illustration.

            Comment


            • #7
              Indeed. I am pleased that is sorted out.

              In #1 all your sample dates are 22919 so the data example made clear what your variables are but otherwise was not suitable to demonstrate what you wanted; hence I invented another data example.

              Comment

              Working...
              X