Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Running count of preceding observations, not immediately prior

    Hello,

    I am trying to create a running count variable of preceding observations, not necessarily restricted to the observation immediately prior, and given certain conditions. My data include 900,000 observations.

    Code:
    clear all
    
    input str2 SITE str18(ARRIVAL DEPARTURE) byte OCCUPIED X
    "A"     "23sep2011 15:38"     "23sep2011 18:49"     1     .
    "A"     "06oct2011 18:17"     "06oct2011 23:06"     1     0
    "A"     "06oct2011 20:35"     "07oct2011 01:45"     1     1
    "A"     "15oct2011 17:19"     "15oct2011 19:27"     1     0
    "A"     "04nov2011 11:50"     "04nov2011 17:19"     1     0
    "B"     "07aug2014 22:55"     "08aug2014 17:41"     1     .
    "B"     "07aug2014 23:05"     "08aug2014 03:23"     1     1
    "B"     "07aug2014 23:55"     "08aug2014 12:25"     1     2
    "B"     "06dec2014 13:31"     "09dec2014 12:59"     0     0
    "B"     "06dec2014 15:03"     "06dec2014 16:23"     1     0
    "C"     "21mar2015 08:04"     "21mar2015 13:28"     0     .
    "C"     "21mar2015 11:21"     "21mar2015 17:27"     0     1
    end
    list
    I want a running count (X) of OCCUPIED (1 or 0), grouped by SITE (A, B, C...H), if ARRIVAL date occurs before DEPARTURE date of a preceding observation. For example, the value of X for Obs 3 is 1 because its ARRIVAL occurs before the DEPARTURE for Obs 2. Likewise, for Obs 7 and 8, X is 1 and 2, respectively, because ARRIVALS occurred before DEPARTURE for Obs 6, i.e., Obs 6 had not departed Site A when Obs 7 and 8 arrived.

    I tried the following groups of commands but (1) my ARRIVAL order gets mixed up and (2) the X values are not what they should be:
    Code:
    sort SITE ARRIVAL
    by SITE: gen long order = _n
    bysort SITE (order): gen Y = sum(OCCUPIED[_n-1]) if ARRIVAL<DEPARTURE[_n-1] & OCCUPIED==1.
    Code:
    bysort SITE: gen long order = _n
    bysort SITE (order): gen Y = sum(OCCUPIED[_n-1]) if ARRIVAL<DEPARTURE[_n-1] & OCCUPIED==1.
    Code:
    bysort SITE ARRIVAL: gen long order = _n
    bysort SITE ARRIVAL (order): gen Y = sum(OCCUPIED[_n-1]) if ARRIVAL<DEPARTURE[_n-1] & OCCUPIED==1.
    Does anyone have any suggestions?

    Thank you.
    Emily

  • #2
    An obvious question, but it needs to be asked: Before running your various code samples, you did convert your ARRIVAL and DEPARTURE to Stata Internal Format datetime values, didn't you?

    If your answer was "no" then it would benefit you to work your way through the guidance in help datetime, which is without a doubt the most visited documentation on my system, with the second-most-visited being Chapter 24 (Working with dates and times) of the Stata User's Guide PDF available from the PDF Documentation item on Stata's Help menu. Before working with dates and times, any Stata user should read the very detailed Chapter 24 thoroughly. After that, the help documentation will usually be enough to point the way. Some people may be able to remember everything without have to continually refer to the documentation, but I for one am not such a person.

    Comment


    • #3
      Yes, in my dataset they are in Stata format. I had difficulty creating/formatting properly this mock dataset, so I instead copied the values for members to see what my data look like. Fair question. I, too, refer to Chapter 24 regularly.

      This does address the ARRIVAL order problem, though. Thank you!!
      Last edited by Emily Kao; 18 Apr 2016, 18:56.

      Comment


      • #4
        To show you what appears in my real dataset (and I apologize for not having done it earlier, I did not realize how to do it before):
        Code:
             +---------------------------------------------------------------------------------+
             | SITE        ARRIVAL                      DEPARTURE       OCCUPIED   order   X   |
             |---------------------------------------------------------------------------------|
         27. | B       Thu 06oct2011 18:17   Thu 06oct2011 23:06          1          27    .   |
         28. | B       Thu 06oct2011 20:35   Fri 07oct2011 01:45          1          28    1   |
        ..........
         84. | B       Sat 01sep2012 20:27   Sun 02sep2012 04:08          1          84    .   |
         85. | B       Sun 02sep2012 02:36   Sun 02sep2012 03:13          1          85    2   |
        While X is correctly 1 for Obs 28, X should be 1 instead of 2 for Obs 85, because it is the first occurrence on September 2, 2012. The running count is based on the ARRIVAL datetime for each observation.

        Comment


        • #5
          I don't understand the problem, or perhaps there is a mistake in what you showed for the values of X.

          Look at observations 6 through 10 (site B). Date of Arrival precedes Date of Departure in observations 6, 7, 8, and 9. So it seems to me that the values of X should not be ., 1, 2, 0, 0 as you have it, but ., 1, 2, 3, and 4. What is happening with observation 9 that causes X to "reset" to 0? And given that it does reset to zero, why isn't X = 1 in observation 10?
          Last edited by Clyde Schechter; 18 Apr 2016, 19:12.

          Comment


          • #6
            X should be a running count of OCCUPIED, based on whether the ARRIVAL for the observation precedes or follows the DEPARTURE of a previous observation.
            • Obs 6 is the first observation at Site B; therefore it is missing, but can very well be coded 0.
            • Obs 7 arrived 07aug2014, which is before Obs 6 departed 08aug2014, and because Occupied is 1, X should be 1. This is patient occupancy data, so X says that at the time Patient 7 arrived, Patient 6 had not yet departed and was occupied a bed.
            • At the time Patient 8 arrived, Patients 6 and 7 were both still there occupying beds, and therefore X is 2.
            • But Patient 9 resets to 0 because all prior patients departed the hospital before Patient 9's arrival.
            • Patient 10 is 0 because, although it arrived before Patient 9 departed, Patient 9 has a 0 value for Occupied.
            I realize this one variable is complex, which I is probably why I am having trouble writing the proper Stata syntax. I am all ears to however is cleanest. I appreciate any help.

            Comment


            • #7
              I have to say I still don't understand what you're trying to do--probably it's just too late and I should be in bed. Anyway, I have a solution that almost matches what you want, but I can't get it to work for observation #10, and I don't understand your explanation about observation #10 in post #6. When patient #10 arrives, patient #9 is still there, so I don't understand why that doesn't count.

              Anyway, let me tell you what I did solve, and you can decide if its close enough to your problem that you can take it from there. I'm really relying on the general framework here rather than on your specific problem statement. You have some sites that patients are arriving at and then departing from. So the number of patients occupying the site goes up one with each arrival and then goes down one with each departure. I can calculate a variable Y that tells you how many patients are already occupying the site at the moment each patient arrives. This variable Y is equal to X in every case except observation #10 (treating missing X as 0, as you suggested), and it sounds to me like it is (more or less) what you are trying to calculate as X.

              Code:
              clear all
              
              
              input str2 SITE str18(ARRIVAL DEPARTURE) byte OCCUPIED X
              "A"     "23sep2011 15:38"     "23sep2011 18:49"     1     .
              "A"     "06oct2011 18:17"     "06oct2011 23:06"     1     0
              "A"     "06oct2011 20:35"     "07oct2011 01:45"     1     1
              "A"     "15oct2011 17:19"     "15oct2011 19:27"     1     0
              "A"     "04nov2011 11:50"     "04nov2011 17:19"     1     0
              "B"     "07aug2014 22:55"     "08aug2014 17:41"     1     .
              "B"     "07aug2014 23:05"     "08aug2014 03:23"     1     1
              "B"     "07aug2014 23:55"     "08aug2014 12:25"     1     2
              "B"     "06dec2014 13:31"     "09dec2014 12:59"     0     0
              "B"     "06dec2014 15:03"     "06dec2014 16:23"     1     0
              "C"     "21mar2015 08:04"     "21mar2015 13:28"     0     .
              "C"     "21mar2015 11:21"     "21mar2015 17:27"     0     1
              end
              
              // CREATE STATA INTERNAL DATE AND TIME VARIABLES
              foreach v of varlist ARRIVAL DEPARTURE {
                  gen double TIME_`v' = clock(`v', "DMYhm")
                  gen DATE_`v' = dofc(TIME_`v')
                  assert !missing(DATE_`v')
                  format DATE_`v' %td
                  format TIME_`v' %tc
              }
              drop ARRIVAL DEPARTURE
              
              // CREATE AN OBSERVATION NUMBER VARIABLE
              gen long obs_no = _n
              
              //    RESHAPE LONG TO CREATE A SERIES OF ARRIVALS AND 
              //    DEPARTURES IN EACH SITE
              reshape long DATE_ TIME_, i(obs_no) j(event) string
              
              //    CODE ARRIVALS AS +1, DEPARTURES AS -1
              gen n_event = cond(event == "ARRIVAL", 1, -1)
              
              //  CALCULATE RUNNING TOTAL OCCUPANCY AT EACH TIME
              by SITE (TIME_), sort: gen occupancy = sum(n_event)
              
              //    TOTAL OCCUPANCY AT MOMENT PATIENT ARRIVES IS THIS
              //    OCCUPANCY MINUS 1 (TO EXCLUDE PATIENT'S OWN ARRIVAL)
              by obs_no (event), sort: gen Y = occupancy[1] - 1
              
              //    REPLACE MISSING X BY 0 TO FACILITATE COMPARISON TO Y
              replace X = 0 if missing(X)
              
              //    GO BACK TO WIDE LAYOUT
              drop n_event occupancy
              reshape wide
              
              //    RESTORE ORIGINAL ORDER OF OBSERVATIONS
              sort obs_no
              order Y, after(X)
              list, noobs clean
              
              assert Y == X if obs_no != 10
              I hope this helps. It isn't quite what you said you wanted, but it's close, and I can't seem to wrap my mind around what you want. My hope is that this will serve as a starting point for you.

              Comment


              • #8
                Thank you, Clyde!! This was incredibly helpful. Your understanding was spot on. I just needed to make a minor adjustment to account for patients who did not occupy a bed, and therefore should not be counted in my Y. To do this, I added an intermediary variable following your line of code.
                Code:
                gen n_event = cond(event == "ARRIVAL", 1, -1)
                //TO ACCOUNT FOR PATIENTS WHO DID NOT OCCUPY BEDS//
                gen eventclean = OCCUPIED*n_event
                by SITE (TIME_), sort: gen occupancy = sum(eventclean)
                This produced the values I wanted. Here are the results.
                Code:
                . list
                
                     +---------------------------------------------------------------------------------------------------------------+
                     | obs_no       event   SITE                TIME_       DATE_   n_event   OCCUPIED   eventc~n   occupa~y   X   Y |
                     |---------------------------------------------------------------------------------------------------------------|
                  1. |      1     ARRIVAL      A   23sep2011 15:38:00   23sep2011         1          1          1          1   0   0 |
                  2. |      1   DEPARTURE      A   23sep2011 18:49:00   23sep2011        -1          1         -1          0   0   0 |
                  3. |      2     ARRIVAL      A   06oct2011 18:17:00   06oct2011         1          1          1          1   0   0 |
                  4. |      2   DEPARTURE      A   06oct2011 23:06:00   06oct2011        -1          1         -1          1   0   0 |
                  5. |      3     ARRIVAL      A   06oct2011 20:35:00   06oct2011         1          1          1          2   1   1 |
                     |---------------------------------------------------------------------------------------------------------------|
                  6. |      3   DEPARTURE      A   07oct2011 01:45:00   07oct2011        -1          1         -1          0   1   1 |
                  7. |      4     ARRIVAL      A   15oct2011 17:19:00   15oct2011         1          1          1          1   0   0 |
                  8. |      4   DEPARTURE      A   15oct2011 19:27:00   15oct2011        -1          1         -1          0   0   0 |
                  9. |      5     ARRIVAL      A   04nov2011 11:50:00   04nov2011         1          1          1          1   0   0 |
                 10. |      5   DEPARTURE      A   04nov2011 17:19:00   04nov2011        -1          1         -1          0   0   0 |
                     |---------------------------------------------------------------------------------------------------------------|
                 11. |      6     ARRIVAL      B   07aug2014 22:55:00   07aug2014         1          1          1          1   0   0 |
                 12. |      6   DEPARTURE      B   08aug2014 17:41:00   08aug2014        -1          1         -1          0   0   0 |
                 13. |      7     ARRIVAL      B   07aug2014 23:05:00   07aug2014         1          1          1          2   1   1 |
                 14. |      7   DEPARTURE      B   08aug2014 03:23:00   08aug2014        -1          1         -1          2   1   1 |
                 15. |      8     ARRIVAL      B   07aug2014 23:55:00   07aug2014         1          1          1          3   2   2 |
                     |---------------------------------------------------------------------------------------------------------------|
                 16. |      8   DEPARTURE      B   08aug2014 12:25:00   08aug2014        -1          1         -1          1   2   2 |
                 17. |      9     ARRIVAL      B   06dec2014 13:31:00   06dec2014         1          0          0          0   0   0 |
                 18. |      9   DEPARTURE      B   09dec2014 12:59:00   09dec2014        -1          0          0          0   0   0 |
                 19. |     10     ARRIVAL      B   06dec2014 15:03:00   06dec2014         1          1          1          1   0   0 |
                 20. |     10   DEPARTURE      B   06dec2014 16:23:00   06dec2014        -1          1         -1          0   0   0 |
                     |---------------------------------------------------------------------------------------------------------------|
                 21. |     11     ARRIVAL      C   21mar2015 08:04:00   21mar2015         1          0          0          0   0   0 |
                 22. |     11   DEPARTURE      C   21mar2015 13:28:00   21mar2015        -1          0          0          0   0   0 |
                 23. |     12     ARRIVAL      C   21mar2015 11:21:00   21mar2015         1          0          0          0   0   0 |
                 24. |     12   DEPARTURE      C   21mar2015 17:27:00   21mar2015        -1          0          0          0   0   0 |
                     +---------------------------------------------------------------------------------------------------------------+

                Comment


                • #9
                  Glad I was able to help. And now that I've seen the final code, I finally understand the original situation. Thanks for the closure.

                  Comment

                  Working...
                  X