Hello,
I have a data file that lists changes in the type of oxygen delivery device being used by a patient in the hospital. It is in long format with each row representing a change to the oxygenation device. A sample of data for 3 patients is given below. Note that there are additional variables here that are not shown for clarity. elapsed_time is the time in hours that the patient was receiving oxygen through the specified type of device (calculated using the difference in the date_change field between the n and n+1 observation). escalation_therapy_nonvent is an indicator variable denoting if the delivery device counts as a "therapy escalation.
Note: all dates and IDs in the sample data-set have been modified for posting here
What I would like to do is collapse consecutive entries that represent an escalation of therapy (escalation_therapy_notvent == 1), which I would refer to as an "escalation episode" into a single observation with date_change == the first observation in the group being collapsed and the elapsed_time == total(elapsed_time for collapsed observations). There are additional variables not shown here and similar to date_change, I would want the entries from the first observation in an episode to be present in the collapsed observation.
I have been playing around with using various options with sortyby and referencing the _n+1th observation, but haven't been able to come up with something that will work with the variability present in my data (both in terms of the number of consecutive observations that may be present) and the number of "groups" within each patient. For example, a patient can have multiple "episodes" of escalation that are separated by normal oxygen delivery.
The purpose of this manipulation is that I want to see the distribution of typical durations of an escalation episode, but charting practices by providers and the clinical course of patients can vary considerably so some patients have 20 entries that correspond to a single "episode" but other patients have 1 or 2.
Thank you for any help that you can provide.
-JR
I have a data file that lists changes in the type of oxygen delivery device being used by a patient in the hospital. It is in long format with each row representing a change to the oxygenation device. A sample of data for 3 patients is given below. Note that there are additional variables here that are not shown for clarity. elapsed_time is the time in hours that the patient was receiving oxygen through the specified type of device (calculated using the difference in the date_change field between the n and n+1 observation). escalation_therapy_nonvent is an indicator variable denoting if the delivery device counts as a "therapy escalation.
Note: all dates and IDs in the sample data-set have been modified for posting here
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str5 registryid double date_change float elapsed_time str19 delivery_device float escalation_therapy_notvent "1" 1746208620000 .5497222 "bvm" . "1" 1.7462109e+12 26.583334 "ventilator" . "1" 1.7463066e+12 7.933055 "nc" . "1" 1746335159000 .4833333 "simple mask" . "1" 1746336899000 35.916943 "nc" . "1" 1.7464662e+12 5.816389 "room air" . "1" 1746487139000 18.733612 "nc" . "1" 1746554580000 124.94972 "ventilator" . "1" 1747004399000 2.0002778 "nc" . "1" 1.7470116e+12 309.5333 "room air" . "1" 1748125920000 1.45 "nc" . "1" 1748131140000 .016666668 "room air" . "1" 1.7481312e+12 .3330556 "non-rebreather" 1 "1" 1748132399000 2.283611 "nc" . "1" 1748140620000 1.45 "non-rebreather" 1 "1" 1748145840000 3.483333 "hfnc" 1 "1" 1748158380000 1.1666666 "non-rebreather" 1 "1" 1748162580000 14.133333 "hfnc" 1 "1" 1.7482179e+12 334.9333 "room air" . "1" 1749423660000 153.11665 "ventilator" . "1" 1749974880000 .11666667 "bvm" . "1" 1.7499753e+12 58.6 "ventilator" . "1" 1750187220000 . "ventilator" . "3" 1.7457636e+12 2.499722 "non-rebreather" 1 "3" 1745772599000 25.166945 "nc" . "3" 1.7458632e+12 .08333334 "room air" . "3" 1.7458635e+12 56.91667 "nc" . "3" 1.7460684e+12 . "room air" . "4" 1.7458956e+12 20.433056 "ventilator" . "4" 1745969159000 1.7166667 "venti mask" 1 "4" 1745975339000 29.03333 "hfnc" 1 "4" 1746079859000 .8169444 "high flow generator" 1 "4" 1.7460828e+12 17.75 "hfnc" 1 "4" 1.7461467e+12 30 "nc" . "4" 1.7462547e+12 .033333335 "room air" . "4" 1746254820000 27.116667 "nc" . "4" 1746352440000 .08305556 "room air" . "4" 1746352739000 79.51694 "nc" . "4" 1.746639e+12 .5 "room air" . "4" 1.7466408e+12 . "nc" . end format %tc date_change
I have been playing around with using various options with sortyby and referencing the _n+1th observation, but haven't been able to come up with something that will work with the variability present in my data (both in terms of the number of consecutive observations that may be present) and the number of "groups" within each patient. For example, a patient can have multiple "episodes" of escalation that are separated by normal oxygen delivery.
The purpose of this manipulation is that I want to see the distribution of typical durations of an escalation episode, but charting practices by providers and the clinical course of patients can vary considerably so some patients have 20 entries that correspond to a single "episode" but other patients have 1 or 2.
Thank you for any help that you can provide.
-JR
Comment