Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Help needed creating & deleting observations

    Hi all,

    I have a database management problem that feels like it should be solvable, but I am just not getting it, and am hoping one of you might be able to help. I am using Stata 16.1 on Windows 10.

    My problem is a special case of dealing with duplicate data in a database that is a chronological list of hospitalizations over a period of several years. The variables of interest are Admit date/time, discharge date/time, patient ID (string), admitting service (string), and discharge diagnosis (string).

    Unfortunately, each row does not always represent a single hospitalization - there are times when a single hospitalization is actually represented by two rows (a so-called “conversion”): for the first part (#1), Service always takes on “Obs,” and for the second part (#2), Service always takes on “Medical” (however, not all instances of “Obs” or “Medical” represent conversions). The patient ID will be the same for #1 and #2 in a conversion. The database also contains repeat hospitalizations for the same patient over time; these appear as new rows later on with the same patient ID. The only way to differentiate conversions from repeat hospitalizations is that for the former, the discharge date/time of #1 is always within 5 minutes of the admission date/time of #2; readmissions never are.

    Here is an example. The first row is a single hospitalization, the second and third rows are a converted hospitalization of a different patient, and the fourth row is a repeat hospitalization of the original patient:

    Admit date/time Discharge date/time ID Service Diagnosis
    01/01/2017 16:48:49 01/03/2017 15:35:45 001 Medical J11
    02/01/2017 00:01:01 02/01/2017 12:45:45 002 Obs A10
    02/01/2017 12:45:50 02/07/2017 15:54:32 002 Medical K45
    03/10/2017 13:00:00 03/17/2017 18:45:44 001 Obs C23


    I need to generate a new observation to represent each converted hospitalization, and it needs to contain elements of both observations that describe it:

    Admit date/time from #1
    Discharge date/time from #2
    Service from #2
    Diagnosis from #2

    Finally, every time this is performed, I need to delete the two observations that created it, in order to avoid double-counting. Alternatively, if coming up with the code to delete observations is not straightforward, I could use indicator variables to avoid duplication.

    So, in sum, what I would like to do is convert the above database to this database below:


    Admit date/time Discharge date/time ID Service Diagnosis
    01/01/2017 16:48:49 01/03/2017 15:35:45 001 Medical J11
    02/01/2017 00:01:01 02/07/2017 15:54:32 002 Medical K45
    03/10/2017 13:00:00 03/17/2017 18:45:44 001 Obs C23


    Any help would be greatly appreciated – thanks to all for your insight.
    Rahul
    Last edited by Rahul Ganatra; 17 May 2021, 22:48.
Working...
X