How to arrange hospitalization data for sequence analysis

Ginny Han

Join Date: Jul 2018

Posts: 22
#1

23 May 2019, 06:30

Dear Statalists,
I just read a paper by Golay, P., et al., "Identifying patterns in psychiatric hospital stays with statistical methods: towards a typology of post-deinstitutionalization hospitalization trajectories". Social Psychiatry and Psychiatric Epidemiology, 2019. In the paper the authors state that "For each patient, hospitalizations were aggregated into a string of 1095 digits with digits 0 (not in hospital) or 1 (in hospital) for every day over a 3-year period. Each state sequence started with the first day at hospital and included the next 3 years of records." I am not sure how this "aggregation", or data transformation, can be done in Stata.

The following are sample hospital records. There are 4 patients with 12 records, each record representing one inpatient stay. To do sequence analysis on these patients' hospital stay patterns for 1 year after their first admission, I need to generate a variable that has 365 digits, each digit indicating whether the patient was in hospital on the corresponding day, counting from the first admission (1 if yes, 0 if not). If a patient stayed in hospital for 3 days and then came back 2 days later, the variable should look like: 1110011...

I read the help file of the -sq- package, but I don't see how a command like -sqset- can handle data recorded as ranges, such as admission and discharge dates. It would be great if someone could suggest a way to generate this "sequence" variable from hospitalization data. Thanks very much!

Here is the sample data:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte patientid float(indate outdate)
1 19083 19092
1 19111 19119
1 19142 19146
1 19170 19174
1 19198 19214
1 19215 19228
2 19234 19248
2 19430 19474
3 19193 19194
4 19060 19075
4 19339 19357
4 19541 19558
end
format %td indate
format %td outdate

Regards,
Ginny Han
Tags: data transformation, sequence analysis

Clyde Schechter

Join Date: Apr 2014
Posts: 30111

23 May 2019, 10:50

This will do what you ask:

Code:

//    CALCULATES DAYS FROM DATE OF FIRST HOSPITALIZATION
by patientid (indate), sort: gen index_in = indate-indate[1] + 1
by patientid (indate): gen index_out = outdate - indate[1] + 1
//    CALCULATE INDEX TO SKIP TO AFTER A DISCHARGE
by patientid (indate): gen index_skipto = index_in[_n+1] - 1
by patientid (indate): replace index_skipto = 365*3 if _n == _N

//    CREATE THE CODED VARIABLE
gen sequence = (index_out-index_in+1)*"1" + (index_skipto - index_out)*"0"
by patientid (indate), sort: replace sequence = sequence[_n-1] + sequence if _n > 1
by patientid (indate): replace sequence = sequence[_N]
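
Note that the code above takes the end of follow-up to be day 365*3 = 1095, matching the 3-year window in the paper, and as written it does not appear to cut off stays that begin or run past that day. For the 1-year, 365-digit variable asked for in #1, a variant along the following lines might work. This is only a sketch: the names start, stop, skipto and seq365 and the local macro window are made up for illustration, it drops observations (so run it on a copy of the data), and it assumes that stays within a patient do not overlap.

```stata
* Example data from post #1
clear
input byte patientid float(indate outdate)
1 19083 19092
1 19111 19119
1 19142 19146
1 19170 19174
1 19198 19214
1 19215 19228
2 19234 19248
2 19430 19474
3 19193 19194
4 19060 19075
4 19339 19357
4 19541 19558
end
format %td indate outdate

local window = 365    // assumed follow-up length in days

* Day numbers relative to first admission (day 1 = first admission)
bysort patientid (indate): gen start = indate - indate[1] + 1
bysort patientid (indate): gen stop  = outdate - indate[1] + 1

* Discard stays that begin after the window; cap stays that run past it
drop if start > `window'
replace stop = min(stop, `window')

* Last day out of hospital before the next stay, or end of window for the last stay
bysort patientid (start): gen skipto = cond(_n < _N, start[_n+1] - 1, `window')

* Build each stay's piece of the string, then concatenate within patient
gen seq365 = (stop - start + 1)*"1" + max(skipto - stop, 0)*"0"
bysort patientid (start): replace seq365 = seq365[_n-1] + seq365 if _n > 1
bysort patientid (start): replace seq365 = seq365[_N]

* Sanity check: every sequence should have exactly `window' digits
assert strlen(seq365) == `window'
```

If the final assert fails, the most likely cause is overlapping stays for some patient, which this sketch does not handle.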


Ginny Han

Join Date: Jul 2018

Posts: 22
#3

26 May 2019, 00:55

Thanks very much, sir! That was really helpful!
Kevin Damman

Join Date: Sep 2020

Posts: 3
#4

28 Sep 2020, 14:30

Can I please follow up on this? I found this approach very useful.

I am actually trying to create a graph that shows cumulative in-hospital days on the Y-axis and time (days or years) since start of follow-up on the X-axis.

Would it be possible to create this from the sequence data, or would another approach be better for this problem (repeated events per patient, with a different failure time for each event)?
Clyde Schechter

Join Date: Apr 2014

Posts: 30111
#5

28 Sep 2020, 15:12

The sequence variable that the original poster raised is an idiosyncratic way of representing this kind of information. She was specifically looking to replicate that approach for undisclosed reasons.

But it is not a useful way to represent data for the purposes of working with it in Stata. In fact, if you converted your data to that form, the first thing you would need to do in order to create a graph is transform it back to the original data (or some other usable form).

If you want help with creating a graph of cumulative inhospital days against time since start of followup, I suggest that you

a) Repost your question in a new thread, because it isn't really related to the topic of this one

and

b) Show an example of the data you have. Be sure to use the -dataex- command to do that. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
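
As an illustration of what "some other usable form" could look like, here is a rough sketch using the example data from #1 rather than the follow-up poster's own data, which is not shown in this thread. It expands the records to one observation per in-hospital day and keeps a running count within patient. The variable names ndays, date, day and cum_days are invented for this example, and the sketch assumes that stays do not overlap and that no patient has two records with the same indate.

```stata
* Example data from post #1
clear
input byte patientid float(indate outdate)
1 19083 19092
1 19111 19119
1 19142 19146
1 19170 19174
1 19198 19214
1 19215 19228
2 19234 19248
2 19430 19474
3 19193 19194
4 19060 19075
4 19339 19357
4 19541 19558
end
format %td indate outdate

* One observation per in-hospital day
gen long ndays = outdate - indate + 1
expand ndays
bysort patientid indate: gen date = indate + _n - 1
format %td date

* Days since first admission and running total of in-hospital days
bysort patientid (date): gen day = date - date[1] + 1
bysort patientid (date): gen cum_days = _n

* Step plot: the total stays flat between stays and rises during them
twoway line cum_days day, by(patientid) connect(stairstep)
```

Because only in-hospital days are kept as observations, the stairstep connection is what holds the line flat across the gaps between stays.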
Kevin Damman

Join Date: Sep 2020

Posts: 3
#6

29 Sep 2020, 00:47

Thank you, I have made a new thread here: https://www.statalist.org/forums/for...follow-up-time