Charlson comorbidity index

Martin Bohme

Join Date: Aug 2015

Posts: 5
#1


12 Feb 2016, 02:48

Dear Statalist users

I have a question regarding the user-written charlson program in Stata, for which I haven't been able to find an answer using the search function.

I have records from a database/register of admissions and the diagnoses related to each admission. Admissions are identified by a Patient ID (social security number) and the date of admission. Patients can be admitted more than once and can have different diagnoses at each admission.

If I use the -charlson- command, it calculates the total comorbidity index for each unique Patient ID. I would like to calculate the specific comorbidity index for each admission.

Example:

Patient A admitted 01.01.01 diagnosed with Acute myocardial infarction.
(the Charlson comorbidity index at this date should be calculated from all admissions prior to 01.01.01)
Patient A admitted 02.02.02 diagnosed with urinary tract infection.
(the Charlson comorbidity index at this date should be calculated from all admissions prior to 02.02.02 and will therefore be higher than the index for the 01.01.01 admission)

Is there a way to calculate a separate Charlson comorbidity index for each admission?

Hope someone can help me.
Rich Goldstein

Join Date: Mar 2014

Posts: 4461
#2

12 Feb 2016, 06:31

Is each admission a separate observation in your data? If yes, just make a new id that combines the patient id and the admission, then use that new id as the id for the user-written charlson program. For other readers, please note that two different user-written programs computing the Charlson index can be found using the -search- command.
Martin Bohme

Join Date: Aug 2015

Posts: 5
#3

15 Feb 2016, 04:03

Yes, each admission is a separate observation in my data. Actually, each admission currently has several observations, one per diagnosis. Let me explain:

Obs Patient Date Diagnosis
1 Patient A xx.xx.xx xx000
2 Patient A xx.xx.xx xx001
3 Patient A xx.xx.xx xx002

4 Patient A yy.yy.yy xx001
5 Patient A yy.yy.yy xx008

6 Patient B xx.xx.xx xx000
7 Patient B xx.xx.xx xx001
8 Patient B xx.xx.xx xx006

9 Patient B yy.yy.yy xx004
10 Patient B yy.yy.yy xx006

11 Patient B zz.zz.zz xx001
12 Patient B zz.zz.zz xx002

When creating a new ID, I imagine that sorting by Patient ID and date is worth doing, but I'm not sure how.
If I create a new ID from Patient ID and date, how do I make sure that the index is calculated from all previous diagnoses? The new ID won't be unique to a patient.
Rich Goldstein

Join Date: Mar 2014

Posts: 4461
#4

15 Feb 2016, 06:47

First, creating a new ID, assuming that Date is a string variable:

Code:

gen str newid = Patient+Date

If you want a separator (e.g., "/") between patient and date, then

Code:

gen str newid = Patient+"/"+Date

If Date is numeric, wrap it in the string() function, i.e., use string(Date) in place of Date.
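For the numeric case, here is a minimal sketch, assuming Date is a Stata daily date (%td) and Patient is a string. The toy data and the helper variables y, m, d and admission are hypothetical, and "%tdCCYY-NN-DD" is just one readable choice of display format. -egen, group()- is shown as an alternative that gives a numeric admission id instead of a string one.

```stata
clear
// Hypothetical toy data: one observation per diagnosis
input str10 Patient y m d
"Patient A" 2001 1 1
"Patient A" 2001 1 1
"Patient A" 2002 2 2
"Patient B" 2001 1 1
end
gen Date = mdy(m, d, y)
format Date %td

// String id: string() converts the numeric date using a %td format
gen str newid = Patient + "/" + string(Date, "%tdCCYY-NN-DD")

// Alternative: a numeric id numbering each distinct Patient/Date pair
egen admission = group(Patient Date)

list Patient Date newid admission, sepby(Patient)
```

Rows from the same admission share both newid and admission.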

Second, I missed a point in your original message (sorry) about wanting to base the index on the current admission and prior admissions. You don't tell us which command you are using here, but I don't believe any of the ones I know of allow anything like that, so you will have to reshape your data, which might be quite awkward here. I am also not sure why you are using the Dx from prior dates, as these may have been resolved.
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#5

15 Feb 2016, 10:40

I share Rich Goldstein's reservations about what you are doing here. But putting that aside, it seems you need to create a data set in which sequential observations for the same patient accumulate more diagnoses over time. So I'll call your patient ID variable id, your date variable date (which must be a genuine Stata numeric date, not a string that looks like a date to human eyes), and your diagnosis variable dx. I assume that dx is a string variable, and I also assume that it never contains a missing value and that the character # never occurs in it. If the latter is not true, replace that by some other character that is guaranteed never to occur in that variable.
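The -by- commands in this post rely on date being a numeric Stata date. If the dates are stored as strings like those in #1 ("01.01.01"), a minimal conversion sketch follows. It assumes the strings are day.month.two-digit-year and that every year is no later than 2050; the variable name datestr and the cutoff 2050 are hypothetical and should be adapted to the real data.

```stata
clear
// Hypothetical string dates in the day.month.year style of #1
input str8 datestr
"01.01.01"
"02.02.02"
end

// daily() parses the string with the mask "DMY"; the third argument
// (2050) tells Stata how to expand two-digit years (01 -> 2001)
gen date = daily(datestr, "DMY", 2050)
format date %td

// Stop if any string failed to convert
assert !missing(date)
list
```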

Code:

// TO AVOID LATER DUPLICATION OF DIAGNOSES, KEEP ONLY THE EARLIEST
// OCCURRENCE OF ANY PARTICULAR DIAGNOSIS
by id dx (date), sort: keep if _n == 1

// BUILD A RUNNING CONCATENATION OF ALL DIAGNOSES FOR A GIVEN PATIENT
// AND THEN KEEP THE FINAL CONCATENATION FOR EACH ADMISSION
by id (date dx), sort: replace dx = dx + "#" + dx[_n-1]
by id date: keep if _n == _N

// NOW SPLIT THE CONCATENATED DX VARIABLE INTO SEPARATE DIAGNOSES
split dx, gen(diagnosis) parse("#")

Note: Untested, but I believe this will work.

Last edited by Clyde Schechter; 15 Feb 2016, 10:48. Reason: Somehow this ended up being posted without my ever hitting "Post Reply." Don't know how that happened, but had to edit in order to finish it.
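To see what the code in #5 produces, here is a self-contained run on made-up data. The ids, dates and diagnoses are hypothetical; two diagnoses contain a space, which is why # rather than a blank is used as the separator. The commands after the data setup are the ones from #5.

```stata
clear
// Hypothetical toy data: one observation per diagnosis per admission
input id str10 dstr str40 dx
1 "01jan2001" "AMI"
1 "02feb2002" "UTI"
1 "02feb2002" "COPD"
2 "05mar2003" "diabetes"
2 "05mar2003" "lung cancer"
2 "06jun2004" "diabetes"
2 "06jun2004" "cirrhosis"
end
// Convert the string dates to a numeric Stata date
gen date = daily(dstr, "DMY")
format date %td

// Keep only the earliest occurrence of each diagnosis within a patient
by id dx (date), sort: keep if _n == 1

// Running concatenation within each patient, then keep the last
// (fullest) concatenation for each admission
by id (date dx), sort: replace dx = dx + "#" + dx[_n-1]
by id date: keep if _n == _N

// Split the concatenated list into separate variables
split dx, gen(diagnosis) parse("#")

list id date dx, sepby(id)
```

Tracing it by hand, four observations should remain, one per admission. The second admission of id 2 should end up with dx = "cirrhosis#lung cancer#diabetes#", with the repeated diabetes diagnosis kept only once.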
Denise Vella

Join Date: Aug 2022

Posts: 187
#6

27 Sep 2022, 06:51

Hi Clyde Schechter, it's been a while since you wrote this, but I wonder if you could help explain what the code below does:

by id (date dx), sort: replace dx = dx + "#" + dx[_n-1]
by id date: keep if _n == _N

1. What does your # refer to?
2. What is your n-1 referring to ?
3. Why do you write in (bold) to keep the row no if it is equal to total no of observations in data?
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#7

27 Sep 2022, 09:15

by id (date dx), sort: replace dx = dx + "#" + dx[_n-1]
by id date: keep if _n == _N

1. What does your # refer to?

It doesn't refer to anything. It is a literal # character. The idea behind this code is that each id has many observations, and each observation contains exactly one diagnosis. The goal is to convert that into one observation per admission (each id-date combination), with one variable holding a list of all the diagnoses that patient has accumulated so far. The list is built up by starting with each id's first observation and then moving down through the observations belonging to that id, one observation at a time, adding each observation's diagnosis onto the end of the list. Now, we don't want to smash them together, ending up with something like "lung canceremphysemacirrhosis." We want them to be separate words. So ordinarily, one might -replace dx = dx + " " + dx[_n-1]-. The problem with that is that the diagnoses themselves sometimes contain blanks, and later in the code it will be necessary to distinguish between spaces within a diagnosis and spaces separating diagnoses. So instead of using " " to separate them, we have to use something else, and it must be something that never occurs within a diagnosis. I chose "#". The list thus grows one diagnosis at a time. Suppose the diagnosis in the first observation is lung cancer and the second observation says emphysema. When the code reaches the second observation, dx changes to "lung cancer#emphysema." If the third observation has dx = "cirrhosis", that will then become "lung cancer#emphysema#cirrhosis". When we finally reach that id's final observation, its value of dx will be a list of all of the diagnoses, separated by # characters.
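A detail worth checking when you run this: the line from #5, dx + "#" + dx[_n-1], places each new diagnosis in front of the earlier ones, and on the first observation of each id dx[_n-1] is an empty string, so a trailing # is left. With the three diagnoses of this example on successive dates, dx would actually end up as "cirrhosis#emphysema#lung cancer#". For a comorbidity index the order should not matter. If a list in the chronological order described above is wanted, a minimal variant (a sketch, not the code from #5; the toy data are hypothetical, with dates 1, 2, 3 standing in for real dates) reverses the concatenation and leaves each id's first observation alone:

```stata
clear
// Hypothetical toy data: one patient, three diagnoses on three dates
input id date str40 dx
1 1 "lung cancer"
1 2 "emphysema"
1 3 "cirrhosis"
end

// Put each diagnosis after the list built so far. The first observation
// of each id is not changed, so no "#" appears at either end of the list.
by id (date dx), sort: replace dx = dx[_n-1] + "#" + dx if _n > 1

list
```

Because -replace- works through the observations in order, dx[_n-1] already holds the updated list when the next observation is processed, which is what makes the running concatenation work in both versions.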

2. What is your n-1 referring to ?

First, it is not n-1. It is _n-1. And whenever you see _n-1 in Stata code it is a reference to the immediately preceding observation. There are two ways in which that can be understood. If the command in which it appears has a -by- prefix in front of it, then it means the immediately preceding observation from the same -by- group. Thus, with a -by- prefix, _n-1 will not refer to the final observation of the previous -by- group when working on the first observation of a new -by- group. If, however, there is no -by- prefix, then it just refers to whatever observation is immediately preceding in the data set, without regard to any grouping of the observations.
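A small illustration of the difference, on hypothetical toy data (the variables id, x, prev_all and prev_byid are made up for this example):

```stata
clear
input id x
1 10
1 20
2 30
2 40
end

// Without -by-, x[_n-1] is simply the previous row of the data set
gen prev_all = x[_n-1]

// With -by-, x[_n-1] looks back only within the same id; on the first
// observation of each id it points outside the group and gives missing
by id (x), sort: gen prev_byid = x[_n-1]

list
```

On the third observation (the first one for id 2), prev_all picks up 20 from id 1, while prev_byid is missing.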

3. Why do you write in (bold) to keep the row no if it is equal to total no of observations in data?

That is not what that says. In the presence of a -by- prefix, _N does not refer to the last observation in the data set. It refers to the last observation in the by-group. This is, in fact, quite similar to the phenomenon explained in the preceding paragraph, where _n refers not to the present observation number in the data set but to the present observation number in the by-group, when a -by- prefix is used. Since the by-groups here are defined by id and date, this command tells Stata to retain the final observation of each admission. Based on the explanations given above, that is the observation where the variable dx contains the complete list of diagnoses up to and including that admission, separated by # characters.
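A minimal sketch of _n and _N with and without -by-, on hypothetical toy data (the variables id, x, N_all, n_in_id and N_in_id are made up). In the code from #5 the by-groups are id and date, so the same line keeps one observation per admission; here the group is just id.

```stata
clear
input id x
1 10
1 20
1 30
2 40
2 50
end

// Without -by-, _N is the number of observations in the whole data set
gen N_all = _N

// With -by-, _n and _N count within each id group
by id (x), sort: gen n_in_id = _n
by id: gen N_in_id = _N

// Keep the last observation of each id group
by id: keep if _n == _N

list
```

N_all is 5 everywhere, whereas N_in_id is 3 for id 1 and 2 for id 2; after the -keep-, only the rows with x = 30 and x = 50 remain.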