Categorising ICD10 diagnosis codes

Karoline Andersen

Join Date: Feb 2017

Posts: 13
#1

28 May 2018, 07:57

Hi
I have a dataset with a cohort of 2067 contacts to the emergency department.
I have a variable called 'sidste_aktionsdiagnose_i_forløb' which contains ICD-10 diagnosis codes (see attached file). The variable is an alphanumeric string with 322 unique diagnoses.

I wish to categorize the diagnoses to get a better overview of the diagnoses in order to see which groups of diagnoses are the most common and which diagnoses are more related to readmissions and mortality.
I have tried to decrease the number of unique diagnoses by shortening the codes with substr():
generate udskriv_diagkort = substr(sidste_aktionsdiagnose_i_forløb ,1,4)
This reduces the unique diagnoses to 201, but that is still too many, and if I shorten the ICD-10 code even further, I can no longer look up the meaning of the diagnosis code.

I hope you know a way to help me categorize ICD-10 codes?
Kind regards
Karoline

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#2

28 May 2018, 08:04

Karoline:
you may want to try:

Code:

egen flag=total(udskriv_diagkort)

Kind regards,
Carlo
(Stata 19.0)
Karoline Andersen

Join Date: Feb 2017

Posts: 13
#3

29 May 2018, 05:34

Hi Carlo,
Thank you for the quick reply.

I tried your code but Stata (15.1) showed an error code:
Error code 109
type mismatch
In an expression, you attempted to combine a string and numeric
subexpression in a logically impossible way. For instance, you
attempted to subtract a string from a number or you attempted
to take the substring of a number.

The variable 'udskriv_diagkort' is a string variable.
So I tried the encode command:
encode udskriv_diagkort, generate(udskriv_diagkort2)

And then I tried to use your code again:
egen flag=total(udskriv_diagkort2)
The result was a variable called flag with the same 6 digit number (220448) for all the patients.
Did I misunderstand something?

Kind regards,
Karoline
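
A likely explanation for the constant, sketched on three made-up codes (the values below are illustrative and not taken from the attached data): encode numbers the distinct strings 1, 2, 3, ... in alphabetical order, and egen's total() without a by-prefix then sums those arbitrary integers over the whole dataset, so every row receives the same sum.

```stata
clear
input str4 diag
"DJ18"
"DI21"
"DJ18"
end

* encode assigns integer codes alphabetically: DI21 -> 1, DJ18 -> 2
encode diag, generate(diag_num)

* total() adds up those integer codes over all rows: 2 + 1 + 2 = 5
egen flag = total(diag_num)
list diag diag_num flag, nolabel
```

Read this way, 220448 would simply be the sum of the encode numbers across all contacts rather than a count of any diagnosis.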

Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

29 May 2018, 05:47

Karoline:
the code I've pasted was probably mistaken. Sorry for that.
Try, instead:

Code:

egen flag=count(udskriv_diagkort2)

Perhaps the following toy-example can help:

Code:

. use "C:\Program Files (x86)\Stata15\ado\base\a\auto.dta"
(1978 Automobile Data)

. bysort foreign: egen flag=count(foreign)

. tab foreign flag

           |         flag
  Car type |        22         52 |     Total
-----------+----------------------+----------
  Domestic |         0         52 |        52
   Foreign |        22          0 |        22
-----------+----------------------+----------
     Total |        22         52 |        74

Last edited by Carlo Lazzaro; 29 May 2018, 06:13. Reason: Toy-example added.

Kind regards,
Carlo
(Stata 19.0)


Karoline Andersen

Join Date: Feb 2017

Posts: 13
#5

29 May 2018, 06:31

Carlo:
It seems to be a tough nut to crack...
The code is sort of working, in the sense that the code is not turning red. But I still get a variable 'Flag' with the same number 2267 all over.
Kind regards,
Karoline
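
A possible reason, assuming the line was run without the bysort prefix used in the toy example: egen's count() then counts the non-missing values over the whole dataset, so every row gets the same total. A minimal sketch on the auto data shipped with Stata (the variable names flag_all and flag_grp are only for illustration):

```stata
sysuse auto, clear

* Without a by-prefix: one count for the whole dataset (74 in every row)
egen flag_all = count(foreign)

* With bysort: one count per category (52 for Domestic, 22 for Foreign)
bysort foreign: egen flag_grp = count(foreign)

* Frequency table with the most common category listed first
tabulate foreign, sort
```

With the diagnosis data, the same pattern would be bysort udskriv_diagkort2: egen flag = count(udskriv_diagkort2), and tabulate udskriv_diagkort, sort should list the most common codes first.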
Anders Alexandersson

Join Date: Apr 2014

Posts: 203
#6

29 May 2018, 06:54

Karoline Andersen: Stata has icd10, a suite of commands for working with ICD-10 diagnosis codes. ICD codes for secondary diagnoses are usually grouped into comorbidity indices such as Charlson or Elixhauser. Stata implements the "Quan" comorbidity versions, including for ICD-10-CM. See

Code:

help icd10
search charlson
search elixhauser

I also recommend you consider the more feature-rich R packages icd and comorbidity. There are also SAS AHRQ programs that you may prefer, depending on your needs. You can call R packages from Stata with your choice of user-written command: rsource, rcall or markstat. Carlo's approach is more direct, which may be all you need.

Karoline wrote:

I wish to categorize the [ICD-10] diagnoses to get a better overview of the diagnoses in order to see which groups of diagnoses are the most common and which diagnoses are more related to readmissions and mortality. [...]
Dennis Lund Hansen

Join Date: Feb 2018

Posts: 45
#7

29 May 2018, 07:00

How much lumping together are you willing to accept?

If you want full control, then a cup of coffee and some time with -replace-, combined with ICD10 / icdpedia.dk, will do. :-)

If you can use the icd10 grouping, then I think this may help you:

Code:

rename sidste_aktionsdiagnose_i_forløb icd_last /* only because it is so long and contains "ø" */
generate icd10_short = substr(icd_last, 2, 4)
icd10 generate icd10_description = icd10_short, description
browse
replace icd10_short = substr(icd_last, 2, 3) if icd10_description == ""
icd10 generate icd10_desc2 = icd10_short, description
replace icd10_description = icd10_desc2 if icd10_description == ""
drop icd10_desc2

I am not that used to the -icd10- command;
perhaps you can skip the three steps and just do:

Code:

generate icd10_short = substr(icd_last, 2, 3)
icd10 generate icd10_description = icd10_short, description

Perhaps people more experienced than I am can help there?

Remember that the prefix "D" is not part of ICD-10 but a specifically Danish addition.

If I have misunderstood you, then I apologise.
Karoline Andersen

Join Date: Feb 2017

Posts: 13
#8

01 Jun 2018, 02:03

Anders Alexandersson: Thank you very much for your advice. Right now my internet is not working, so I will have to try your suggestions later on.
Dennis Lund Hansen: Thank you for your suggestion as well. The code works very well. I still end up with 290 unique diagnoses. I was hoping for maybe 30 or so. Maybe I will end up with another cup of coffee...
Dennis Lund Hansen

Join Date: Feb 2018

Posts: 45
#9

01 Jun 2018, 03:43

The coffee way works, but it is slow (I have just done it for a few hundred different diagnoses).
It is a bit strange that it cannot be "compressed" more. Going from 2067 contacts to 290 unique diagnoses sounds like a lot. Are you using only the primary diagnosis (A-diagnose/C_diagA), or are you including the other diagnoses as well?

Does the suggestion where you use only the letter and two digits give just as many?

Code:

generate icd10_short = substr(icd_last, 2, 3)
icd10 generate icd10_description = icd10_short, description

You could of course group them based on the letter, but that would be very crude.

Are you allowed to post a small part of the dataset, e.g. 50 or 100 observations from udskriv_diagkort, using -dataex-?
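
A middle way between single codes and the bare letter is the WHO ICD-10 chapter, which is defined by ranges of the three-character category and gives about 22 groups. The following is only a sketch: the six codes are made up, the short chapter labels are my own abbreviations, and it assumes every code carries the Danish "D" prefix, as in the thread.

```stata
clear
input str6 icd_last
"DJ189"
"DI219"
"DS720"
"DD509"
"DC349"
"DH669"
end

* Drop the Danish "D" prefix and keep the letter plus two digits
generate cat3 = substr(icd_last, 2, 3)

* inrange() compares strings alphabetically, so category ranges work directly
generate chapter = ""
replace chapter = "01 Infectious (A00-B99)"         if inrange(cat3, "A00", "B99")
replace chapter = "02 Neoplasms (C00-D48)"          if inrange(cat3, "C00", "D48")
replace chapter = "03 Blood/immune (D50-D89)"       if inrange(cat3, "D50", "D89")
replace chapter = "04 Endocrine (E00-E90)"          if inrange(cat3, "E00", "E90")
replace chapter = "05 Mental (F00-F99)"             if inrange(cat3, "F00", "F99")
replace chapter = "06 Nervous system (G00-G99)"     if inrange(cat3, "G00", "G99")
replace chapter = "07 Eye (H00-H59)"                if inrange(cat3, "H00", "H59")
replace chapter = "08 Ear (H60-H95)"                if inrange(cat3, "H60", "H95")
replace chapter = "09 Circulatory (I00-I99)"        if inrange(cat3, "I00", "I99")
replace chapter = "10 Respiratory (J00-J99)"        if inrange(cat3, "J00", "J99")
replace chapter = "11 Digestive (K00-K93)"          if inrange(cat3, "K00", "K93")
replace chapter = "12 Skin (L00-L99)"               if inrange(cat3, "L00", "L99")
replace chapter = "13 Musculoskeletal (M00-M99)"    if inrange(cat3, "M00", "M99")
replace chapter = "14 Genitourinary (N00-N99)"      if inrange(cat3, "N00", "N99")
replace chapter = "15 Pregnancy (O00-O99)"          if inrange(cat3, "O00", "O99")
replace chapter = "16 Perinatal (P00-P96)"          if inrange(cat3, "P00", "P96")
replace chapter = "17 Congenital (Q00-Q99)"         if inrange(cat3, "Q00", "Q99")
replace chapter = "18 Symptoms/signs (R00-R99)"     if inrange(cat3, "R00", "R99")
replace chapter = "19 Injury/poisoning (S00-T98)"   if inrange(cat3, "S00", "T98")
replace chapter = "20 External causes (V01-Y98)"    if inrange(cat3, "V01", "Y98")
replace chapter = "21 Health contacts (Z00-Z99)"    if inrange(cat3, "Z00", "Z99")
replace chapter = "22 Special purposes (U00-U99)"   if inrange(cat3, "U00", "U99")

* Most common chapters first; codes outside every range stay empty
tabulate chapter, sort missing
```

Chapters that turn out to be large in an emergency department cohort (for example symptoms or injuries) could then be split further by hand, which keeps the number of groups close to the 30 or so mentioned above.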
Dennis Lund Hansen

Join Date: Feb 2018

Posts: 45
#10

01 Jun 2018, 07:07

If the data is not on the server at Statistics Denmark then you could perhaps do something like:

Code:

generate count = 1
collapse (sum) count, by(sidste_aktionsdiagnose_i_forløb)
sort count
drop in 26/-26 // keep the 25 least common and 25 most common ICD-10 diagnoses
dataex sidste_aktionsdiagnose_i_forløb count

If you are using data from Statistics Denmark, then you can find on the server a dataset file named something like icd10gruppe99_c_t.dta (I can't remember the exact name right now). It contains the administrative grouping used by the health authorities; I don't know whether that could be useful.