

  • Categorising ICD10 diagnosis codes

    I have a dataset with a cohort of 2067 contacts to the emergency department.
    I have a variable called 'sidste_aktionsdiagnose_i_forløb' which contains information on ICD10 diagnosis codes (see attached file). The variable is an alphanumeric string with 322 unique diagnoses.

    I wish to categorize the diagnoses to get a better overview, in order to see which groups of diagnoses are the most common and which diagnoses are more related to readmissions and mortality.
    I have tried to decrease the number of unique diagnoses by shortening the codes with substr():
    generate udskriv_diagkort = substr(sidste_aktionsdiagnose_i_forløb ,1,4)
    This reduces the unique diagnoses to 201, but that is still too many, and if I try to shorten the ICD10 code even further, I cannot look up the meaning of the diagnosis code.

    I hope you know a way to help me categorize ICD10 codes.
    Kind regards

  • #2
    you may want to try:
    egen flag=total(udskriv_diagkort)
    Kind regards,
    (StataNow 18.5)


    • #3
      Hi Carlo,
      Thank you for the quick reply.

      I tried your code but Stata (15.1) showed an error code:
      Error code 109
      type mismatch
      In an expression, you attempted to combine a string and numeric
      subexpression in a logically impossible way. For instance, you
      attempted to subtract a string from a number or you attempted
      to take the substring of a number.

      The variable 'udskriv_diagkort' is a string variable.
      So I tried to use the encode command:
      encode udskriv_diagkort, generate(udskriv_diagkort2)

      And then I tried to use your code again:
      egen flag=total(udskriv_diagkort2)
      The result was a variable called flag with the same 6 digit number (220448) for all the patients.
      Did I misunderstand something?

      Kind regards,


      • #4
        the code I've pasted was probably mistaken. Sorry for that.
        Try, instead:
        egen flag=count(udskriv_diagkort2)
        Perhaps the following toy-example can help:
        . use "C:\Program Files (x86)\Stata15\ado\base\a\auto.dta"
        (1978 Automobile Data)
        . bysort foreign: egen flag=count(foreign)
        . tab foreign flag
                   |         flag
          Car type |        22         52 |     Total
        -----------+----------------------+----------
          Domestic |         0         52 |        52
           Foreign |        22          0 |        22
        -----------+----------------------+----------
             Total |        22         52 |        74
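
        The counts in the toy example differ by group only because of the bysort prefix; without it, egen returns a single value for the whole dataset, which is the "same number all over" symptom. A minimal sketch of the same idea applied to string codes, using _N under bysort so that no encode step is needed. The toy values below are made up for illustration and are not taken from the poster's data:

        ```stata
        * Toy data standing in for the shortened diagnosis codes (made-up values)
        clear
        input str4 udskriv_diagkort
        "DI21"
        "DI21"
        "DJ18"
        "DI21"
        "DJ18"
        "DK35"
        end

        * Under -bysort-, _N is the number of observations in the current group,
        * so this works directly on the string variable
        bysort udskriv_diagkort: generate n_contacts = _N

        * Frequency table with the most common codes listed first
        tabulate udskriv_diagkort, sort
        ```

        Here n_contacts holds, for each contact, how many contacts share its code, which may be enough to spot the most common diagnoses before deciding how to group them.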
        Last edited by Carlo Lazzaro; 29 May 2018, 06:13. Reason: Toy example added.
        Kind regards,
        (StataNow 18.5)


        • #5
          It seems to be a tough nut to crack...
          The code is sort of working, in the sense that the code is not turning red. But I still get a variable 'Flag' with the same number 2267 all over.
          Kind regards,


          • #6
            Karoline Andersen: Stata has icd10, a suite of commands for working with ICD-10 diagnosis codes. ICD diagnostic codes of secondary diagnoses are usually grouped into comorbidities such as Charlson or Elixhauser. Stata implements the "Quan" comorbidity versions, including for ICD-10-CM. See

            help icd10
            search charlson
            search elixhauser
            I also recommend you consider the more feature-rich R packages icd and comorbidity. There are also SAS AHRQ programs that you may prefer, depending on your needs. You can call R packages from Stata with your choice of user-written command: rsource, rcall or markstat. Carlo's approach is more direct, which may be all you need.

            Karoline wrote:
            I wish to categorize the [ICD-10] diagnoses to get a better overview of the diagnoses in order to see which groups of diagnoses are the most common and which diagnoses are more related to readmissions and mortality. [...]


            • #7
              How much lumping together will you accept?

              If you want full control, then a cup of coffee and time with -replace- combined with ICD10 will do. :-)

              If you can use the icd10 grouping, then I think this may help you:

              rename sidste_aktionsdiagnose_i_forløb icd_last /* only because it is so long and contains "ø" */
              generate icd10_short = substr(icd_last, 2, 4)
              icd10 generate icd10_description = icd10_short, description
              replace icd10_short = substr(icd_last, 2, 3) if icd10_description == "" 
              icd10 generate icd10_desc2 = icd10_short, description
              replace icd10_description = icd10_desc2 if icd10_description==""
              drop icd10_desc2
              I am not that used to the -icd10- command;
              perhaps you can skip the three steps and just do:

              generate icd10_short = substr(icd_last, 2, 3 ) 
              icd10 generate icd10_description = icd10_short, description
              Perhaps people more knowledgeable than I can help there?

              Remember that the prefix "D" is not a part of ICD10 but a specifically Danish addition.

              If I have misunderstood you, then I apologise.
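
              A minimal sketch of the shorter route, under the assumption that every code carries the Danish "D" prefix. It uses icd10 check to flag codes that Stata does not recognise and the category option of icd10 generate to extract the three-character category. The toy codes and the names icd10_code, icd10_problem and icd10_cat are made up for illustration:

              ```stata
              * Toy data in the Danish format: "D" prefix + ICD-10 code (made-up rows)
              clear
              input str6 icd_last
              "DI219"
              "DJ189"
              "DN390"
              "DI219"
              end

              * Drop the Danish "D" prefix; a missing third argument keeps the rest
              generate icd10_code = substr(icd_last, 2, .)

              * Report codes Stata does not recognise; icd10_problem is 0 for valid codes
              icd10 check icd10_code, generate(icd10_problem)

              * Three-character category, e.g. I21 for I219
              icd10 generate icd10_cat = icd10_code, category

              tabulate icd10_cat, sort
              ```

              Codes flagged by icd10 check are the ones that would need the manual fallback to a shorter code, as in the longer version above.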


              • #8
                Anders Alexandersson: Thank you very much for your advice. Right now my internet is not working, so I will have to try your suggestions later on.
                Dennis Lund Hansen: Thank you for your suggestion as well. The code works very well. I still end up with 290 unique diagnoses. I was hoping for maybe 30 or so. Maybe I will end up with another cup of coffee...


                • #9
                  The coffee way works, but is slow (I have just done it for a few hundred different diagnoses).
                  It is a bit strange that it cannot be "compressed" more. Going from 2067 contacts to 290 unique diagnoses sounds odd. Is it only the primary diagnosis (A-diagnose/C_diagA) you are using, or are you including the other diagnoses as well?

                  Does the suggestion where you only use the letter and two digits give just as many?

                  generate icd10_short = substr(icd_last, 2, 3)
                  icd10 generate icd10_description = icd10_short, description
                  You could of course group them based on the letter, but that would be very crude.
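
                  Grouping on the letter alone merges or splits some ICD-10 chapters (D covers both neoplasms and blood diseases, H covers both eye and ear). A rough sketch that maps the three-character category to the WHO ICD-10 chapter number instead, which gives about 20 groups. It assumes the Danish "D" prefix on every code and the WHO chapter ranges; codes outside those ranges are left missing. The toy codes and the names cat3 and chapter are made up for illustration:

                  ```stata
                  * Toy data in the Danish format (made-up rows)
                  clear
                  input str6 icd_last
                  "DI219"
                  "DJ189"
                  "DD649"
                  "DC349"
                  "DS720"
                  end

                  * Letter plus two digits, skipping the Danish "D" prefix
                  generate str3 cat3 = substr(icd_last, 2, 3)

                  * inrange() also compares strings, so category ranges can be used directly
                  generate byte chapter = .
                  replace chapter = 1  if inrange(cat3, "A00", "B99")  // infectious
                  replace chapter = 2  if inrange(cat3, "C00", "D48")  // neoplasms
                  replace chapter = 3  if inrange(cat3, "D50", "D89")  // blood
                  replace chapter = 4  if inrange(cat3, "E00", "E90")  // endocrine
                  replace chapter = 5  if inrange(cat3, "F00", "F99")  // mental
                  replace chapter = 6  if inrange(cat3, "G00", "G99")  // nervous system
                  replace chapter = 7  if inrange(cat3, "H00", "H59")  // eye
                  replace chapter = 8  if inrange(cat3, "H60", "H95")  // ear
                  replace chapter = 9  if inrange(cat3, "I00", "I99")  // circulatory
                  replace chapter = 10 if inrange(cat3, "J00", "J99")  // respiratory
                  replace chapter = 11 if inrange(cat3, "K00", "K93")  // digestive
                  replace chapter = 12 if inrange(cat3, "L00", "L99")  // skin
                  replace chapter = 13 if inrange(cat3, "M00", "M99")  // musculoskeletal
                  replace chapter = 14 if inrange(cat3, "N00", "N99")  // genitourinary
                  replace chapter = 15 if inrange(cat3, "O00", "O99")  // pregnancy
                  replace chapter = 16 if inrange(cat3, "P00", "P96")  // perinatal
                  replace chapter = 17 if inrange(cat3, "Q00", "Q99")  // congenital
                  replace chapter = 18 if inrange(cat3, "R00", "R99")  // symptoms and signs
                  replace chapter = 19 if inrange(cat3, "S00", "T98")  // injury, poisoning
                  replace chapter = 20 if inrange(cat3, "V01", "Y98")  // external causes
                  replace chapter = 21 if inrange(cat3, "Z00", "Z99")  // contact with services
                  replace chapter = 22 if inrange(cat3, "U00", "U99")  // special purposes

                  * Missing values show codes that did not fit any range
                  tabulate chapter, missing
                  ```

                  Large chapters (for example circulatory or injury) could then be split further by hand to reach roughly 30 groups.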

                  Are you allowed to post a small part of the dataset, e.g. 50 or 100 observations of udskriv_diagkort, using -dataex-?


                  • #10
                    If the data is not on the server at Statistics Denmark then you could perhaps do something like:

                    generate count=1
                    collapse (sum) count, by(sidste_aktionsdiagnose_i_forløb)
                    sort count
                    drop in 26/-26 // keep the 25 least common and 25 most common ICD10 diagnoses
                    dataex sidste_aktionsdiagnose_i_forløb count

                    If you are using data from Statistics Denmark, then you can find, on the server, a dataset file named something like icd10gruppe99_c_t.dta (I can't remember the exact name right now). It contains the administrative grouping used by the health authorities; I don't know whether that could be useful.
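
                    A sketch of how such a grouping file could be attached to the contacts with merge, assuming it has one row per diagnosis code. The real file's name and variable names are not known here, so the lookup below is a made-up stand-in, and icd_code and icd_group are hypothetical names:

                    ```stata
                    * Hypothetical lookup table: one row per code with its group (made-up values)
                    clear
                    input str5 icd_code str20 icd_group
                    "DI219" "Circulatory"
                    "DJ189" "Respiratory"
                    end
                    tempfile lookup
                    save `lookup'

                    * Toy contact data (made-up rows); DZ039 has no entry in the lookup
                    clear
                    input str5 icd_code
                    "DI219"
                    "DJ189"
                    "DI219"
                    "DZ039"
                    end

                    * Many contacts per code, one lookup row per code
                    merge m:1 icd_code using `lookup', keep(master match)

                    * Contacts per group; codes without a group show up as blank
                    tabulate icd_group, missing

                    * Codes that still need a group assigned by hand
                    list icd_code if _merge == 1
                    ```

                    Checking _merge afterwards shows how many contacts the grouping actually covers before relying on it.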

