  • Categorising ICD10 diagnosis codes

    Hi
    I have a dataset with a cohort of 2067 contacts to the emergency department.
    I have a variable called 'sidste_aktionsdiagnose_i_forløb' which contains ICD10 diagnosis codes (see attached file). The variable is an alphanumeric string with 322 unique diagnoses.

    I wish to categorize the diagnoses to get a better overview of the diagnoses in order to see which groups of diagnoses are the most common and which diagnoses are more related to readmissions and mortality.
    I have tried to decrease the number of unique diagnoses by truncating the codes with substr():
    generate udskriv_diagkort = substr(sidste_aktionsdiagnose_i_forløb ,1,4)
    This reduces the unique diagnoses to 201, but that is still too many, and if I truncate the ICD10 code even further, I can no longer look up the meaning of the diagnosis code.

    I hope you know a way to help me categorize ICD10 codes.
    Kind regards
    Karoline

  • #2
    Karoline:
    you may want to try:
    Code:
    egen flag=total(udskriv_diagkort)
    Kind regards,
    Carlo
    (StataNow 18.5)


    • #3
      Hi Carlo,
      Thank you for the quick reply.

      I tried your code but Stata (15.1) showed an error code:
      Error code 109
      type mismatch
      In an expression, you attempted to combine a string and numeric
      subexpression in a logically impossible way. For instance, you
      attempted to subtract a string from a number or you attempted
      to take the substring of a number.

      The variable 'udskriv_diagkort' is a string variable.
      So I tried the encode command:
      encode udskriv_diagkort, generate(udskriv_diagkort2)

      And then I tried to use your code again:
      egen flag=total(udskriv_diagkort2)
      The result was a variable called flag with the same 6 digit number (220448) for all the patients.
      Did I misunderstand something?

      Kind regards,
      Karoline


      • #4
        Karoline:
        the code I've pasted was probably mistaken. Sorry for that.
        Try, instead:
        Code:
        egen flag=count(udskriv_diagkort2)
        Perhaps the following toy-example can help:
        Code:
        . use "C:\Program Files (x86)\Stata15\ado\base\a\auto.dta"
        (1978 Automobile Data)
        
        . bysort foreign: egen flag=count(foreign)
        
        . tab foreign flag
        
                   |         flag
          Car type |        22         52 |     Total
        -----------+----------------------+----------
          Domestic |         0         52 |        52
           Foreign |        22          0 |        22
        -----------+----------------------+----------
             Total |        22         52 |        74
        Last edited by Carlo Lazzaro; 29 May 2018, 06:13. Reason: To-example added.
        Kind regards,
        Carlo
        (StataNow 18.5)


        • #5
          Carlo:
          It seems to be a tough nut to crack...
          The code is sort of working, in the sense that it does not turn red. But I still get a variable 'flag' with the same number, 2267, in every observation.
          Kind regards,
          Karoline
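
          A likely reason the value is constant: without a by-group prefix, egen's count() counts the non-missing observations over the whole dataset, so every row receives the same total. The toy example in #4 differs from the one-line suggestion precisely in its bysort prefix. A minimal sketch of the by-group version, assuming udskriv_diagkort is the truncated string variable from #1 (n_diag is a hypothetical name):

          ```stata
          * Number of contacts that share each truncated diagnosis code
          bysort udskriv_diagkort: generate n_diag = _N

          * One-way frequency table, most common codes first
          tabulate udskriv_diagkort, sort
          ```

          This only counts the existing codes; it does not reduce the number of categories.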


          • #6
            Karoline Andersen: Stata has icd10, a suite of commands for working with ICD-10 diagnosis codes. ICD codes for secondary diagnoses are usually grouped into comorbidity indices such as Charlson or Elixhauser. Stata implements the "Quan" comorbidity versions, including for ICD-10-CM. See

            Code:
            help icd10
            search charlson
            search elixhauser
            I also recommend you consider the more feature-rich R packages icd and comorbidity. There are also SAS AHRQ programs that you may prefer, depending on your needs. You can call R packages from Stata with your choice of user-written command: rsource, rcall or markstat. Carlo's approach is more direct, which may be all that you need.
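
            As a rough sketch of what the icd10 suite offers for this task, the lines below check the codes, reduce them to three-character categories, and attach text labels. The variable names icd_code, icd_problem, icd_cat and icd_txt are hypothetical, and the sketch assumes icd_code holds plain ICD-10 codes without any national prefix; codes that are national extensions may be flagged as undefined.

            ```stata
            * icd_code: hypothetical string variable with plain ICD-10 codes
            icd10 check icd_code, generate(icd_problem)    // 0 = defined code, nonzero = reason it was rejected
            icd10 generate icd_cat = icd_code, category    // three-character category, e.g. J18
            icd10 generate icd_txt = icd_cat, description  // text description of the category
            tabulate icd_txt, sort                         // most common categories first
            ```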

            Karoline wrote:
            I wish to categorize the [ICD-10] diagnoses to get a better overview of the diagnoses in order to see which groups of diagnoses are the most common and which diagnoses are more related to readmissions and mortality. [...]


            • #7
              How much lumping together are you willing to accept?

              If you want full control, then a cup of coffee and some time with -replace-, combined with ICD10 / icdpedia.dk, will do. :-)

              If you can use the icd10 grouping, then I think this may help you:

              Code:
              rename sidste_aktionsdiagnose_i_forløb icd_last /* only because it is so long and contains "ø" */
              generate icd10_short = substr(icd_last, 2, 4)
              icd10 generate icd10_description = icd10_short, description
              browse
              replace icd10_short = substr(icd_last, 2, 3) if icd10_description == "" 
              icd10 generate icd10_desc2 = icd10_short, description
              
              replace icd10_description = icd10_desc2 if icd10_description==""
              drop icd10_desc2
              I am not that used to the -icd10- command;
              perhaps you can skip those steps and just do:

              Code:
              generate icd10_short = substr(icd_last, 2, 3 ) 
              icd10 generate icd10_description = icd10_short, description
              Perhaps people more experienced than I am can help there?

              Remember that the prefix "D" is not part of ICD10 but a specifically Danish addition.

              If I have misunderstood you, then I apologise.


              • #8
                Anders Alexandersson: Thank you very much for your advice. Right now my internet is not working, so I will have to try your suggestions later on.
                Dennis Lund Hansen: Thank you for your suggestion as well. The code works very well, but I still end up with 290 unique diagnoses. I was hoping for maybe 30 or so. Maybe I will end up with another cup of coffee...


                • #9
                  The coffee way works, but it is slow (I have just done it for a few hundred different diagnoses).
                  It is a bit strange that it cannot be "compressed" more. Going from 2067 contacts to 290 unique diagnoses sounds odd. Are you using only the primary diagnosis (A-diagnose/C_diagA), or are you including the other diagnoses as well?

                  Does the suggestion where you use only the letter and two digits give just as many?

                  Code:
                  generate icd10_short = substr(icd_last, 2, 3 )  
                   icd10 generate icd10_description = icd10_short, description
                  You could of course group them based on the letter, but that would be very crude.

                  Are you allowed to post a small part of the dataset, e.g. 50 or 100 observations of udskriv_diagkort, using -dataex-?
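
                  A somewhat finer alternative to grouping on the letter alone is the ICD-10 chapter, which gives about 20 groups and splits the D and H blocks. The sketch below assumes the chapter ranges of the WHO ICD-10 (the Danish classification may differ in places), that icd_last (the renamed variable from #7) looks like "DJ189" with the Danish "D" in the first position, and uses hypothetical variable names letter, num and chapter:

                  ```stata
                  * Sketch only: letter and two-digit part of the category, after the Danish "D" prefix
                  generate str1 letter = substr(icd_last, 2, 1)
                  generate byte num    = real(substr(icd_last, 3, 2))

                  generate byte chapter = .
                  replace chapter =  1 if inlist(letter, "A", "B")
                  replace chapter =  2 if letter == "C" | (letter == "D" & num <= 48)
                  replace chapter =  3 if letter == "D" & inrange(num, 50, 89)
                  replace chapter =  4 if letter == "E"
                  replace chapter =  5 if letter == "F"
                  replace chapter =  6 if letter == "G"
                  replace chapter =  7 if letter == "H" & num <= 59
                  replace chapter =  8 if letter == "H" & inrange(num, 60, 95)
                  replace chapter =  9 if letter == "I"
                  replace chapter = 10 if letter == "J"
                  replace chapter = 11 if letter == "K"
                  replace chapter = 12 if letter == "L"
                  replace chapter = 13 if letter == "M"
                  replace chapter = 14 if letter == "N"
                  replace chapter = 15 if letter == "O"
                  replace chapter = 16 if letter == "P"
                  replace chapter = 17 if letter == "Q"
                  replace chapter = 18 if letter == "R"
                  replace chapter = 19 if inlist(letter, "S", "T")
                  replace chapter = 20 if inlist(letter, "V", "W", "X", "Y")
                  replace chapter = 21 if letter == "Z"
                  replace chapter = 22 if letter == "U"

                  * Codes that fit no range stay missing and show up in the table for manual review
                  tabulate chapter, missing sort
                  ```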


                  • #10
                    If the data is not on the server at Statistics Denmark then you could perhaps do something like:

                    Code:
                    generate count=1
                    collapse (sum) count, by(sidste_aktionsdiagnose_i_forløb)
                    sort count
                    drop in 26/-26 // keep the 25 least common and the 25 most common ICD10 diagnoses
                    dataex sidste_aktionsdiagnose_i_forløb count

                    If you are using data from Statistics Denmark, then you can find, on the server, a dataset file named something like icd10gruppe99_c_t.dta (I can't remember the exact name right now). It contains the administrative grouping used by the health authorities; I don't know whether that could be useful.
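
                    If such a grouping file is available, attaching it would presumably be a many-to-one merge on the diagnosis code. In the sketch below every name is a placeholder, since neither the real file name nor its variable names are known here: grouping_file.dta, icd_code and icd_group are all hypothetical, and the sketch assumes the codes are stored in the same format in both files.

                    ```stata
                    * All names are placeholders; adjust to the actual grouping file
                    rename sidste_aktionsdiagnose_i_forløb icd_code
                    merge m:1 icd_code using "grouping_file.dta", keepusing(icd_group) keep(master match)
                    tabulate _merge             // unmatched codes need manual review
                    tabulate icd_group, sort    // most common groups first
                    ```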
