
  • Cleaning ICD9 (and 10) data

    Hello,

    I've been reading as much as I can about the ICD-9 and ICD-10 functions in Stata. However, I have a 13-year dataset spanning the ICD transition period, with both ICD-9 and ICD-10 codes in one column (long format). I've tried to clean it using the icd9 and icd10 commands separately, but Stata seems to get very hung up on codes that are too long or whose format is not perfect. That seems odd to me, as it could easily skip that dx and move on to the next one. I have also tried running both commands, creating two new variables, and running a cross-tab to see what gets missed, but again Stata skips a lot of legitimate-looking codes simply because they appear too long or have an invalid first character. I simply want Stata to recognize the codes and create a new variable with the code value and descriptive text. I'm primarily interested in only the first 3-4 characters of each code. Do I have to trim the dx codes myself first?

    Thanks,
    Ben

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float recid byte dxnum str7 dx byte icd9prob
    1  1 "S51811A" 4
    1  2 "X991XXA" 4
    1  3 "Z8249"   5
    1  4 "Y939"    5
    1  5 "Y929"    5
    1  6 "Y999"    5
    2  1 "S022XXA" 4
    2  2 "R220"    5
    2  3 "H538"    5
    2  4 "X990XXA" 4
    2  5 "Y999"    5
    2  6 "Z23"     5
    2  7 "J45909"  4
    2  8 "Z833"    5
    2  9 "Y042XXA" 4
    2 10 "Y0701"   5
    2 11 "Y92009"  4
    3  1 "S272XXA" 4
    3  2 "J9601"   5
    3  3 "Z66"     5
    end
    label values icd9prob __icd_9
    label def __icd_9 4 "Code too long", modify
    label def __icd_9 5 "Invalid 1st char (not 0-9, E, or V)", modify

  • #2
    The -icd- commands were never designed to differentiate the version of the ICD system being used, because it's highly likely the user will know what version they have. Therefore I'm not surprised that the commands complain when they run into something that does not match a valid code. The codes are systematically different, but there are overlaps that would not be noticeable by inspection (for example, both ICD-9 and ICD-10 diagnosis codes can begin with a letter and be 4-5 characters long).

    I recommend bringing along an indicator for the version of the ICD system being used (or the date on which the code was used, since the US switched to ICD-10-CM on October 1, 2015). In this way, you can split the one diagnosis variable into two and work with each more easily.
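
    A minimal sketch of that split, assuming each row carries a date of service. The variable svcdate and the four rows below are hypothetical illustration data, not from Ben's file; the cutover reflects the US rule that ICD-10-CM applies to services (or, for inpatient stays, discharges) on or after 1 October 2015. If the data source supplies an explicit version flag instead, it can replace the date comparison directly.

```stata
* Sketch: split one mixed dx column into ICD-9 and ICD-10-CM columns by date.
* ASSUMPTION: a date of service exists; -svcdate- and these rows are
* hypothetical illustration data, not from the original dataset.
clear
input float recid str7 dx str9 svcdate_s
1 "4803"    "15jun2014"
1 "931"     "15jun2014"
2 "S51811A" "02nov2016"
2 "Y999"    "02nov2016"
end
generate svcdate = date(svcdate_s, "DMY")
format svcdate %td

* US cutover: ICD-10-CM for services (inpatient: discharges) on or after
* 1 October 2015; earlier dates use ICD-9-CM
generate byte icd10 = svcdate >= td(01oct2015) if !missing(svcdate)

* Copy each code into the variable for its own system only
generate str7 dx9  = dx if icd10 == 0
generate str7 dx10 = dx if icd10 == 1

* Each -check- now sees codes from a single system
icd9 check dx9 if !missing(dx9), generate(check9)
icd10cm check dx10 if !missing(dx10), generate(check10)

* Descriptions only where the code passed its own system's check
icd9 generate desc9 = dx9 if check9 == 0, description
icd10cm generate desc10 = dx10 if check10 == 0, description
```

    Once each code sits in the variable for its own system, a failed check points to a malformed or undefined code rather than to a code from the other system.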

    Are you able to trace back the version number (or date) of these codes?



    • #3
      Here's another idea:

      Code:
      clear *
      cls
      
      input str7 dx
      "S51811A"
      "X991XXA"
      "Z8249"  
      "Y939"   
      "Y929"   
      "Y999"
      "931"
      "4803"
      end
      
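      * Test every code against both systems (0 = defined code)
      icd9 check dx if !mi(dx), gen(check9)
      icd10cm check dx if !mi(dx), gen(check10)
      
      * Generate descriptions only where that system's check passed
      icd9 gen desc9 = dx if check9==0, description
      icd10cm gen desc10 = dx if check10==0, description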
      Result

      Code:
           +----------------------------------------------------------------------------------------------------------------------------------------------------------------+
           |      dx                                check9                      check10                   desc9                                                      desc10 |
           |----------------------------------------------------------------------------------------------------------------------------------------------------------------|
        1. | S51811A                         Code too long                 Defined code                           Laceration w/o foreign body of right forearm, init encntr |
        2. | X991XXA                         Code too long                 Defined code                                                 Assault by knife, initial encounter |
        3. |   Z8249   Invalid 1st char (not 0-9, E, or V)                 Defined code                           Family hx of ischem heart dis and oth dis of the circ sys |
        4. |    Y939   Invalid 1st char (not 0-9, E, or V)                 Defined code                                                               Activity, unspecified |
        5. |    Y929   Invalid 1st char (not 0-9, E, or V)                 Defined code                                                 Unspecified place or not applicable |
           |----------------------------------------------------------------------------------------------------------------------------------------------------------------|
        6. |    Y999   Invalid 1st char (not 0-9, E, or V)                 Defined code                                                   Unspecified external cause status |
        7. |     931               Defined code or missing   Invalid 1st char (not A-Z)     foreign body in ear                                                             |
        8. |    4803               Defined code or missing   Invalid 1st char (not A-Z)   pneumonia due to sars                                                             |
           +----------------------------------------------------------------------------------------------------------------------------------------------------------------+
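
      One way to follow up on the overlap point from #2 is to record, for every code, which system(s) accept it, so that only the ambiguous and unmatched codes need manual review. This is a sketch that reuses the -check- calls in the code above on four of the example codes; ok9, ok10, icdsys and dxcat are hypothetical names, and the category (3 characters, or 4 for ICD-9 E codes) is a plain substr() truncation, not a lookup against the official code lists.

```stata
* Sketch: classify each code by which system(s) accept it.
* ok9, ok10, icdsys and dxcat are hypothetical names for illustration.
clear
input str7 dx
"S51811A"
"Y999"
"931"
"4803"
end
icd9 check dx if !missing(dx), generate(check9)
icd10cm check dx if !missing(dx), generate(check10)

* 0 from -check- means the code is defined in that system
generate byte ok9  = check9 == 0
generate byte ok10 = check10 == 0

* 9/10 = unambiguous; 99 = valid in both (needs a date or version flag);
* 0 = valid in neither (inspect by hand)
generate byte icdsys = 0
replace icdsys = 9  if ok9 & !ok10
replace icdsys = 10 if ok10 & !ok9
replace icdsys = 99 if ok9 & ok10
label define icdsys 0 "neither" 9 "ICD-9 only" 10 "ICD-10-CM only" 99 "both (ambiguous)"
label values icdsys icdsys
tabulate icdsys, missing

* Crude category by truncation: 3 characters, but ICD-9 E codes use 4
generate str4 dxcat = substr(dx, 1, 3)
replace dxcat = substr(dx, 1, 4) if icdsys == 9 & substr(dx, 1, 1) == "E"
```

      Codes that land in the "both" group are exactly the ones that still need a date or version indicator to resolve.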



      • #4
        Thank you, Leonardo Guizzetti. Unfortunately, I have already tried that very approach, and it still leaves a large number of seemingly valid codes untouched when I run the gen portion of the commands you suggested. I don't disagree with your comments; however, I'm not sure whether I can obtain any information on the coding. I will reach out to the data source, though, and cross my fingers.



        • #5
          Yes, I didn’t claim the idea was foolproof, as I had already alluded to potential overlaps.

          As for your data source, they must be able to provide this information to you, or else I’d be very skeptical of the overall quality. If you can say, where are your data from?



          • #6
            Thank you! I have reached out to the data source to see if they can provide an ICD9/10 indicator.

