
  • Need help with ICD9 code format

    I am trying to create indicator variables for heart disease, diabetes, etc. from ICD-9 data, but Stata rejects the decimals and ranges in the form I am using. Below are the codes I was given:

    295-299 301 306 308-309 311-314 316 293(8 9) (295-296)(0-9) 297(1-3 8-9) 298(1-4 8-9) 299(0-1 8-9) (301 302 306)(0-9) 307(1-8) 308(0-4 9) (309 313)(0-4 8-9) 310(1) 312(1-4 8) 314(0-2 8-9) 648(4) 293.8(1-4 9) 295(0-9)(0-5) 296(0-6)(0-6) 296.8(0-2 9) 296.9(0 9) 299(0-1 8-9)(0-1) 300.0(0-2 9) 300.1(0-6 9) 300.2(0-3 9) 300.8(1-2 9) 301.1(0-3) 301.2(0-2) (301.5 307.8)(0-1 9) 301.8(1-4 9) 302.5(0-3) (302.7 307.4)(0-9) 302.8(1-5 9) 306.5(0-3 9) 307.2(0-3) 307.5(0-4 9) 309.2(1-4 8-9) 312(0-2)(0-4) 312.3(0-5 9) 312.8(1-2 9) 313.2(1-3) 313.8(1-3 9) 314.0(0-1) 648.4(0-4) E950-E959 E950(0-9) E951(0-1 8) 95(2 3) (0-1 8-9) (955 958) (0-9) 957(0-2 9) V40(2 3) V403(1 9) V409 V628.4 V673

    How do I enter these correctly? This approach has worked for other variables whose codes have no decimals, but here I need to include the decimals.

  • #2
    First, by convention, many final datasets store ICD-9 codes without the decimals. The codes should have been formatted as 3-, 4-, or 5-digit codes, with no trailing zeroes.

    Second, read the manual for Stata's suite of ICD-9-related commands, which can be used to check or clean existing variables in your dataset, or to generate flags based on codes (or ranges of codes).

    Third, assuming you are inspecting one dx code variable called dx1, your code would look something like:

    Code:
    icd9 generate psychoses = dx1, range(295/298)
    You can specify more than one range as needed, e.g.

    Code:
    icd9 generate anxiety_dissociative_somatoform = dx1, range(3000/30029 3008*)
    Some of what you typed looks like it refers to some specific codes, but many can be simplified to ranges. For example, you wrote

    ... 300.0(0-2 9) ...
    I read that as 300.00, 300.01, 300.02, 300.09. If you go to a list of ICD-9 codes (e.g. this one), you would find that for this particular grouping (anxiety states), there's no code 300.03 (at least not in 2015; it may have existed in prior years). You can simply write the range as

    Code:
    range(3000*)
    This means anything starting with "3000" (assuming your data are stored with no decimals, which I hope is the case).
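
    If the codes do contain decimals, one possible way to prepare them is sketched below. This assumes dx1 is a string variable; dx1_nodot and anxiety_states are hypothetical names chosen for illustration. The explicit list spells out the reading of 300.0(0-2 9) given above, as an alternative to the wildcard.

    ```stata
    * dx1_nodot is a hypothetical name; dx1 is assumed to be a string variable
    generate dx1_nodot = subinstr(dx1, ".", "", .)

    * list any values that are not valid ICD-9 codes before flagging
    icd9 check dx1_nodot, list

    * explicit list: 300.00, 300.01, 300.02, 300.09
    icd9 generate anxiety_states = dx1_nodot, range(30000 30001 30002 30009)
    ```

    Running the check first should surface typos or badly formatted codes before any flags are built on them.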

    If you need to inspect multiple dx variables and generate a global indicator for presence of some set of codes, I would probably code it this way (assuming there are 15 diagnosis code variables in your file):

    Code:
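    forvalues i = 1/15 {
        * flag each diagnosis variable separately
        icd9 generate psychoses_`i' = dx`i', range(295/298)
    }
    * 1 if any of the 15 diagnosis variables falls in the range
    egen psychoses = rowmax(psychoses_*)
    drop psychoses_*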
    This may take a long time in a big dataset. Others may know how to do it faster.
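
    One possible variation, offered as a sketch rather than a tested speed-up, is to update a single indicator inside the loop so that only one helper variable exists at a time. The name tmpflag is hypothetical, and the dx1-dx15 layout is the same assumption as above.

    ```stata
    * start with 0 for everyone, then switch to 1 when any dx variable matches
    generate byte psychoses = 0
    forvalues i = 1/15 {
        * tmpflag is a hypothetical helper variable, recreated on each pass
        icd9 generate tmpflag = dx`i', range(295/298)
        replace psychoses = 1 if tmpflag == 1
        drop tmpflag
    }
    ```

    This is not necessarily faster, but it avoids holding 15 intermediate variables in memory at once.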
    Please use the code delimiters to show code and results - use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

    Please use the command -dataex- to show a representative sample of data; it is installed already if you have Stata 14.2 or 15.1, else you can install it by typing

    Code:
    ssc install dataex
