  • Creating a new string variable for a specific group of ICD-9 Codes

    I am combining the diagnosis codes dx1-dx15 into one long diagnosis variable called Tertiary, using the following code in Stata:

    Code:
    egen Tertiary = concat(dx1 dx2 dx3 dx4 dx5 dx6 dx7 dx8 dx9 dx10 dx11 dx12 dx13 dx14 dx15), p(" ")
    I want to search the new variable, Tertiary, for a specific group of ICD-9 codes all at once: 73395, 73396, 73393, 73394. I want to create a new variable flagging the individuals whose Tertiary value contains any of these codes, so that I can later analyze the age of these patients, etc.

    I have tried the code,
    Code:
    icd9 generate march = Tertiary, range(733*)
    I received an error:
    "Primary contains invalid ICD-9 codes
    r(459);"

    How can I search the Tertiary variable for the ICD-9 codes 73395, 73396, 73393, 73394 and generate a new variable that identifies only the records with these codes?

  • #2
    Intuitively, I don't think concatenating everything is the best approach. I will explain more after some thought and when I am not on my iPhone. The way I would handle this is to write a loop, using foreach or while, that searches each Dx code field for the codes of interest.

    Code:
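    gen stress_fx = 0
    * "of varlist" expands dx1-dx15 into the 15 variables; "in" would pass the text dx1-dx15 as one word
    foreach dx of varlist dx1-dx15 {
        * string codes need plain straight double quotes; ... stands for the remaining codes of interest
        replace stress_fx = 1 if inlist(`dx', "73393", "73394", ...)
    }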
    Not tested for typos, and I am on my iPhone, so caveat lector.
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      Thank you. I tried the code, but got an error:

      ”73394” invalid name
      r(198);

      Do you know how I can get around this?

      Thanks.



      • #4
        Originally posted by Amber Washington
        Thank you. I tried the code, but got an error:

        ”73394” invalid name
        r(198);

        Do you know how I can get around this?

        Thanks.
        So, I should have used the icd9 command:

        Code:
        * "of varlist" is needed so that dx1-dx15 expands to the 15 variables
        foreach dx of varlist dx1-dx15 {
            icd9 generate byte stress_fx_`dx' = `dx', range(73393/73396)
        }
        I think concatenating all the ICD-9 codes isn't optimal because it's clunky to search a long string, especially if you are searching for a range (as above) or for codes starting with a certain set of digits. For sure the icd9 command won't work on a long concatenated Dx code. I am also pretty sure it's not the most efficient choice, as the concatenated variable has to be wide enough for all 15 fields, each presumably up to 5 characters long, while few people will have all 15 Dx code fields filled out. (For those who are interested but don't know: healthcare claims files often have 10 to 15 fields for diagnosis codes. The first is the primary diagnosis, and every valid claim should have one; the other fields are there in case they are relevant for documentation or billing. Few patients will have all 15 filled out, although it's certainly possible.)
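
        For comparison, if you did keep the concatenated variable, a search could look roughly like the sketch below. It assumes Tertiary was built with egen concat() and a single space between codes, as in the first post; the variable name fx_concat is made up for illustration. Padding with spaces on both sides keeps a code from matching inside a longer string.

        ```stata
        * Sketch only: assumes Tertiary holds space-separated string codes
        * fx_concat is a hypothetical name, not from the original posts
        gen byte fx_concat = regexm(" " + Tertiary + " ", " (73393|73394|73395|73396) ")
        tabulate fx_concat
        ```

        This works for a short fixed list, but it gets awkward quickly for ranges, which is why the per-field approach is easier to maintain.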

        I know it also seems clunky to generate 15 dummy variables, but a) generating them as a byte minimizes the storage used, and b) you can just take the maximum of the dummy variables and drop them thereafter:

        Code:
        egen byte stress_fx = rowmax(stress_fx_*)
        drop stress_fx_*
        Right now, I'm not exactly sure why the code I gave you earlier returned that error, but it could be that your Dx codes are stored as numeric instead of text. I am used to them being text, so as to accommodate procedure, HCPCS, or ICD-10 codes, but I could be wrong! I know for sure that -inlist- can take string arguments. If the codes were numeric, you could just drop the double quotes. You could also have used -inrange-, and the syntax should be obvious. Or you can stick to the stock icd9 command, but I sort of suspect the machinery behind the scenes is identical or near identical.
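
        One way to avoid guessing is to check the storage type and branch on it. The following is a rough sketch, not something run against your data: it assumes the fields are named dx1-dx15 as in your post, and fx_any is a hypothetical variable name.

        ```stata
        * Show the storage type of each diagnosis field (str5, int, long, ...)
        describe dx1-dx15

        * fx_any is a hypothetical name used only for this sketch
        gen byte fx_any = 0
        foreach dx of varlist dx1-dx15 {
            * confirm returns a nonzero code if the variable is not a string
            capture confirm string variable `dx'
            if _rc == 0 {
                replace fx_any = 1 if inlist(`dx', "73393", "73394", "73395", "73396")
            }
            else {
                replace fx_any = 1 if inlist(`dx', 73393, 73394, 73395, 73396)
            }
        }
        ```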

        If this still doesn't work, can you use the -dataex- command to post a sample of your data? It's already installed if you have Stata 15.1. Otherwise, you have to manually install it. If you have Stata 15.1, skip the first line below.

        Code:
        ssc install dataex
        dataex dx*, count(20)
        Last edited by Weiwen Ng; 02 Apr 2018, 09:15.


        • #5

          Thank you for your help. I am a Stata novice. I understand now that concatenating all the ICD-9 codes isn't optimal. My ultimate goal is to create a variable that flags the specific ICD-9 codes across dx1-dx15. I tried:

          Code:
          foreach dx in dx1-dx15 {
          icd9 generate byte stress_fx_`dx' = `dx', range(73393/79396)
          }
          I received the error

          Code:
          invalid syntax
          r(198);
          This is my sample data; I hope it helps troubleshoot the problem.

          Code:
          input str5(dx1 dx2 dx3 dx4 dx5 dx6 dx7 dx8 dx9    dx10 dx11 dx12 dx13 dx14 dx15) int(dxccs1    dxccs2 dxccs3 dxccs4 dxccs5    dxccs6 dxccs7    dxccs8    dxccs9    dxccs10    dxccs11    dxccs12    dxccs13    dxccs14    dxccs15)
          "78079" "4019"  "3051"  "V1552" "V5869" ""    ""      ""      ""      "" "" "" "" "" ""    252  98 663 233 257   .   .    .   . . . .    . . .
          "55010" "3051"  ""      ""      ""      ""    ""      ""      ""      "" "" "" "" "" ""    143 663   .   .   .   .   .    .   . . . .    . . .
          "6827"  "25000" ""      ""      ""      ""    ""      ""      ""      "" "" "" "" "" ""    197  49   .   .   .   .   .    .   . . . .    . . .
          "53550" "30000" "V1261" "496"   "2720"  "3051"    ""      ""      ""      "" "" "" "" "" ""    140 651 133 127  53 663   .    .   . . . .    . . .
          "78060" "49390" ""      ""      ""      ""    ""      ""      ""      "" "" "" "" "" ""    246 128   .   .   .   .   .    .   . . . .    . . .
          "2989"  "30000" ""      ""      ""      ""    ""      ""      ""      "" "" "" "" "" ""    659 651   .   .   .   .   .    .   . . . .    . . .
          "78079" "3051"  ""      ""      ""      ""    ""      ""      ""      "" "" "" "" "" ""    252 663   .   .   .   .   .    .   . . . .    . . .
          "5259"  "5289"  ""      ""      ""      ""    ""      ""      ""      "" "" "" "" "" ""    136 137   .   .   .   .   .    .   . . . .    . . .
          "4739"  "3051"  ""      ""      ""      ""    ""      ""      ""      "" "" "" "" "" ""    126 663   .   .   .   .   .    .   . . . .    . . .
          "29690" "30391" ""      ""      ""      ""    ""      ""      ""      "" "" "" "" "" ""    657 660   .   .   .   .   .    .   . . . .    . . .
          "78841" ""      ""      ""      ""      ""    ""      ""      ""      "" "" "" "" "" ""    163   .   .   .   .   .   .    .   . . . .    . . .
          "94524" "V1301" ""      ""      ""      ""    ""      ""      ""      "" "" "" "" "" ""    240 160   .   .   .   .   .    .   . . . .    . . .
          "71831" ""      ""      ""      ""      ""    ""      ""      ""      "" "" "" "" "" ""    225   .   .   .   .   .   .    .   . . . .    . . .
          "84500" ""      ""      ""      ""      ""    ""      ""      ""      "" "" "" "" "" ""    232   .   .   .   .   .   .    .   . . . .    . . .
          "83209" ""      ""      ""      ""      ""    ""      ""      ""      "" "" "" "" "" ""    225   .   .   .   .   .   .    .   . . . .    . . .
          "V714"  "78652" ""      ""      ""      ""    ""      ""      ""      "" "" "" "" "" ""    244 133   .   .   .   .   .    .   . . . .    . . .
          "5770"  "30391" "49390" "V442"  "7291"  "5859"    "2449"  "32723" ""      "" "" "" "" "" ""    152 660 128 155 211 158  48    259   . . . .    . . .
          "29690" "56211" "V6284" "V652"  "V1005" "3051"    "V4582" "V6542" "41401" "" "" "" "" "" ""    657 146 662 255  14 663 101    661 101 . . .    . . .
          "78701" "7840"  "4019"  "49390" "V1087" ""    ""      ""      ""      "" "" "" "" "" ""    250  84  98 128  36   .   .    .   . . . .    . . .
          "7851"  "2768"  "490"   "49390" ""      ""    ""      ""      ""      "" "" "" "" "" ""    106  55 127 128   .   .   .    .   . . . .    . . .



          • #6
            So, don't forget to put an "end" at the end of the data input process!
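
            To illustrate, the data block only runs once it is closed with end. A minimal sketch, cut down for illustration to the first two diagnosis fields and the first two rows of the sample above:

            ```stata
            clear
            input str5(dx1 dx2) int(dxccs1 dxccs2)
            "78079" "4019" 252 98
            "55010" "3051" 143 663
            end
            list
            ```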

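            The foreach syntax was most likely the problem: -foreach dx in dx1-dx15- passes the text dx1-dx15 to the loop as a single word instead of expanding it into the 15 variables (that requires -foreach dx of varlist dx1-dx15-). It does not look like icd9 refuses a storage type such as byte, because I successfully tested this code: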

            Code:
            forval v = 1/15 {
            icd9 generate byte stressfx`v' = dx`v', range(73393/73396)
            }
            It will stop with a "no observations" error after generating 9 new dummy variables, because dx10 and the later fields are entirely missing in your sample. I am not sure why it does that; this is not the behavior I'd expect. Maybe a better way to do it is:

            Code:
            forval v = 1/15 {
                gen byte stressfx`v' = inrange(dx`v', "73393", "73396")
            }
            * run these once, after the loop has created all 15 dummies
            egen byte stressfx = rowmax(stressfx*)
            drop stressfx1-stressfx15
            -inrange- checks whether a variable falls in a range, and yes, it works with text variables. Just to explain, if it's not obvious: the egen line creates a dummy variable holding the maximum of all the variables whose names start with stressfx (the * is a wildcard character). The drop line removes the 15 per-field dummies. The wildcard can match any number of characters, including 0, so don't use it in that line or you will drop stressfx itself!
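
            If you would rather keep icd9 generate, one possible way around the "no observations" error is to skip the fields that are entirely empty. This is an untested sketch that assumes the error comes only from all-missing fields, as it appeared to in the sample data:

            ```stata
            forval v = 1/15 {
                * count the non-empty codes in this field
                quietly count if !missing(dx`v')
                if r(N) > 0 {
                    icd9 generate byte stressfx`v' = dx`v', range(73393/73396)
                }
                else {
                    * nothing to classify, so the dummy is 0 for everyone
                    gen byte stressfx`v' = 0
                }
            }
            ```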
            Last edited by Weiwen Ng; 02 Apr 2018, 15:09.