  • ICD 10 Codes extraction

    Hello everyone,

    I am new to Stata and am currently working with Stata 15.

    I urgently need help generating a new variable that contains only the ICD 10 codes related to substance use, taken from an existing variable of ICD 10 codes.
    To be specific, I am working on a dataset that contains numerous ICD 10 codes as observations; however, I am only interested in the ICD 10 codes related to substance use (alcohol and illicit drug use).
    My data looks like this:

    diagnosis_codep
    R10.3
    P28.83
    O82
    O82
    O99.8
    N97.9
    Z38.0
    M23.22
    T39.1
    Z36.8
    O82
    O42.0
    O99.8
    P83.8
    P83.8
    B34.9
    O80
    J21.9
    O40
    O80
    Z03.79
    T63.3
    L51.9
    P07.32
    O99.8
    O99.3
    O10.0
    O80

    Please note that each column contains up to 4 million observations. The above is just to show you what the data look like.

    Note that I have a list of all the codes I am interested in.
    Hence, I would like to extract all the ICD 10 codes related to substance use in each column (F10-F19, G31.2, K29.2, etc.) into a new variable.

    I have tried this code:
    Code:
    gen new_icd10p=diagnosis_codep if strmatch(diagnosis_codep, "F1*" "G31.2*" "G62.1*" "K70*" "G72.1*" "K29.2*" "K85.2*")
    The above code came back with "(4,131,716 missing values generated)".

    I also tried this code:
    Code:
    gen new_icd10p=diagnosis_codep if strmatch(diagnosis_codep, "F1*", "G31.2*" ,"G62.1*" ,"K70*", "G72.1*")
    The above code only took "F1*" into consideration and ignored the others.

    Can someone please guide me on how to go about this?
    Last edited by Helen Oni; 26 May 2019, 19:35.

  • #2
    Perhaps you can find it helpful to use Stata's built-in icd10 command for working with ICD10 codes. See the output of help icd10 for some details, and for the full writeup, follow the link at the top of the help icd10 output which will open the PDF documentation included as part of your Stata installation.

    Perhaps someone else can advise further; I'm not a user of ICD diagnosis codes but have seen the commands icd9 and icd10 recommended here frequently.



    • #3
      I'm actually surprised you got the results you did. The -strmatch()- function accepts only two arguments, the string variable being tested for a match, and the string that you are comparing it to. By listing a bunch of alternatives the way you did, you have a syntax error, and I'm surprised Stata didn't say that. (Though I've verified on my own setup that Stata does what you have said here--this is a bug in strmatch().)

      Anyway, I think you can get what you want as follows:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str6 diagnosis_codep
      "R10.3" 
      "P28.83"
      "O82"   
      "O82"   
      "O99.8" 
      "N97.9" 
      "Z38.0" 
      "M23.22"
      "T39.1" 
      "Z36.8" 
      "O82"   
      "O42.0" 
      "O99.8" 
      "P83.8" 
      "P83.8" 
      "B34.9" 
      "O80"   
      "J21.9" 
      "O40"   
      "O80"   
      "Z03.79"
      "T63.3" 
      "L51.9" 
      "P07.32"
      "O99.8" 
      "O99.3" 
      "O10.0" 
      "O80"   
      end
      
      local to_find F1* G31.2* G62.1* K70* G72.1* K29.2* K85.2*
      gen byte matching = 0
      foreach f of local to_find {
          replace matching = 1 if strmatch(diagnosis_codep, "`f'")
      }
      In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

      Also, your data example here does not contain any observations that actually match the codes you are interested in. In the future, it would be best in a case like this to show an example that contains both some matches and some non-matches, so that the data would enable a real test of whether the code works.
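
      A minimal sketch of such a test, assuming five made-up codes (not taken from the original data), three of which should match the list and two of which should not. It also stores the matching code itself in a new string variable, which is what the original question asked for; the name new_icd10p is borrowed from post #1.

      ```stata
      * Hypothetical test data: F10.2, G31.2 and K70.3 should match;
      * O82 and Z38.0 should not
      clear
      input str6 diagnosis_codep
      "F10.2"
      "O82"
      "G31.2"
      "K70.3"
      "Z38.0"
      end

      local to_find F1* G31.2* G62.1* K70* G72.1* K29.2* K85.2*

      * Start with an empty string, then copy the code wherever a pattern matches
      gen str6 new_icd10p = ""
      foreach f of local to_find {
          replace new_icd10p = diagnosis_codep if strmatch(diagnosis_codep, "`f'")
      }

      * Non-matching observations keep the empty string
      list, clean noobs
      count if new_icd10p != ""
      ```

      With this test data the count should be 3, which gives a quick check that both the matches and the non-matches are handled as intended.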



      • #4
        In reply to William Lisowski (#2):
        Thanks, William, for trying to help. I've sought help from help icd10 but couldn't find a solution to the problem. I would have loved to use Stata's built-in icd10 command, but I'm dealing with population data and I need to compute the prevalence of substance use among this population based on the data I have.



        • #5
          Since your interest in ICD codes is limited to finding the subset of your data that you need for computing the prevalence rate, then using basic Stata commands like strmatch will be straightforward and effective.

          For someone who is comfortable using regular expressions for string matching,
          Code:
          generate byte matching = ustrregexm(diagnosis_codep, "^(F1|G31\.2|G62\.1|K70|G72\.1|K29\.2|K85\.2)")
          will do the matching in a single command.
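
          As a rough sketch of how such a 0/1 indicator feeds into a prevalence calculation: the mean of a 0/1 variable is the share of observations flagged. The five codes below are made up for illustration. The result is a share of diagnosis records, not of persons; a person-level prevalence would need an identifier variable that this thread does not show.

          ```stata
          * Hypothetical test data, not from the original dataset
          clear
          input str6 diagnosis_codep
          "F10.2"
          "O82"
          "K29.2"
          "K29.7"
          "Z38.0"
          end

          * ^ anchors the match at the start of the code; \. matches a literal dot
          generate byte matching = ustrregexm(diagnosis_codep, "^(F1|G31\.2|G62\.1|K70|G72\.1|K29\.2|K85\.2)")

          * F10.2 and K29.2 are flagged; K29.7 is not, because only K29.2 is listed
          list, clean noobs

          * The mean of a 0/1 indicator is the proportion of flagged observations
          summarize matching
          display "Share of records flagged: " %5.3f r(mean)
          ```

          Here two of the five records are flagged. Note that the F1 prefix covers the whole F10-F19 range mentioned in post #1.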

          For someone needing or wanting to use the icd10 command
          Code:
          icd10 generate matching3 = diagnosis_codep, range(F1* G31.2* G62.1* K70* G72.1* K29.2* K85.2*)
          will do the matching in a single command.



          • #6
            Hi, Helen -

            I saw your private message, but it seems I can't reply to you directly. Hope you see this.

            If you want it to be more than one string match at a time, you put a | between each clause. That basically means "or" in this context; if it matches this or this, do this.

            gen new_icd10p123=diagnosis_codep if strmatch(diagnosis_codep, "F1*") | strmatch(diagnosis_codep, "G3*") | strmatch(diagnosis_codep, "G6*")

            The only problem is that there is a limit - I think you can't do more than 8 clauses at a time. So if you have more than 8 codes, you'd probably want to switch to doing a replace after the first line.

            You do your gen new, and then...

            replace new_icd10p123=diagnosis_codep if strmatch(diagnosis_codep, "F2*") | strmatch(diagnosis_codep, "F3*") [etc.]
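
            A sketch of the same idea written out for the full list of patterns from post #1, one clause per pattern. The four codes in the test data are made up for illustration, and the /// line continuation assumes the code is run from a do-file rather than typed interactively.

            ```stata
            * Hypothetical test data: F19.1 and G62.1 should match;
            * G62.9 and O80 should not
            clear
            input str6 diagnosis_codep
            "F19.1"
            "G62.1"
            "G62.9"
            "O80"
            end

            * Each clause tests one pattern; | means "or".
            * /// continues the command on the next line (do-files only).
            gen new_icd10p = diagnosis_codep if           ///
                strmatch(diagnosis_codep, "F1*")    |     ///
                strmatch(diagnosis_codep, "G31.2*") |     ///
                strmatch(diagnosis_codep, "G62.1*") |     ///
                strmatch(diagnosis_codep, "K70*")   |     ///
                strmatch(diagnosis_codep, "G72.1*") |     ///
                strmatch(diagnosis_codep, "K29.2*") |     ///
                strmatch(diagnosis_codep, "K85.2*")

            * Observations that match no clause are left as empty strings
            list, clean noobs
            ```

            Using the exact patterns (for example "G62.1*" rather than "G6*") keeps unrelated codes such as G62.9 out of the new variable.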



            • #7
              In reply to Shannon Campbell (#6):
              Hi Shannon,

              Thanks for the syntax, it works.

