Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • To avoid double counting unique icd codes withing the same ID.

    Hi everyone

    I have a dataset which contains 25 icd-10 diagnostic codes per observation. Some people have multiple visits, thus id appears multiple times.

    I want to first count the number of unique icd codes per patient but somehow my tag double counts codes which appear more than once for the same id.
    Secondly I need to sum each icd code (condition e.g. diabetes) for the whole same. Please see below sample of data


    . dataex

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double SubjectId str7(Dx1 Dx2 Dx3 Dx4) str5 Dx5
    100 "K528" "E119" " "    " "    " "  
    101 "K228" "K20"  "J960" " "    "I251"
    101 "I251" "I10"  "E780" "E119" "N998"
    101 "J960" "J440" "E112" "I10"  "E780"
    101 "K228" "K20"  "K259" "B968" " "  
    101 "B964" "E119" "E039" " "    " "  
    101 "J440" "B956" "B964" "T814" "D649"
    101 "T857" "I251" "E119" "I10"  "E780"
    103 "G410" "F29"  "I698" "M625" " "  
    104 "C73"  "C770" "J441" " "    " "  
    105 "J154" " "    " "    " "    " "  
    105 "J188" " "    " "    "J188" " "  
    end
    ------------------ copy up to and including the previous line ------------------

    Listed 12 out of 12 observations


    I have tried a few examples from the help file but still did not get any to work. Please advice

    gen long id = _n
    keep id Dx* SubjectId
    reshape long Dx, i(id) j(_j)
    egen diabetes = max(Dx == "E119"), by(id)


    Also tried

    gen long order = _n
    by Dx1 (order), sort: gen uniquecodes = _n==1

    list SubjectId Dx1 _n uniquecodes

    sort order
    replace uniquecodes = sum (uniquecodes)

    the second option does count uniquecodes correctly and doesn't double count within the same id. But once I use a loop to apply the same method to D1-Dx5 counts are wrong.



    Your assistance will be highly appreciated.



  • #2
    This may help:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double SubjectId str7(Dx1 Dx2 Dx3 Dx4) str5 Dx5
    100 "K528" "E119" " "    " "    " "  
    101 "K228" "K20"  "J960" " "    "I251"
    101 "I251" "I10"  "E780" "E119" "N998"
    101 "J960" "J440" "E112" "I10"  "E780"
    101 "K228" "K20"  "K259" "B968" " "  
    101 "B964" "E119" "E039" " "    " "  
    101 "J440" "B956" "B964" "T814" "D649"
    101 "T857" "I251" "E119" "I10"  "E780"
    103 "G410" "F29"  "I698" "M625" " "  
    104 "C73"  "C770" "J441" " "    " "  
    105 "J154" " "    " "    " "    " "  
    105 "J188" " "    " "    "J188" " "  
    end
    
    gen long id = _n
    reshape long Dx, i(id)
    drop if trim(Dx) == ""
    duplicates drop Subject Dx, force
    list
    tab Dx Subject
    
               |                       SubjectId
            Dx |       100        101        103        104        105 |     Total
    -----------+-------------------------------------------------------+----------
          B956 |         0          1          0          0          0 |         1
          B964 |         0          1          0          0          0 |         1
          B968 |         0          1          0          0          0 |         1
           C73 |         0          0          0          1          0 |         1
          C770 |         0          0          0          1          0 |         1
          D649 |         0          1          0          0          0 |         1
          E039 |         0          1          0          0          0 |         1
          E112 |         0          1          0          0          0 |         1
          E119 |         1          1          0          0          0 |         2
          E780 |         0          1          0          0          0 |         1
           F29 |         0          0          1          0          0 |         1
          G410 |         0          0          1          0          0 |         1
           I10 |         0          1          0          0          0 |         1
          I251 |         0          1          0          0          0 |         1
          I698 |         0          0          1          0          0 |         1
          J154 |         0          0          0          0          1 |         1
          J188 |         0          0          0          0          1 |         1
          J440 |         0          1          0          0          0 |         1
          J441 |         0          0          0          1          0 |         1
          J960 |         0          1          0          0          0 |         1
           K20 |         0          1          0          0          0 |         1
          K228 |         0          1          0          0          0 |         1
          K259 |         0          1          0          0          0 |         1
          K528 |         1          0          0          0          0 |         1
          M625 |         0          0          1          0          0 |         1
          N998 |         0          1          0          0          0 |         1
          T814 |         0          1          0          0          0 |         1
          T857 |         0          1          0          0          0 |         1
    -----------+-------------------------------------------------------+----------
         Total |         2         18          4          3          2 |        29
    tabulate would be too unwieldy for a big dataset. Consider a contract to different datasets.
    Last edited by Nick Cox; 15 May 2017, 06:37.

    Comment


    • #3
      Thank you Nick, I will try these methods and post the outcome.

      Comment

      Working...
      X