To avoid double counting unique icd codes withing the same ID.

Maleshwane Pule

Join Date: May 2017

Posts: 5
#1

To avoid double counting unique icd codes withing the same ID.

15 May 2017, 05:59

Hi everyone

I have a dataset which contains 25 icd-10 diagnostic codes per observation. Some people have multiple visits, thus id appears multiple times.

I want to first count the number of unique icd codes per patient but somehow my tag double counts codes which appear more than once for the same id.
Secondly I need to sum each icd code (condition e.g. diabetes) for the whole same. Please see below sample of data

. dataex

----------------------- copy starting from the next line -----------------------

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input double SubjectId str7(Dx1 Dx2 Dx3 Dx4) str5 Dx5 100 "K528" "E119" " " " " " " 101 "K228" "K20" "J960" " " "I251" 101 "I251" "I10" "E780" "E119" "N998" 101 "J960" "J440" "E112" "I10" "E780" 101 "K228" "K20" "K259" "B968" " " 101 "B964" "E119" "E039" " " " " 101 "J440" "B956" "B964" "T814" "D649" 101 "T857" "I251" "E119" "I10" "E780" 103 "G410" "F29" "I698" "M625" " " 104 "C73" "C770" "J441" " " " " 105 "J154" " " " " " " " " 105 "J188" " " " " "J188" " " end

------------------ copy up to and including the previous line ------------------

Listed 12 out of 12 observations

I have tried a few examples from the help file but still did not get any to work. Please advice

gen long id = _n
keep id Dx* SubjectId
reshape long Dx, i(id) j(_j)
egen diabetes = max(Dx == "E119"), by(id)

Also tried

gen long order = _n
by Dx1 (order), sort: gen uniquecodes = _n==1

list SubjectId Dx1 _n uniquecodes

sort order
replace uniquecodes = sum (uniquecodes)

the second option does count uniquecodes correctly and doesn't double count within the same id. But once I use a loop to apply the same method to D1-Dx5 counts are wrong.

Your assistance will be highly appreciated.
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35696

15 May 2017, 06:34

This may help:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input double SubjectId str7(Dx1 Dx2 Dx3 Dx4) str5 Dx5
100 "K528" "E119" " "    " "    " "  
101 "K228" "K20"  "J960" " "    "I251"
101 "I251" "I10"  "E780" "E119" "N998"
101 "J960" "J440" "E112" "I10"  "E780"
101 "K228" "K20"  "K259" "B968" " "  
101 "B964" "E119" "E039" " "    " "  
101 "J440" "B956" "B964" "T814" "D649"
101 "T857" "I251" "E119" "I10"  "E780"
103 "G410" "F29"  "I698" "M625" " "  
104 "C73"  "C770" "J441" " "    " "  
105 "J154" " "    " "    " "    " "  
105 "J188" " "    " "    "J188" " "  
end

gen long id = _n
reshape long Dx, i(id)
drop if trim(Dx) == ""
duplicates drop Subject Dx, force
list
tab Dx Subject

           |                       SubjectId
        Dx |       100        101        103        104        105 |     Total
-----------+-------------------------------------------------------+----------
      B956 |         0          1          0          0          0 |         1
      B964 |         0          1          0          0          0 |         1
      B968 |         0          1          0          0          0 |         1
       C73 |         0          0          0          1          0 |         1
      C770 |         0          0          0          1          0 |         1
      D649 |         0          1          0          0          0 |         1
      E039 |         0          1          0          0          0 |         1
      E112 |         0          1          0          0          0 |         1
      E119 |         1          1          0          0          0 |         2
      E780 |         0          1          0          0          0 |         1
       F29 |         0          0          1          0          0 |         1
      G410 |         0          0          1          0          0 |         1
       I10 |         0          1          0          0          0 |         1
      I251 |         0          1          0          0          0 |         1
      I698 |         0          0          1          0          0 |         1
      J154 |         0          0          0          0          1 |         1
      J188 |         0          0          0          0          1 |         1
      J440 |         0          1          0          0          0 |         1
      J441 |         0          0          0          1          0 |         1
      J960 |         0          1          0          0          0 |         1
       K20 |         0          1          0          0          0 |         1
      K228 |         0          1          0          0          0 |         1
      K259 |         0          1          0          0          0 |         1
      K528 |         1          0          0          0          0 |         1
      M625 |         0          0          1          0          0 |         1
      N998 |         0          1          0          0          0 |         1
      T814 |         0          1          0          0          0 |         1
      T857 |         0          1          0          0          0 |         1
-----------+-------------------------------------------------------+----------
     Total |         2         18          4          3          2 |        29

tabulate would be too unwieldy for a big dataset. Consider a contract to different datasets.

Last edited by Nick Cox; 15 May 2017, 06:37.

Comment

Maleshwane Pule

Join Date: May 2017

Posts: 5
#3

15 May 2017, 07:58

Thank you Nick, I will try these methods and post the outcome.
Comment

Announcement

To avoid double counting unique icd codes withing the same ID.

Comment

Comment