Creating a new variable by count of specific disease IC9 code across multiple diagnosis

Femi Olowu

Join Date: Aug 2021

Posts: 8
#1

Creating a new variable by count of specific disease IC9 code across multiple diagnosis

24 Aug 2021, 05:20

Hello,

I am working on a dataset with almost 450,000 cases with several columns of diagnostic codes(DX), labeled DX1-DX30. Each row of Dx per case represents clinical diagnostic codes per patient. A case may have up to 30 obs/visits with different or similar diagnostic codes written in ICD-9 format (International Classification of Disease codes); these codes are string variables in the form "415.11". A patient could have 30 rows of different codes representing different or the same diagnoses.

I want to be able to count the number of specific diagnoses (for instance number of code 415.11 or 415.13) per patient for all the Dx and create a new variable containing these numbers.

I am not using the right commands as the results are not similar to what I am getting on SPSS (I am very familiar with the process, using SPSS). Please I need guidance in the right direction.

Thanks.

Olowu
Tags: None
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#2

24 Aug 2021, 06:11

Please present a data example using dataex for specific code suggestions (see FAQ Advice #12 for details). It may be easier to have the data in long layout with one DX variable and a categorical variable with the description. Then

Code:

help contract

may help.

Last edited by Andrew Musau; 24 Aug 2021, 06:14.
Comment
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#3

24 Aug 2021, 06:55

from your description in #1 it appears that some of the Dx codes can be duplicated; it is, however, not clear whether you want a count of the number of codes or the number of distinct codes; if the former, that is easily done using -egen- with the "rownonmissing" function (since these are string variables, be sure to use the "strok" option; see

Code:

help egen

if, however, you want the number of distinct (i.e., non-duplicated codes), then I recommend the following steps: (1) make a dataset that only includes these 30 variables and some id; (2) reshape from wide to long; (3) use the -duplicates- command to drop duplicates within an id; (4) drop observations with missing values on your Dx variable; (5) use -egen- with the "count" function to add a new variable with the count; (6) you will only need to keep one observation per id at this point (as you have you major data set, right?) so again use -duplicates- after dropping the Dx variable; (7) -merge- this data set back into your main data

note that Andrew's suggestion of contract may be faster (or may not be)
Comment

Femi Olowu

Join Date: Aug 2021
Posts: 8

24 Aug 2021, 08:00

Hi Andrew,

Apologies. I am new to this forum. A dataex sample of the data set is pasted below. I basically want to count the number of 41511 or 41513 or 41519 per row, and record the result in a new variable. Kindly assist with any command to achieve this

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str5(DX1 DX2 DX3 DX4 DX5)
"45342" "41513" "51882" "4660"  "45340"
"41519" "0389"  "99591" "5750"  "1830" 
"41519" "49390" "29680" "33829" "7245" 
"41519" "51881" "45341" "45342" "4536" 
"41519" "51882" "340"   "33829" "79311"
"41519" "51881" "45342" "5990"  "2851" 
"4162"  "51884" "486"   "262"   "V667" 
"28529" "486"   "41519" "70703" "70724"
"2309"  "41519" "5601"  "5180"  "99749"
"59381" "41519" "59080" "42820" "4254" 
end

Regards
Banjo

Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10213

24 Aug 2021, 08:32

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str5(DX1 DX2 DX3 DX4 DX5)
"45342" "41513" "51882" "4660"  "45340"
"41519" "0389"  "99591" "5750"  "1830"
"41519" "49390" "29680" "33829" "7245"
"41519" "51881" "45341" "45342" "4536"
"41519" "51882" "340"   "33829" "79311"
"41519" "51881" "45342" "5990"  "2851"
"4162"  "51884" "486"   "262"   "V667"
"28529" "486"   "41519" "70703" "70724"
"2309"  "41519" "5601"  "5180"  "99749"
"59381" "41519" "59080" "42820" "4254"
end

preserve
gen obs_no=_n
reshape long DX, i(obs_no) j(which)
keep if inlist(DX, "41511", "41513", "41519")
contract DX which
l, sep(0)
restore

Res.:

Code:

. l, sep(0)

     +-----------------------+
     | which      DX   _freq |
     |-----------------------|
  1. |     2   41513       1 |
  2. |     1   41519       5 |
  3. |     2   41519       2 |
  4. |     3   41519       1 |
     +-----------------------+

Here, "which" tells you which DX, e.g., which=2 is DX2, which=3 is DX3, and so on. _freq gives you the counts. If you want the counts of all 3 codes, then your contract command will be simply

Code:

contract which

and record the result in a new variable.

I do not see how this is possible within your wide dataset. You will have a variable for each DX variable. If you separate the codes, the number of variables will increase to (no. of DX variables \(\times\) no. of codes) assuming each code is present in each DX variable.

Comment

Femi Olowu

Join Date: Aug 2021

Posts: 8
#6

24 Aug 2021, 09:34

Hi Andrew, is it possible to sum the obs_no per row and gen in a new variable?
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10213

24 Aug 2021, 10:08

Like this?

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str5(DX1 DX2 DX3 DX4 DX5)
"45342" "41513" "51882" "4660"  "45340"
"41519" "0389"  "99591" "5750"  "1830"
"41519" "49390" "29680" "33829" "7245"
"41519" "51881" "45341" "45342" "4536"
"41519" "51882" "340"   "33829" "79311"
"41519" "51881" "45342" "5990"  "2851"
"4162"  "51884" "486"   "262"   "V667"
"28529" "486"   "41519" "70703" "70724"
"2309"  "41519" "5601"  "5180"  "99749"
"59381" "41519" "59080" "42820" "4254"
end

gen wanted= inlist(DX1, "41511", "41513", "41519")
forval i=2/5{
    replace wanted= wanted+ inlist(DX`i', "41511", "41513", "41519")
}

Res.:

Code:

. l, sep(0)

     +------------------------------------------------+
     |   DX1     DX2     DX3     DX4     DX5   wanted |
     |------------------------------------------------|
  1. | 45342   41513   51882    4660   45340        1 |
  2. | 41519    0389   99591    5750    1830        1 |
  3. | 41519   49390   29680   33829    7245        1 |
  4. | 41519   51881   45341   45342    4536        1 |
  5. | 41519   51882     340   33829   79311        1 |
  6. | 41519   51881   45342    5990    2851        1 |
  7. |  4162   51884     486     262    V667        0 |
  8. | 28529     486   41519   70703   70724        1 |
  9. |  2309   41519    5601    5180   99749        1 |
 10. | 59381   41519   59080   42820    4254        1 |
     +------------------------------------------------+

.

Comment

Femi Olowu

Join Date: Aug 2021

Posts: 8
#8

24 Aug 2021, 11:21

Perfect Andrew. Thanks alot. I guess I didn't express my thoughts as clearly as possible
Comment

Announcement