Identifying most common combinations of values within another variable

Rob Shaw

Join Date: May 2015

Posts: 32
#1


10 Jul 2023, 02:13

I have a slightly unusual problem, and I can't work out how to do it in Stata, or whether it is possible at all.

I have a dataset with two variables. The first, "person", contains an integer that identifies a specific individual. The second, "drug", contains a code for each medicine that the individual has been prescribed. For example:

Person  Drug
1       Aspirin
1       Statin
1       betablocker
2       Statin
2       betablocker
3       antidepressant

I want to identify which drugs tend to be prescribed together (i.e. grouped within 'person'). In this example, two of the people have both Statin and betablocker.

My test dataset has 110,000 observations with 13,700 persons and 4,500 different drugs, but the final dataset is much larger.

Any thoughts much appreciated.

Andrew Musau

Join Date: Oct 2014
Posts: 10213

10 Jul 2023, 05:01

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte person str14 drug
1 "Aspirin"       
1 "Statin"        
1 "betablocker"   
2 "Statin"        
2 "betablocker"   
3 "antidepressant"
end

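* Save a copy of the data in which drug is renamed drug2
preserve
rename drug drug2
tempfile 2
save `2'
restore, preserve
* Form every pairwise combination of drugs within each person
joinby person using `2'
* Keep each unordered pair once and drop self-pairs
bys person: drop if drug>=drug2
* One row per person and pair, so each person counts at most once per pair
contract person drug drug2, freq(freq)
* Count the number of persons having each pair
contract drug drug2
* Most common pairs first
gsort -_freq
list, sep(0)
*restore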

Result:

Code:

. list, sep(0)

     +-------------------------------+
     |    drug         drug2   _freq |
     |-------------------------------|
  1. |  Statin   betablocker       2 |
  2. | Aspirin        Statin       1 |
  3. | Aspirin   betablocker       1 |
     +-------------------------------+
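
Since the final dataset is described as much larger, the same pairing idea could be sketched with repeat prescriptions removed before the join and with each pair's share of all persons added. This is only a sketch under assumptions: the extra duplicate row for person 1 is added purely for illustration, and the names pairs, npersons, npair and pct_persons are hypothetical, not from the thread.

```stata
* Sketch only: pair counts with duplicates removed first
clear
input byte person str14 drug
1 "Aspirin"
1 "Statin"
1 "betablocker"
1 "Statin"
2 "Statin"
2 "betablocker"
3 "antidepressant"
end

* Keep one row per person-drug so repeat prescriptions do not inflate the join
duplicates drop person drug, force

* Count distinct persons (including those with a single drug) for the denominator
egen byte tag = tag(person)
count if tag
local npersons = r(N)
drop tag

* Save a copy with drug renamed, then restore the original name in memory
tempfile pairs
rename drug drug2
save `pairs'
rename drug2 drug

* All pairwise combinations within person, keeping each unordered pair once
joinby person using `pairs'
keep if drug < drug2

* Number of persons per pair, and that number as a percentage of all persons
contract drug drug2, freq(npair)
generate pct_persons = 100 * npair / `npersons'
gsort -npair drug drug2
list, sep(0)
```

Because duplicates are dropped before -joinby-, one -contract- is enough here. The join still creates a row for every ordered pair of drugs within a person, so persons with very many drugs dominate memory use.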


Rob Shaw

Join Date: May 2015

Posts: 32
#3

10 Jul 2023, 07:22

Many thanks - that takes me a huge leap forward. I'm thinking of using NodeXL to cluster these pairwise combinations, but is that something Stata can also do?
Andrew Musau

Join Date: Oct 2014

Posts: 10213
#4

10 Jul 2023, 12:16

I am not knowledgeable in network analysis, but you can have a look at

Code:

search nwcommands

and see whether this suite of commands does what you want. Otherwise, start a new thread with an informative title and explain what you want to achieve. Those who are knowledgeable in this area may be able to help.