Help with calculating rates with duplicate entries

Elle Rhaye

Join Date: Jul 2018
Posts: 2

02 Jul 2018, 19:26

Hi everyone,

I'm a Stata neophyte. I've looked through the manual and browsed online forums, to no avail so far, so I'm trying my luck here.

I have a dataset of drug prescriptions by practitioner and by health area that I need to analyse.

Specifically, I need to calculate the prescription rates (number of prescriptions per 100 people in each practitioner area, and in each health area) for drug A.

Here are the variables:
practiceid (a unique identifier, string, which identifies each practitioner)
healtharea (a string naming a geographic area; multiple practitioners can belong to one area, but each practitioner belongs to only one area)
areapopsize (the number of patients in each practitioner's roster)
rxnumber (the number of prescriptions for a particular drug written by that practitioner)
drugname (this is pretty self-evident)
arx (this is essentially a boolean, =1 if the drugname=A, =0 if not)

So the prescription rates I need to figure out are: rate of prescription for drug A per areapopsize, and for drug A per healtharea

The nuance is that there can be multiple (non-duplicate) entries recording a practitioner prescribing drug A (for example, entries 1 and 3 below).
Also, there are multiple practitioners per health area (like 10106 and 10384 for Essex):

practiceid	healtharea	areapopsize	drugname	rxnumber	arx
10106	Essex	6132	A	12	1
10106	Essex	6132	C	13	0
10106	Essex	6132	A	9	1
10384	Essex	3589	A	15	1
10384	Essex	3589	B	20	0
10563	Kent	1204	A	15	1
10909	Lambton	948	C	3	0

I'm thinking I need to first tally up the rxnumber where arx=1 for each unique practiceid, then divide this by the areapopsize to figure out the rate per areapopsize
Then I need to combine the rxnumber where arx=1 for each healtharea, and divide this by the total popsize of the healtharea (by tallying up the constituent areapopsizes)

But I honestly don't know what Stata code to use to do this.

Any help is appreciated.

Thanks in advance!

-Elle

Clyde Schechter

Join Date: Apr 2014

Posts: 30068
#2

02 Jul 2018, 20:15

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input int practiceid str7 healtharea int areapopsize str1 drugname byte(rxnumber arx)
10106 "Essex"   6132 "A" 12 1
10106 "Essex"   6132 "C" 13 0
10106 "Essex"   6132 "A"  9 1
10384 "Essex"   3589 "A" 15 1
10384 "Essex"   3589 "B" 20 0
10563 "Kent"    1204 "A" 15 1
10909 "Lambton"  948 "C"  3 0
end

keep if arx == 1
collapse (sum) rxnumber (first) areapopsize, by(practiceid healtharea)
gen rx_rate = rxnumber/areapopsize

Note: The above takes you literally in that you describe wanting to do this just for drug A. If you want a rate for each of the drugs, the code is easily modified to do that by omitting the -keep if arx == 1- command and then adding drugname to the list of variables in the -by()- option of the -collapse- command.

Read -help collapse-; you will find it a very useful command in many contexts.
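
The code above stops at the practitioner level and gives a rate per person. For the health-area rate and the "per 100 people" scaling asked about in #1, one possible extension is sketched below. It is not part of the original reply. The variable names a_rx, prac_rate_per100 and area_rate_per100 are made up for illustration. It also departs from -keep if arx == 1-: prescriptions for other drugs are zeroed out instead of dropped, on the assumption that a practitioner with no drug A entries (such as 10909) should still count toward the area's population.

```stata
* Sketch only; a_rx, prac_rate_per100 and area_rate_per100 are illustrative names.
clear
input int practiceid str7 healtharea int areapopsize str1 drugname byte(rxnumber arx)
10106 "Essex"   6132 "A" 12 1
10106 "Essex"   6132 "C" 13 0
10106 "Essex"   6132 "A"  9 1
10384 "Essex"   3589 "A" 15 1
10384 "Essex"   3589 "B" 20 0
10563 "Kent"    1204 "A" 15 1
10909 "Lambton"  948 "C"  3 0
end

* Count drug A prescriptions only, but keep every row so that practitioners
* with no drug A entries still contribute their roster to the denominator
* (an assumption about the intended denominator).
gen a_rx = rxnumber * arx

* One row per practitioner: total drug A prescriptions and roster size.
collapse (sum) a_rx (first) areapopsize, by(practiceid healtharea)
gen prac_rate_per100 = 100 * a_rx / areapopsize
list

* One row per health area: sum prescriptions and rosters over practitioners.
collapse (sum) a_rx areapopsize, by(healtharea)
gen area_rate_per100 = 100 * a_rx / areapopsize
list
```

The second -collapse- replaces the practitioner-level data, so save or -preserve- first if both sets of rates are needed.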

In the future, when showing data examples, please use the -dataex- command, as I have done here. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.

Last edited by Clyde Schechter; 02 Jul 2018, 20:18.
Dung Le

Join Date: May 2018

Posts: 120
#3

02 Jul 2018, 23:44

Dear Clyde Schechter,

Is there a way to keep other variables in the dataset after using the collapse command? For example, how can drugname be kept in the example above? I did not find an answer to my question in -help collapse-. I also tried -help preserve-, but that did not seem to work either.

Thanks

DL
Clyde Schechter

Join Date: Apr 2014

Posts: 30068
#4

03 Jul 2018, 08:32

To keep drugname in the collapsed data set, add it to the list of variables in the -by()- option, as mentioned in #2.
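
As a rough sketch of what that modification could look like on the example data from #2 (per-drug rates per person on the roster, with the -keep if arx == 1- line omitted as described there):

```stata
* Sketch of the modification described in #2, applied to the same example data.
clear
input int practiceid str7 healtharea int areapopsize str1 drugname byte(rxnumber arx)
10106 "Essex"   6132 "A" 12 1
10106 "Essex"   6132 "C" 13 0
10106 "Essex"   6132 "A"  9 1
10384 "Essex"   3589 "A" 15 1
10384 "Essex"   3589 "B" 20 0
10563 "Kent"    1204 "A" 15 1
10909 "Lambton"  948 "C"  3 0
end

* No -keep if arx == 1-: every drug is retained, and drugname joins the by() list,
* so the result has one row per practitioner and drug.
collapse (sum) rxnumber (first) areapopsize, by(practiceid healtharea drugname)
gen rx_rate = rxnumber/areapopsize
list, sepby(practiceid)
```

Any variable named in -by()- survives the collapse; variables that are neither in -by()- nor given a statistic are dropped.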
Elle Rhaye

Join Date: Jul 2018

Posts: 2
#5

03 Jul 2018, 16:50

Thanks, Dr. Schechter. The collapse command is indeed quite useful. And thank you for letting me know about the -dataex- etiquette for posting questions!