counting number of observations with conditions

Anna Robert

Join Date: Jun 2023

Posts: 4
#1

counting number of observations with conditions

01 Jul 2023, 07:19

Hi,

I have data that list presecriptions that have been filled from variouus pharmacies. It has the following variables:
- Patient id
- Medication name
- Pharmacy id
- Presecription Date

I want to:
1. counting number of patients who got the same "medication name" from more than one pharmacy.
2. if possible, how to limit number 1 to those who got the same medication (from more than one pharmacy) within one year i.e. the time difference between the 2 prescriptions is 1 year or less.

Is this possible?

Thanks!!
Tags: None
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17734
#2

01 Jul 2023, 08:06

Anna:
welcome to this forum.
1) see, as a first step to take, the -group- function available from -egen-;
2) please share an example/excerpt of your dataset via -dataex- (as per FAQ). Thanks.

Kind regards,
Carlo
(Stata 19.0)
1 like
Comment

Anna Robert

Join Date: Jun 2023
Posts: 4

01 Jul 2023, 11:14

Thanks Carlo.

I will read about point 1 (the -group- function).

Here is an example of the dataset:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int patient_id str11 medication_name str3 pharmacy_id str10 date
123 "brufen"      "ABC" "2021-03-13"
456 "brufen"      "XYZ" "2021-04-22"
789 "paracetamol" "XYZ" "2020-01-06"
123 "paracetamol" "ABC" "2019-03-15"
456 "brufen"      "ABC" "2022-01-11"
end

For example, Patient 456 get the same medication from Pharmacy XYZ and Pharmacy ABC.

Thanks.

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30165

01 Jul 2023, 11:44

Most of the work here is preparing the data. First, it is important to verify that you do not have any completely duplicate observations in the data set: the code below checks this first. Next, you need a real Stata date variable--you can't do calculations like within 1 year using a string variable. (Well, you can, but you wouldn't want to put yourself through the pain that involves.) Finally, you need a variable that uniquely identifies prescriptions. After that, -rangestat-, written by Robert Picard, Nick Cox, and Roberto Ferrer, available from SSC, does the work.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int patient_id str11 medication_name str3 pharmacy_id str10 date
123 "brufen"      "ABC" "2021-03-13"
456 "brufen"      "XYZ" "2021-04-22"
789 "paracetamol" "XYZ" "2020-01-06"
123 "paracetamol" "ABC" "2019-03-15"
456 "brufen"      "ABC" "2022-01-11"
end

//    VERIFY NO DUPLICATE OBSERVATIONS IN DATA
duplicates report
assert r(unique_value) == r(N)

//    CREATE A REAL STATA DATE VARIABLE
gen _date = daily(date, "YMD"), after(date)
assert missing(date) == missing(_date)
format _date %td
drop date
rename _date date

//    CREATE A VARIABLE THAT IDENTIFIES OBSERVATIONS
gen `c(obs_t)' rx_num = _n

//    COUNT UP REPEAT PRESCRIPTION OF SAME DRUG BY SAME PATIENT WITHIN 1 YEAR
rangestat (count) repeats = rx_num, by(patient_id medication_name) ///
    interval(date 0 365)
//    COUNT UP REPEATE PRESCRIPTIONS OF SAME DRUG BY SAME PATIENT AT SAME PHARMACY
rangestat (count) same_pharmacy_repeats = rx_num, ///
    by(patient_id medication_name pharmacy_id) interval(date 0 365)
    
gen int wanted = repeats - same_pharmacy_repeats

Comment

Anna Robert

Join Date: Jun 2023

Posts: 4
#5

02 Jul 2023, 09:58

Thanks Clyde.

This code was helpful. The only thing is that I noticed it will count non-unique values i.e.:
if a patient got a medication from one pharmacy and then got the same medication from another pharmacy 4 times, the wanted variable wil have a vaklue of 4. The nuumber I want is 2 (number of unique pharmacies).

Here, I recreate the example with more patients to explain (and the date variable in correct form):

Code:

* Example generated by -dataex-. For more info, type help dataex clear input int patient_id str11 medication_name str3 pharmacy_id float date 123 "brufen" "ABC" 22352 456 "brufen" "XYZ" 22392 789 "paracetamol" "XYZ" 21920 123 "paracetamol" "ABC" 21623 456 "brufen" "ABC" 22656 789 "paracetamol" "ABC" 21951 789 "paracetamol" "ABC" 22011 789 "paracetamol" "ABC" 22015 789 "paracetamol" "ABC" 22025 end format %td date
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30165
#6

02 Jul 2023, 10:30

The only thing is that I noticed it will count non-unique values i.e.:
if a patient got a medication from one pharmacy and then got the same medication from another pharmacy 4 times, the wanted variable wil have a vaklue of 4. The nuumber I want is 2 (number of unique pharmacies).

That doesn't seem at all clear from what you asked for in #1, even after re-reading it. The code I gave in #2 counts the number of refills at other pharmacies within one year.

In any case, I also don't see how you get 2 as the number of different pharmacies. The repeat prescriptions all occur at the same pharmacy, "ABC". So why isn't the answer 1 other pharmacy? If you mean the number of pharmacies including the original one, then the answer should never be 0: at a minimum there is always the 1 original pharmacy.

On the assumption that you do mean 1 other pharmacy for this situation:

Code:

* Example generated by -dataex-. For more info, type help dataex clear input int patient_id str11 medication_name str3 pharmacy_id float date 123 "brufen" "ABC" 22352 456 "brufen" "XYZ" 22392 789 "paracetamol" "XYZ" 21920 123 "paracetamol" "ABC" 21623 456 "brufen" "ABC" 22656 789 "paracetamol" "ABC" 21951 789 "paracetamol" "ABC" 22011 789 "paracetamol" "ABC" 22015 789 "paracetamol" "ABC" 22025 end format %td date // VERIFY NO DUPLICATE OBSERVATIONS IN DATA duplicates report assert r(unique_value) == r(N) gen `c(obs_t)' rx_num = _n // LINK EACH PRESCRIPTION TO ALL REPEATS WITHIN 1 YEAR tempfile copy save `copy' rangejoin date 0 365 using `copy', by(patient_id medication_name) // COUNT NUMBER OF DISTINCT PHARMACIES by rx_num (pharmacy_id_U), sort: egen wanted = sum(pharmacy_id_U != pharmacy_id_U[_n-1]) by rx_num (pharmacy_id_U): replace wanted = wanted[_N] - 1 // EXCLUDE SAME PHARMACY REFILLS drop *_U by rx_num: keep if _n == 1

-rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, but I assume you have already done that.

Last edited by Clyde Schechter; 02 Jul 2023, 10:34.
2 likes
Comment
Anna Robert

Join Date: Jun 2023

Posts: 4
#7

02 Jul 2023, 12:30

Thank you very much for the detailed guidance.
It works very well.

Yes, this is what I meant.
Sorry for all the confusions I caused!!

Best regards,
Comment

Announcement