Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • counting number of observations with conditions

    Hi,

    I have data that list presecriptions that have been filled from variouus pharmacies. It has the following variables:
    - Patient id
    - Medication name
    - Pharmacy id
    - Presecription Date

    I want to:
    1. counting number of patients who got the same "medication name" from more than one pharmacy.
    2. if possible, how to limit number 1 to those who got the same medication (from more than one pharmacy) within one year i.e. the time difference between the 2 prescriptions is 1 year or less.

    Is this possible?

    Thanks!!

  • #2
    Anna:
    welcome to this forum.
    1) see, as a first step to take, the -group- function available from -egen-;
    2) please share an example/excerpt of your dataset via -dataex- (as per FAQ). Thanks.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Thanks Carlo.

      I will read about point 1 (the -group- function).

      Here is an example of the dataset:


      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int patient_id str11 medication_name str3 pharmacy_id str10 date
      123 "brufen"      "ABC" "2021-03-13"
      456 "brufen"      "XYZ" "2021-04-22"
      789 "paracetamol" "XYZ" "2020-01-06"
      123 "paracetamol" "ABC" "2019-03-15"
      456 "brufen"      "ABC" "2022-01-11"
      end
      For example, Patient 456 get the same medication from Pharmacy XYZ and Pharmacy ABC.

      Thanks.

      Comment


      • #4
        Most of the work here is preparing the data. First, it is important to verify that you do not have any completely duplicate observations in the data set: the code below checks this first. Next, you need a real Stata date variable--you can't do calculations like within 1 year using a string variable. (Well, you can, but you wouldn't want to put yourself through the pain that involves.) Finally, you need a variable that uniquely identifies prescriptions. After that, -rangestat-, written by Robert Picard, Nick Cox, and Roberto Ferrer, available from SSC, does the work.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input int patient_id str11 medication_name str3 pharmacy_id str10 date
        123 "brufen"      "ABC" "2021-03-13"
        456 "brufen"      "XYZ" "2021-04-22"
        789 "paracetamol" "XYZ" "2020-01-06"
        123 "paracetamol" "ABC" "2019-03-15"
        456 "brufen"      "ABC" "2022-01-11"
        end
        
        //    VERIFY NO DUPLICATE OBSERVATIONS IN DATA
        duplicates report
        assert r(unique_value) == r(N)
        
        //    CREATE A REAL STATA DATE VARIABLE
        gen _date = daily(date, "YMD"), after(date)
        assert missing(date) == missing(_date)
        format _date %td
        drop date
        rename _date date
        
        //    CREATE A VARIABLE THAT IDENTIFIES OBSERVATIONS
        gen `c(obs_t)' rx_num = _n
        
        //    COUNT UP REPEAT PRESCRIPTION OF SAME DRUG BY SAME PATIENT WITHIN 1 YEAR
        rangestat (count) repeats = rx_num, by(patient_id medication_name) ///
            interval(date 0 365)
        //    COUNT UP REPEATE PRESCRIPTIONS OF SAME DRUG BY SAME PATIENT AT SAME PHARMACY
        rangestat (count) same_pharmacy_repeats = rx_num, ///
            by(patient_id medication_name pharmacy_id) interval(date 0 365)
            
        gen int wanted = repeats - same_pharmacy_repeats

        Comment


        • #5
          Thanks Clyde.

          This code was helpful. The only thing is that I noticed it will count non-unique values i.e.:
          if a patient got a medication from one pharmacy and then got the same medication from another pharmacy 4 times, the wanted variable wil have a vaklue of 4. The nuumber I want is 2 (number of unique pharmacies).

          Click image for larger version

Name:	Screenshot.jpg
Views:	1
Size:	40.2 KB
ID:	1719118




          Here, I recreate the example with more patients to explain (and the date variable in correct form):

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input int patient_id str11 medication_name str3 pharmacy_id float date
          123 "brufen"      "ABC" 22352
          456 "brufen"      "XYZ" 22392
          789 "paracetamol" "XYZ" 21920
          123 "paracetamol" "ABC" 21623
          456 "brufen"      "ABC" 22656
          789 "paracetamol" "ABC" 21951
          789 "paracetamol" "ABC" 22011
          789 "paracetamol" "ABC" 22015
          789 "paracetamol" "ABC" 22025
          end
          format %td date

          Comment


          • #6
            The only thing is that I noticed it will count non-unique values i.e.:
            if a patient got a medication from one pharmacy and then got the same medication from another pharmacy 4 times, the wanted variable wil have a vaklue of 4. The nuumber I want is 2 (number of unique pharmacies).
            That doesn't seem at all clear from what you asked for in #1, even after re-reading it. The code I gave in #2 counts the number of refills at other pharmacies within one year.

            In any case, I also don't see how you get 2 as the number of different pharmacies. The repeat prescriptions all occur at the same pharmacy, "ABC". So why isn't the answer 1 other pharmacy? If you mean the number of pharmacies including the original one, then the answer should never be 0: at a minimum there is always the 1 original pharmacy.

            On the assumption that you do mean 1 other pharmacy for this situation:
            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input int patient_id str11 medication_name str3 pharmacy_id float date
            123 "brufen"      "ABC" 22352
            456 "brufen"      "XYZ" 22392
            789 "paracetamol" "XYZ" 21920
            123 "paracetamol" "ABC" 21623
            456 "brufen"      "ABC" 22656
            789 "paracetamol" "ABC" 21951
            789 "paracetamol" "ABC" 22011
            789 "paracetamol" "ABC" 22015
            789 "paracetamol" "ABC" 22025
            end
            format %td date
            //    VERIFY NO DUPLICATE OBSERVATIONS IN DATA
            duplicates report
            assert r(unique_value) == r(N)
            
            gen `c(obs_t)' rx_num = _n
            
            //    LINK EACH PRESCRIPTION TO ALL REPEATS WITHIN 1 YEAR
            tempfile copy
            save `copy'
            rangejoin date 0 365 using `copy', by(patient_id medication_name)
            
            //    COUNT NUMBER OF DISTINCT PHARMACIES
            by rx_num (pharmacy_id_U), sort: egen wanted = sum(pharmacy_id_U != pharmacy_id_U[_n-1])
            by rx_num (pharmacy_id_U): replace wanted = wanted[_N] - 1 // EXCLUDE SAME PHARMACY REFILLS
            drop *_U
            by rx_num: keep if _n == 1
            -rangejoin- is written by Robert Picard and is available from SSC. To use it, you must also install -rangestat-, but I assume you have already done that.
            Last edited by Clyde Schechter; 02 Jul 2023, 10:34.

            Comment


            • #7
              Thank you very much for the detailed guidance.
              It works very well.

              Yes, this is what I meant.
              Sorry for all the confusions I caused!!

              Best regards,

              Comment

              Working...
              X