
  • Help with calculating rates with duplicate entries

    Hi everyone,

    I'm a Stata neophyte. I've looked through the manual and browsed online forums without success so far, so I'm trying my luck here.

    I have a dataset of drug prescriptions by practitioner and by health area that I need to analyse.

    Specifically, I need to calculate the prescription rates for drug A: the number of prescriptions per 100 patients on each practitioner's roster, and per 100 people in each health area.

    Here are the variables:
    practiceid (a unique identifier, string, which identifies each practitioner)
    healtharea (a string naming a geographic area; multiple practitioners can belong to an area, but each practitioner belongs to only one area)
    areapopsize (the number of patients on each practitioner's roster)
    rxnumber (the number of prescriptions for a particular drug written by that practitioner)
    drugname (the name of the drug)
    arx (an indicator: = 1 if drugname is A, = 0 otherwise)

    So the prescription rates I need are: the rate of drug A prescriptions per areapopsize (for each practitioner), and the rate of drug A prescriptions per health area.

    The nuance is that a practitioner can have multiple (non-duplicate) entries for drug A (for example, entries 1 and 3 below).
    Also, a health area can contain several practitioners (like 10106 and 10384 in Essex):
    practiceid healtharea areapopsize drugname rxnumber arx
    10106 Essex 6132 A 12 1
    10106 Essex 6132 C 13 0
    10106 Essex 6132 A 9 1
    10384 Essex 3589 A 15 1
    10384 Essex 3589 B 20 0
    10563 Kent 1204 A 15 1
    10909 Lambton 948 C 3 0

    I think I first need to total rxnumber where arx = 1 for each practiceid, then divide this by areapopsize to get the rate per practitioner's population.
    Then I need to total rxnumber where arx = 1 for each healtharea, and divide this by the health area's total population (the sum of its practitioners' areapopsize values).
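A minimal sketch of these two steps, assuming practiceid is stored as a number (as in the -dataex- example in #2), rates are expressed per 100 people, and practices with no drug A prescriptions (like 10909) should still count toward their health area's population. The variable names rxA, rxA_per100 and healthpop are hypothetical.

```stata
* Example data from the post
clear
input int practiceid str7 healtharea int areapopsize str1 drugname byte(rxnumber arx)
10106 "Essex"   6132 "A" 12 1
10106 "Essex"   6132 "C" 13 0
10106 "Essex"   6132 "A"  9 1
10384 "Essex"   3589 "A" 15 1
10384 "Essex"   3589 "B" 20 0
10563 "Kent"    1204 "A" 15 1
10909 "Lambton"  948 "C"  3 0
end

* Drug A prescriptions on each row (0 for other drugs), so practices
* without any drug A prescriptions are kept with a count of 0
gen rxA = rxnumber * arx

* Step 1: one row per practice; areapopsize is constant within a practice
collapse (sum) rxA (first) areapopsize, by(practiceid healtharea)
gen rxA_per100 = 100 * rxA / areapopsize
list, noobs

* Step 2: one row per health area; sum counts and practice populations
collapse (sum) rxA healthpop = areapopsize, by(healtharea)
gen rxA_per100 = 100 * rxA / healthpop
list, noobs
```

Multiplying by arx instead of dropping rows keeps the Lambton practice in the health-area denominator; if only practices that prescribe drug A should count, drop the zero rows before step 2.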

    But I honestly don't know what Stata code to use to do this.

    Any help is appreciated.


    Thanks in advance!

    -Elle

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int practiceid str7 healtharea int areapopsize str1 drugname byte(rxnumber arx)
    10106 "Essex"   6132 "A" 12 1
    10106 "Essex"   6132 "C" 13 0
    10106 "Essex"   6132 "A"  9 1
    10384 "Essex"   3589 "A" 15 1
    10384 "Essex"   3589 "B" 20 0
    10563 "Kent"    1204 "A" 15 1
    10909 "Lambton"  948 "C"  3 0
    end
    
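    * Keep only the drug A rows
    keep if arx == 1
    * Total drug A prescriptions per practice; areapopsize is constant within a practice
    collapse (sum) rxnumber (first) areapopsize, by(practiceid healtharea)
    * Prescriptions per patient (multiply by 100 for a rate per 100 patients)
    gen rx_rate = rxnumber/areapopsize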
    Note: The above takes you literally when you say you want to do this just for drug A. If you want a rate for each of the drugs, the code is easily modified: omit the -keep if arx == 1- command and add drugname to the list of variables in the -by()- option of the -collapse- command.
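A sketch of the per-drug modification described in the note, using the same example data: the -keep- line is dropped and drugname joins the -by()- list, so each practice gets one row per drug it prescribed.

```stata
clear
input int practiceid str7 healtharea int areapopsize str1 drugname byte(rxnumber arx)
10106 "Essex"   6132 "A" 12 1
10106 "Essex"   6132 "C" 13 0
10106 "Essex"   6132 "A"  9 1
10384 "Essex"   3589 "A" 15 1
10384 "Essex"   3589 "B" 20 0
10563 "Kent"    1204 "A" 15 1
10909 "Lambton"  948 "C"  3 0
end

* Total prescriptions for each practice-drug combination
collapse (sum) rxnumber (first) areapopsize, by(practiceid healtharea drugname)
* Prescriptions per patient for each drug (multiply by 100 for per 100 patients)
gen rx_rate = rxnumber/areapopsize
list, noobs sepby(practiceid)
```

The two drug A rows for practice 10106 are combined into one row with 21 prescriptions; every other practice-drug pair in this example already has a single row.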

    Read -help collapse-; you will find it a very useful command in many contexts.

    In the future, when showing data examples, please use the -dataex- command, as I have done here. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
    Last edited by Clyde Schechter; 02 Jul 2018, 20:18.



    • #3
      Dear Clyde Schechter,

      Is there a way to keep other variables in the dataset after using the collapse command? For example, how can I keep drugname in the example above? I did not find an answer to this in -help collapse-. I also tried -help preserve-, but that does not seem to work either.

      Thanks

      DL



      • #4
        To keep drugname in the collapsed data set, add it to the list of variables in the -by()- option, as mentioned in #2.
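A different route, not the one suggested in this thread: -egen- with the total() function adds the group totals as a new variable without collapsing, so drugname and every other variable stay in the data (at the cost of the total being repeated on each row of a practice). A minimal sketch, where rxA_practice and rxA_per100 are hypothetical names:

```stata
clear
input int practiceid str7 healtharea int areapopsize str1 drugname byte(rxnumber arx)
10106 "Essex"   6132 "A" 12 1
10106 "Essex"   6132 "C" 13 0
10106 "Essex"   6132 "A"  9 1
10384 "Essex"   3589 "A" 15 1
10384 "Essex"   3589 "B" 20 0
10563 "Kent"    1204 "A" 15 1
10909 "Lambton"  948 "C"  3 0
end

* Total drug A prescriptions per practice, written onto every row of that practice
bysort practiceid: egen rxA_practice = total(rxnumber * arx)
* Drug A prescriptions per 100 patients on the practice roster
gen rxA_per100 = 100 * rxA_practice / areapopsize
list, noobs sepby(practiceid)
```

All seven original observations remain; before analysing one rate per practice, keep a single row per practice (for example with -egen tag()-) so practices with several rows are not counted more than once.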



        • #5
          Thanks, Dr. Schechter. The collapse command is indeed quite useful. And thank you for letting me know about the -dataex- etiquette for posting questions!

