Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Determining Averages with Conditions

    I am trying to use box and whisker plots to determine averages of "payment" for variables that I defined, here is one example:
    *define my new variable
    generate pyp=(cpt=="78800"|cpt=="78803"|cpt=="78830")
    *I have a variable "payment" which is the total sum of payments for a given number of variable "patients". I tried creating a new variable "payment_per_patient" defined as payments/patients then I used
    graph box payment_per_patient if pyp_planar==1, over (year)
    *but the numbers calculated do not make any sense. I suspect that I am not calculating the number correctly because I need to define my numerator as "payment" for each cpt code ((cpt=="78800"|cpt=="78803"|cpt=="78830")) and then define my denominator as "patients" for each of those codes.
    Is this correct or is there a faster way to do this? This method would be very time consuming.
    Thanks for the help

  • #2
    I suspect that I am not calculating the number correctly because I need to define my numerator as "payment" for each cpt code ((cpt=="78800"|cpt=="78803"|cpt=="78830")) and then define my denominator as "patients" for each of those codes.
    Is this correct or is there a faster way to do this? This method would be very time consuming.
    It is correct that to calculate average payment per patient you would need the numerator to be total payments and denominator to be total number of patients. Your description of your situation, however, makes it unclear what you actually have and how the data are organized and aggregated.

    It is hard to understand why this calculation "would be very time consuming" unless you have some utterly bizarre and dysfunctional data set. If you post example data (use -dataex-, of course) and provide an explanation of the names of all variables involved, probably someone can show you a quick and easy way to do this.

    Comment


    • #3
      cpt: code for each line of data
      patients: number of people in each row
      allowed_charge_amt: total paid for all patients combined
      year: year it occured
      pyp: defined as (cpt=="78800"|cpt=="78803"|cpt=="78830")
      . dataex cpt provider patients allowed_charge_amt year pyp

      ----------------------- copy starting from the next line -----------------------
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str5 cpt str2 provider double(patients allowed_charge_amt) float(year pyp)
      "93312" "06"    .                  . 2010 0
      "75574" "06"  221    22747.880859375 2010 0
      "93460" "06"    .                  . 2010 0
      "78452" "36"   24 1977.0799560546875 2010 0
      "93458" "06"    .                  . 2010 0
      "93312" "06"   23     2476.580078125 2010 0
      "93458" "06"   13                  0 2010 0
      "78472" "30"  164   6160.14013671875 2010 0
      "93306" "08"    .                  . 2010 0
      "93314" "33"   70     4311.580078125 2010 0
      "93306" "38"   49      9185.23046875 2010 0
      "93459" "06"    .                  . 2010 0
      "78800" "30"    .                  . 2010 1
      "78452" "06"   13  917.1599731445313 2010 0
      "78459" "47"    .                  . 2010 0
      "93458" "06"    .                  . 2010 0
      "93460" "06"    .                  . 2010 0
      "93458" "11"    .                  . 2010 0
      "93458" "11"    .                  . 2010 0
      "93306" "06"   94   5737.06005859375 2010 0
      "93308" "06"    .                  . 2010 0
      "93458" "06"  185      29784.4609375 2010 0
      "75572" "36"    .                  . 2010 0
      "92978" "06"  225       19951.359375 2010 0
      "93306" "11"    .                  . 2010 0
      "93454" "06"    .                  . 2010 0
      "93304" "11"    .                  . 2010 0
      "93015" "06"    .                  . 2010 0
      "93314" "33"   25  1416.800048828125 2010 0
      "75574" "06"   39    17204.880859375 2010 0
      "93306" "78"    .                  . 2010 0
      "75571" "30"   66                  0 2010 0
      "78452" "06"    .                  . 2010 0
      "93458" "06"   16     4523.830078125 2010 0
      "93312" "06"   98      9825.26953125 2010 0
      "93018" "06"   20   299.739990234375 2010 0
      "93458" "06"    .                  . 2010 0
      "93458" "06"    .                  . 2010 0
      "93452" "06"    .                  . 2010 0
      "92978" "06"    .                  . 2010 0
      "93307" "47"   14    665.72998046875 2010 0
      "75571" "30"    .                  . 2010 0
      "78473" "06"    .                  . 2010 0
      "78803" "30"    .                  . 2010 1
      "93571" "11"    .                  . 2010 0
      "93018" "06"    .                  . 2010 0
      "75572" "30"    .                  . 2010 0
      "93306" "06"   11    733.47998046875 2010 0
      "93312" "06"    .                  . 2010 0
      "93306" "21"    .                  . 2010 0
      "93018" "78"   23  353.9700012207031 2010 0
      "93018" "36"  185  2889.570068359375 2010 0
      "78800" "30"    .                  . 2010 1
      "75561" "30"   52    6706.9599609375 2010 0
      "78472" "06"    .                  . 2010 0
      "78452" "08"    .                  . 2010 0
      "93458" "11"   13  1962.719970703125 2010 0
      "93307" "11"    .                  . 2010 0
      "78803" "36"    .                  . 2010 1
      "93351" "06"    .                  . 2010 0
      "93306" "11"   17                  0 2010 0
      "93571" "11"    .                  . 2010 0
      "93350" "06"    .                  . 2010 0
      "93571" "06"    .                  . 2010 0
      "78452" "30"   11  945.5599975585938 2010 0
      "93308" "06"  306     37019.94140625 2010 0
      "93306" "06"   28 1827.3599853515625 2010 0
      "78454" "30"   19   1209.72998046875 2010 0
      "93306" "08"   14  2230.760009765625 2010 0
      "93571" "11"    .                  . 2010 0
      "93303" "33"   11    2453.8798828125 2010 0
      "93312" "05"    .                  . 2010 0
      "93460" "06"    .                  . 2010 0
      "93308" "93"    .                  . 2010 0
      "78452" "30"   66               4743 2010 0
      "78800" "02"   12  344.0799865722656 2010 1
      "78452" "06" 2253        163848.5625 2010 0
      "75574" "30"    .                  . 2010 0
      "75571" "11"    .                  . 2010 0
      "93459" "06"    .                  . 2010 0
      "93306" "46"    .                  . 2010 0
      "78452" "06"    .                  . 2010 0
      "93461" "11"    .                  . 2010 0
      "93453" "06"    .                  . 2010 0
      "78472" "06"   11  436.7699890136719 2010 0
      "78452" "30"  306    21450.529296875 2010 0
      "93453" "06"    .                  . 2010 0
      "93306" "21"   18    1144.9599609375 2010 0
      "93308" "06"    .                  . 2010 0
      "93306" "30"    .                  . 2010 0
      "93306" "21"  303    19737.279296875 2010 0
      "93018" "93"   41  591.5999755859375 2010 0
      "93312" "11"    .                  . 2010 0
      "75571" "30"    .                  . 2010 0
      "93457" "06"    .                  . 2010 0
      "93454" "06"    .                  . 2010 0
      "93306" "06"    .                  . 2010 0
      "93458" "06"    .                  . 2010 0
      "93571" "06"    .                  . 2010 0
      "93306" "30"    .                  . 2010 0
      end
      Thanks

      Comment


      • #4
        Code:
        assert missing(patients) == missing(allowed_charge_amt)
        
        
        //  AVERAGE CHARGES PER PATIENT: ALL CPT CODES COMBINED
        summ allowed_charge_amt, meanonly
        local numerator = r(sum)
        summ patients, meanonly
        local denominator = r(sum)
        display as text "Average Per-Patient Charges = " %2.1f =`numerator'/`denominator'
        
        //  AVERAGE PAYMENTS PER PATIENT: BY PYP-DEFINED CPT GROUPS
        by pyp, sort: egen numerator = total(allowed_charge_amt)
        by pyp: egen denominator = total(patients)
        gen avg_charges_per_pt_pyp_group = numerator/denominator
        
        //  AVERAGE PAYMENTS PER PATIENT: FOR EACH CPT CODE
        by cpt, sort: egen numerator2 = total(allowed_charge_amt)
        by cpt: egen denominator2 = total(patients)
        gen avg_charges_per_pt_this_cpt = numerator2/denominator2
        format avg_charges_per_pt* %2.1f
        Last edited by Clyde Schechter; 21 Sep 2022, 12:10.

        Comment


        • #5
          Thank you!

          I'm not sure I completely understand this code. How would I utilize this to make a box-whisker plot of the data? As this already computes averages it would only have a single value for each group and "graph box avg_charges_per_pt_pyp_group, over(year)" doesn't make sense. Correct?

          Thanks
          Last edited by Yosef Cohen; 22 Sep 2022, 10:29.

          Comment


          • #6
            Oh, I didn't realize you wanted it by year. So it just changes to:
            Code:
            assert missing(patients) == missing(allowed_charge_amt)
            
            //  AVERAGE CHARGES PER PATIENT: ALL CPT CODES COMBINED
            by year, sort: egen numerator0 = total(allowed_charge_amt)
            by year: egen denominator0 = total(patients)
            gen avg_charges_per_pt_all_cpt = numerator0/denominator0
            
            //  AVERAGE PAYMENTS PER PATIENT: BY PYP-DEFINED CPT GROUPS
            by pyp year, sort: egen numerator1 = total(allowed_charge_amt)
            by pyp year: egen denominator1 = total(patients)
            gen avg_charges_per_pt_pyp_group = numerator1/denominator1
            
            //  AVERAGE PAYMENTS PER PATIENT: FOR EACH CPT CODE
            by cpt year, sort: egen numerator2 = total(allowed_charge_amt)
            by cpt year: egen denominator2 = total(patients)
            gen avg_charges_per_pt_this_cpt = numerator2/denominator2
            format avg_charges_per_pt* %2.1f
            This will enable you to do plots over year. (In your real data, that is. The example data is all from a single year, so a box plot over years is just a single box.) To do this for the pyp groups, you will need to restrict the graph to one observation per pyp group:
            Code:
            egen tag = tag(pyp year)
            graph box avg_charges_per_pt_pyp_group if tag, over(year)
            If you choose to do graphs at other levels of aggregation, you will similarly need to tag a single observation per aggregate per year and restrict the plot to the tagged observations.

            But there is a problem for your box plots at the pyp level. There are only two pyp groups. So a box plot of average charges per patient per pyp group is not going to look much like a box plot. It will be based on only two data points in each year, so most of the components of the box plots will not be really meaningful. It might make more sense to do something like line graphs for both pyp groups plotting average per patient over time. The plots per cpt code would be appropriate for box-plotting because there are many of them, though perhaps you are not interested in those distributions for your research purposes.

            I strongly recommend you read -help by- and the section of the PDF manuals that is linked (in blue) near the top of that screen. That will enable you to better understand how this code works. The -by:- prefix is one of Stata's most useful features and is something that everyone using Stata needs to be familiar with. There is hardly anything beyond toy problems that can be sensibly worked out in Stata without it. I suspect that an insufficient knowledge of -by- is why you thought that coding this problem would be difficult and time-consuming.

            Comment


            • #7
              Thank you very much. This was very helpful. I think a line graph would make more sense to display this data by pyp defined group.

              Although I am not sure why when I tabulate this data I get this column for avg_charges_per_pt_pyp_group1
              tab avg_charges_per_pt_pyp_group1 year

              avg_charge |
              s_per_pt_p | year
              yp_group1 | 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 | Total
              -----------+-------------------------------------------------------------------------------------------------------------------------+----------
              61.25278 | 0 0 0 0 0 1,282 0 0 0 0 0 | 1,282
              64.64481 | 0 0 0 0 0 0 1,339 1,339 0 0 0 | 2,678
              65.52666 | 0 0 0 1,442 0 0 0 0 0 0 0 | 1,442
              65.70789 | 0 0 0 0 1,240 0 0 0 0 0 0 | 1,240
              67.67278 | 0 0 0 0 0 0 0 0 1,529 0 0 | 1,529
              75.5192 | 0 0 0 0 0 0 0 0 0 1,683 0 | 1,683
              86.12766 | 0 0 1,612 0 0 0 0 0 0 0 0 | 1,612
              93.90189 | 1,597 1,597 0 0 0 0 0 0 0 0 0 | 3,194
              131.6356 | 0 0 0 0 0 0 0 0 0 0 0 | 5,224
              134.6948 | 0 0 0 0 0 0 0 0 0 0 4,910 | 4,910
              -----------+-------------------------------------------------------------------------------------------------------------------------+----------
              Total | 1,597 1,597 1,612 1,442 1,240 1,282 1,339 1,339 1,529 1,683 4,910 | 24,794


              avg_charge |
              s_per_pt_p | year
              yp_group1 | 2021 | Total
              -----------+-----------+----------
              61.25278 | 0 | 1,282
              64.64481 | 0 | 2,678
              65.52666 | 0 | 1,442
              65.70789 | 0 | 1,240
              67.67278 | 0 | 1,529
              75.5192 | 0 | 1,683
              86.12766 | 0 | 1,612
              93.90189 | 0 | 3,194
              131.6356 | 5,224 | 5,224
              134.6948 | 0 | 4,910
              -----------+-----------+----------
              Total | 5,224 | 24,794


              .
              Last edited by Yosef Cohen; 22 Sep 2022, 11:43.

              Comment


              • #8
                The output you are showing is basically unreadable because you failed to put it between code delimiters.

                Suffice it to say, I'm not sure what your question is. You are counting the number of observations that contributed to each year's results. Is that what you intended? If you just want to list out the average charge in each pyp group for each year, you would do that as follows:

                Code:
                egen tag = tag(pyp year)
                sort year pyp
                list year pyp if tag, noobs clean

                Comment


                • #9
                  so when I use this code I end up with two results for each year.

                  . egen tag = tag(payment_pyp_planar year)

                  .
                  . sort year payment_pyp_planar

                  .
                  . list year payment_pyp_planar if tag, noobs clean

                  year paymen~r
                  2010 76.61958
                  2010 154.9243
                  2011 76.61958
                  2011 154.9243
                  2012 61.94535
                  2012 141.273
                  2013 57.03046
                  2013 132.2503
                  2014 51.14558
                  2014 138.2199
                  2015 50.53815
                  2015 136.0601
                  2016 50.23735
                  2016 136.1592
                  2017 50.23735
                  2017 136.1592
                  2018 78.32764
                  2018 136.4285
                  2019 83.75484
                  2019 136.8585
                  2020 38.76878
                  2020 76.43407
                  2021 39.60476
                  2021 74.20837

                  .

                  Comment


                  • #10
                    There are two different values of pyp, and these occur in each year. So, yes, you will have two values for each year, one for each level of the pyp variable. Also, don't do the tag on the payment--on the off chance that two years turn out to have the same average payment, that will mess things up.

                    Use the code exactly as shown in #8, except for possibly changing variable names if the names used in your example are not the same as the ones in your real data set.

                    Comment

                    Working...
                    X