Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to create new variable = sum by group across multiple rows (peculiar data set-up)

    The variable "denominator" was computed from other variables. "Monthly_notification_date" is essentially months (731 being Dec 2020 or so....below those many rows of 731 are 732, 733 etc for subsequent months in the data). I am trying to find the sum of "denominator" in each month and I want to call it facility_total. The first column is "facility." The problem is that every facility will have the same value for "denominator" and there are multiple rows with the same ID for "facility". I only need the "denominator" to be added once for every "facility" during each month.


    I have tried both bysort monthly_notification_date : egen facility_total = total(denominator) AND bysort monthly_notification_date : egen facility_total = sum(denominator), none of which worked. They both gave impossibly high values because the repeated rows of "denominator" is being added multiple times for each facility, instead of once. Please kindly help with a code that resolves this.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(facility monthly_notification_date denominator)
    10 731  9
    37 731 10
    32 731  7
    34 731 10
    34 731 10
     5 731 11
    35 731  9
    30 731  8
    31 731 11
    36 731  9
    17 731 11
     5 731 11
    22 731  8
     5 731 11
    29 731  9
    22 731  8
    22 731  8
     5 731 11
    11 731  8
    73 731 11
    17 731 11
    19 731 10
    35 731  9
    35 731  9
    73 731 11
    12 731  9
    22 731  8
    21 731 10
    33 731  9
    14 731  8
    17 731 11
    15 731 10
     6 731  8
    20 731 10
    21 731 10
    15 731 10
    12 731  9
     4 731  8
    33 731  9
    34 731 10
    19 731 10
     5 731 11
    19 731 10
    35 731  9
    12 731  9
    10 731  9
    11 731  8
    27 731 10
    11 731  8
     2 731 10
    27 731 10
    28 731 13
    24 731  5
    20 731 10
    22 731  8
     8 731 11
    37 731 10
    35 731  9
    17 731 11
     7 731  8
    12 731  9
     5 731 11
    22 731  8
    34 731 10
     4 731  8
    22 731  8
     8 731 11
    19 731 10
    25 731 10
    32 731  7
    21 731 10
    25 731 10
    20 731 10
    37 731 10
    15 731 10
    35 731  9
    14 731  8
     4 731  8
    74 731  6
    11 731  8
    27 731 10
    36 731  9
     8 731 11
    27 731 10
    27 731 10
     4 731  8
     5 731 11
    21 731 10
    28 731 13
    19 731 10
    12 731  9
    25 731 10
    15 731 10
    25 731 10
    37 731 10
     8 731 11
    30 731  8
    10 731  9
    30 731  8
    24 731  5
    end
    format %tm monthly_notification_date

  • #2
    If I understand correctly, you want to total up the denominators found in each month, but you only want to count each facility once, and you are certain that each facility always has the same value of denominator (or at least that is true within any given month). If that is right, then you should get what you want with:
    Code:
    //  VERIFY CRITICAL ASSUMPTION OF CONSTANCY OF DENOMINATOR WITHIN FACILITY-MONTH
    by monthly_notification_date facility (denominator), sort: assert denominator[1] == denominator[_N]
    
    //  TAG A SINGLE OBSERVATION FOR EACH FACILITY IN A MONTH
    egen tag = tag(facility monthly_notification_date)
    
    //  TOTAL UP DENOMINATORS FROM TAGGED OBSERVATIONS
    by monthly_notification_date, sort: egen wanted = total(cond(tag, denominator, .))

    Comment


    • #3
      Thanks so much for your response . I have tried your code. The numbers are a little off, but only a little. What if you knew for sure that each "facility" appear at least once or several times throughout the data (same for each month - once or several times), how would that affect your code? I ask this because each row is actually a case with multiple cases belonging to the same facility with a unique outbreak_id.

      And I have a second question: how do I create a new variable that shows number of outbreaks in each month (i.e a variable that groups "facility" by monthly notification date). It is fine if that number for each month repeats in every row. (Further note: every row is a case).

      The "denominator" is constant for each "facility" but there are multiple "facility" in the data and in each month, so it is not "constant" throughout the data or month. It is only constant if it is the same "facility"
      Last edited by Michael Olu; 13 Sep 2022, 19:19.

      Comment


      • #4
        Thanks so much for your help Clyde Schechter. I just re-checked the numbers and I saw that the code works perfectly.

        I do have a second question: how can I create a new variable that shows number of outbreaks in each month (i.e a new variable that only counts one instance of "facility" in a month (monthly_notification_date), no matter how many times that "facility" appears in the rows. It is fine if that number for each month repeats in every row for that month.
        Last edited by Michael Olu; 13 Sep 2022, 22:08.

        Comment


        • #5
          I don't see any variable in your example data whose name suggests it is related to "outbreaks." So I'm not clear what you are asking for. Counting only one instance of each facility when doing what?

          Comment


          • #6
            Please see data below. I want to know how many unique outbreak_id are in each monthly_notification_date and I want that to be a new variable.

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input long(outbreak_id case_id) int outbreak_notification_dt float(monthly_notification_date denominator far)
            101808237 101571815 22252 731  8      12.5
            101808238 101537036 22252 731 10        30
            101808238 101542546 22252 731 10        30
            101808238 100870665 22252 731 10        30
            101808251 101561789 22253 731  9  11.11111
            101880030 100908957 22253 731  8      87.5
            101880030 101596364 22253 731  8      87.5
            101880030 101590652 22253 731  8      87.5
            101880030 101561491 22253 731  8      87.5
            101880030 101583062 22253 731  8      87.5
            101880030 101565260 22253 731  8      87.5
            101880030 101579678 22253 731  8      87.5
            101880032 101594306 22254 731 11       100
            101880032 101587417 22254 731 11       100
            101880032 101583385 22254 731 11       100
            101880032 101587416 22254 731 11       100
            101880032 101583387 22254 731 11       100
            101880032 101583382 22254 731 11       100
            101880032 100929074 22254 731 11       100
            101880032 101574229 22254 731 11       100
            101880032 101583388 22254 731 11       100
            101880032 101583384 22254 731 11       100
            101880032 101604831 22254 731 11       100
            101880033 101569607 22254 731  8        25
            101880033 101579040 22254 731  8        25
            101880035 101571986 22255 731  8        25
            101880035 101390485 22255 731  8        25
            101880036 101714315 22255 731 11  72.72727
            101880036 101600792 22255 731 11  72.72727
            101880036 101692057 22255 731 11  72.72727
            101880036 101622754 22255 731 11  72.72727
            101880036 101600785 22255 731 11  72.72727
            101880036 101622387 22255 731 11  72.72727
            101880036 101602573 22255 731 11  72.72727
            101880036 101425878 22255 731 11  72.72727
            101880038 101570327 22256 731  7 14.285714
            101880041 101571679 22256 731  9       100
            101880041 101569784 22256 731  9       100
            101880041 101587745 22256 731  9       100
            101880041 101587756 22256 731  9       100
            101880041 101587744 22256 731  9       100
            101880041 101580338 22256 731  9       100
            101880041 101578880 22256 731  9       100
            101880041 101713740 22256 731  9       100
            101880041 101713261 22256 731  9       100
            101880044 101624505 22257 731  8      62.5
            101880044 101641393 22257 731  8      62.5
            101880044 101635557 22257 731  8      62.5
            101880044 101581798 22257 731  8      62.5
            101880044 101635556 22257 731  8      62.5
            101880045 101574681 22257 731  9  77.77778
            101880045 101574680 22257 731  9  77.77778
            101880045 101574679 22257 731  9  77.77778
            101880045 101564353 22257 731  9  77.77778
            101880045 101588183 22257 731  9  77.77778
            101880045 101542555 22257 731  9  77.77778
            101880045 101575916 22257 731  9  77.77778
            101880047 101617467 22258 731 12  8.333333
            101880048 101604514 22258 731  8        25
            101880048 101668896 22258 731  8        25
            101880049 101610435 22258 731 10       100
            101880049 101615852 22258 731 10       100
            101880049 101604762 22258 731 10       100
            101880049 101604764 22258 731 10       100
            101880049 101608757 22258 731 10       100
            101880049 101604819 22258 731 10       100
            101880049 101605753 22258 731 10       100
            101880049 101605516 22258 731 10       100
            101880049 101604765 22258 731 10       100
            101880049 101587988 22258 731 10       100
            101880049 101589393 22258 731 10       100
            101880052 101630731 22260 731  7  57.14286
            101880052 101603429 22260 731  7  57.14286
            101880052 101574316 22260 731  7  57.14286
            101880052 101631028 22260 731  7  57.14286
            101880053 101582686 22262 731 11  90.90909
            101880053 101610740 22262 731 11  90.90909
            101880053 101610743 22262 731 11  90.90909
            101880053 101582240 22262 731 11  90.90909
            101880053 101628024 22262 731 11  90.90909
            101880053 101607981 22262 731 11  90.90909
            101880053 101604126 22262 731 11  90.90909
            101880053 101607983 22262 731 11  90.90909
            101880053 101621634 22262 731 11  90.90909
            101880053 101607976 22262 731 11  90.90909
            101880054 101648161 22263 731  3 33.333332
            101880056 101624344 22265 731 10       100
            101880056 101668011 22265 731 10       100
            101880056 101636645 22265 731 10       100
            101880056 101636660 22265 731 10       100
            101880056 101636648 22265 731 10       100
            101880056 101640167 22265 731 10       100
            101880056 101274378 22265 731 10       100
            101880056 101621593 22265 731 10       100
            101880056 101633914 22265 731 10       100
            101880056 101636649 22265 731 10       100
            101880058 101582888 22265 731 10        60
            101880058 101790846 22265 731 10        60
            101880058 101607873 22265 731 10        60
            101880058 101748416 22265 731 10        60
            end
            format %tddd-Mon-YY outbreak_notification_dt
            format %tm monthly_notification_date

            Comment


            • #7
              Code:
              by monthly_notification_date (outbreak_id), sort: gen wanted = ///
                  sum(outbreak_id != outbreak_id[_n-1])
              by monthly_notification_date (outbreak_id): replace wanted = wanted[_N]
              
              egen tag = tag(monthly_notification_date)
              
              list monthly_notification_date wanted if tag, noobs clean

              Comment


              • #8
                Perfect. Thanks!

                Comment

                Working...
                X