Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Sum by group per id

    Hi Stata Users,

    I am using Stata 17 on Windows and have some data that I would want to find a sum by group
    Example data (with the desired variables) is below
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte id int year str1 type int(amt suma sumb sumc)
    1 2011 "A" 1200 2100  700  650
    1 2013 "A"  900 2100  700  650
    1 2014 "B"  700 2100  700  650
    1 2015 "C"  650 2100  700  650
    2 2012 "A"  850  850 1100  900
    2 2014 "B" 1100  850 1100  900
    2 2015 "B"  900  850 1100  900
    3 2012 "C"  800  700  800 1400
    3 2013 "C"  600  700  800 1400
    3 2014 "A"  700  700  800 1400
    3 2015 "B"  800  700  800 1400
    4 2011 "B" 1300  700 1900 4800
    4 2012 "C" 1450  700 1900 4800
    4 2013 "C" 1300  700 1900 4800
    4 2014 "C" 1250  700 1900 4800
    4 2014 "B"  600  700 1900 4800
    4 2015 "A"  700  700 1900 4800
    4 2015 "C"  800  700 1900 4800
    5 2012 "C"  900 2550  650 1700
    5 2012 "A"  850 2550  650 1700
    5 2013 "B"  650 2550  650 1700
    5 2014 "A"  750 2550  650 1700
    5 2015 "C"  800 2550  650 1700
    5 2015 "A"  950 2550  650 1700
    end
    The original variables are id, year and type and would like to generate a sum by type for each unique ID.
    Thanks in advance!

  • #2
    Originally posted by Stephen Okiya View Post
    The original variables are id, year and type and would like to generate a sum by type for each unique ID.
    Thanks in advance!
    this would be a rather inefficient data format because of the obvious redundancy.

    Better do something like:
    Code:
    drop suma sumb sumc
    
    collapse (sum) sum=amt, by(id type)
    reshape wide sum, i(id) j(type) string
    rename sum*, lower
    which results in
    Code:
         +-------------------------+
         | id   suma   sumb   sumc |
         |-------------------------|
      1. |  1   2100    700    650 |
      2. |  2    850   2000      . |
      3. |  3    700    800   1400 |
      4. |  4    700   1900   4800 |
      5. |  5   2550    650   1700 |
         +-------------------------+
    I am surprised nobody from the DIME colleagues could help you with this task.

    Best, Sergiy

    Comment


    • #3
      Thanks so much Sergiy Radyakin for the proposed solution. There are other variables of interest that I have not included in the example dataset and my desired goal will be efficiently achieved by the structure I am thinking of.

      Comment


      • #4
        in that case you can merge the resulting dataset (by id) back to your original dataset to get the desired.

        Best, Sergiy

        Comment


        • #5
          I found out a way to achieve my desired goal through the code below
          Code:
              levelsof type, local(levels)
              foreach l of local levels {
                  bys id type : egen sum_`l' = total(amt) if type == "`l'"
                  bys id  (sum_`l') : replace sum_`l' = sum_`l'[_n-1] if missing(sum_`l')
              }

          Comment

          Working...
          X