Sum by group per id

Stephen Okiya

Join Date: Jul 2025
Posts: 280

Sum by group per id

11 Sep 2023, 11:22

Hi Stata Users,

I am using Stata 17 on Windows and have some data that I would want to find a sum by group
Example data (with the desired variables) is below

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte id int year str1 type int(amt suma sumb sumc)
1 2011 "A" 1200 2100  700  650
1 2013 "A"  900 2100  700  650
1 2014 "B"  700 2100  700  650
1 2015 "C"  650 2100  700  650
2 2012 "A"  850  850 1100  900
2 2014 "B" 1100  850 1100  900
2 2015 "B"  900  850 1100  900
3 2012 "C"  800  700  800 1400
3 2013 "C"  600  700  800 1400
3 2014 "A"  700  700  800 1400
3 2015 "B"  800  700  800 1400
4 2011 "B" 1300  700 1900 4800
4 2012 "C" 1450  700 1900 4800
4 2013 "C" 1300  700 1900 4800
4 2014 "C" 1250  700 1900 4800
4 2014 "B"  600  700 1900 4800
4 2015 "A"  700  700 1900 4800
4 2015 "C"  800  700 1900 4800
5 2012 "C"  900 2550  650 1700
5 2012 "A"  850 2550  650 1700
5 2013 "B"  650 2550  650 1700
5 2014 "A"  750 2550  650 1700
5 2015 "C"  800 2550  650 1700
5 2015 "A"  950 2550  650 1700
end

The original variables are id, year and type and would like to generate a sum by type for each unique ID.
Thanks in advance!

Tags: None

Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#2

11 Sep 2023, 11:37

Originally posted by Stephen Okiya View Post

The original variables are id, year and type and would like to generate a sum by type for each unique ID.
Thanks in advance!

this would be a rather inefficient data format because of the obvious redundancy.

Better do something like:

Code:

drop suma sumb sumc collapse (sum) sum=amt, by(id type) reshape wide sum, i(id) j(type) string rename sum*, lower

which results in

Code:

+-------------------------+ | id suma sumb sumc | |-------------------------| 1. | 1 2100 700 650 | 2. | 2 850 2000 . | 3. | 3 700 800 1400 | 4. | 4 700 1900 4800 | 5. | 5 2550 650 1700 | +-------------------------+

I am surprised nobody from the DIME colleagues could help you with this task.

Best, Sergiy
Comment
Stephen Okiya

Join Date: Jul 2025

Posts: 280
#3

11 Sep 2023, 11:48

Thanks so much Sergiy Radyakin for the proposed solution. There are other variables of interest that I have not included in the example dataset and my desired goal will be efficiently achieved by the structure I am thinking of.
Comment
Sergiy Radyakin

Join Date: Apr 2014

Posts: 1867
#4

11 Sep 2023, 12:17

in that case you can merge the resulting dataset (by id) back to your original dataset to get the desired.

Best, Sergiy
Comment

Stephen Okiya

Join Date: Jul 2025
Posts: 280

12 Sep 2023, 06:47

I found out a way to achieve my desired goal through the code below

Code:

    levelsof type, local(levels)
    foreach l of local levels {
        bys id type : egen sum_`l' = total(amt) if type == "`l'"
        bys id  (sum_`l') : replace sum_`l' = sum_`l'[_n-1] if missing(sum_`l')
    }

Announcement

Sum by group per id

Comment

Comment

Comment

Comment