How to ensure each group has at least one row?

Clara Daru

Join Date: Nov 2020

Posts: 6
#1

How to ensure each group has at least one row?

16 Feb 2023, 18:27

Hi everyone,

I have a dataset that looks like this:

ID group amount

A 1 2

A 2 3

A 1 4

A 2 1

B 1 2

B 1 1

I'm getting the total amount for each ID and group:

Code:

egen group_amount = total(amount), by(ID group)

And then keeping only one row:

Code:

bysort ID group: keep if _n == 1

(I know I can do the same with collapse, but there are other variables I'd like to keep)

As you can see, B does not have any amount in group 2, so the result will be

ID group group_amount

A 1 6

A 2 4

B 1 3

However, I need missing groups to show up as "0" in the dataset:

ID group group_amount

A 1 6

A 2 4

B 1 3

B 2 0

What is the most elegant way to do this?

My current idea is to loop through all IDs and groups adding rows with amount = 0 before the egen command. This seems ugly and probably slower than it needs to be. I could also check whether a row exists before adding the new one, but I presume adding rows is not that expensive of an operation, and it would only make the code uglier. Is there a better way?

Thanks in advance!
Tags: None

Nick Cox

Join Date: Mar 2014
Posts: 35760

16 Feb 2023, 18:42

Not quite what you're asking for, but a job candidate:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str1 id byte(group amount)
"A" 1 2
"A" 2 3
"A" 1 4
"A" 2 1
"B" 1 2
"B" 1 1
end

. table id group, stat(total amount)

------------------------
        |      group    
        |  1   2   Total
--------+---------------
ID      |              
  A     |  6   4      10
  B     |  3           3
  Total |  9   4      13
------------------------

Also,

Code:

 
preserve 
collapse (sum) amount, by(id group)
fillin id group
replace amount = 0 if amount == .
table id group, stat(total amount)

------------------------
        |      group    
        |  1   2   Total
--------+---------------
ID      |               
  A     |  6   4      10
  B     |  3   0       3
  Total |  9   4      13
------------------------

restore

Last edited by Nick Cox; 16 Feb 2023, 18:56.

Comment

William Lisowski

Join Date: Dec 2014
Posts: 10150

16 Feb 2023, 19:43

I know I can do the same with collapse, but there are other variables I'd like to keep

For those who find this discussion at a later date, using collapse you can keep single observations of variables by using stats like (first) as this example demonstrates.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str1 id byte(group amount) float other str5 name
"A" 1 2  42 "Alice"
"A" 2 3 666 "Bob"  
"A" 1 4  42 "Alice"
"A" 2 1 666 "Bob"  
"B" 1 2  23 "Fred" 
"B" 1 1  23 "Fred" 
end

collapse (sum) group_amount=amount (first) other  name, by(id group)
list, clean noobs abbreviate(16)

Code:

. list, clean noobs abbreviate(16)

    id   group   group_amount   other    name  
     A       1              6      42   Alice  
     A       2              4     666     Bob  
     B       1              3      23    Fred  
     B       2              0       .    NONE

Comment

Clara Daru

Join Date: Nov 2020

Posts: 6
#4

16 Feb 2023, 19:48

Nick Cox fillin does precisely what I was looking for!
William Lisowski I'm taking note of the first option for future use.

Thanks a lot!
Comment

Announcement

How to ensure each group has at least one row?

Comment

Comment

Comment