Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to ensure each group has at least one row?

    Hi everyone,

    I have a dataset that looks like this:

    ID group amount
    A 1 2
    A 2 3
    A 1 4
    A 2 1
    B 1 2
    B 1 1

    I'm getting the total amount for each ID and group:
    Code:
    egen group_amount = total(amount), by(ID group)
    And then keeping only one row:
    Code:
    bysort ID group: keep if _n == 1
    (I know I can do the same with collapse, but there are other variables I'd like to keep)

    As you can see, B does not have any amount in group 2, so the result will be

    ID group group_amount
    A 1 6
    A 2 4
    B 1 3

    However, I need missing groups to show up as "0" in the dataset:

    ID group group_amount
    A 1 6
    A 2 4
    B 1 3
    B 2 0

    What is the most elegant way to do this?

    My current idea is to loop through all IDs and groups adding rows with amount = 0 before the egen command. This seems ugly and probably slower than it needs to be. I could also check whether a row exists before adding the new one, but I presume adding rows is not that expensive of an operation, and it would only make the code uglier. Is there a better way?

    Thanks in advance!

  • #2
    Not quite what you're asking for, but a job candidate:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str1 id byte(group amount)
    "A" 1 2
    "A" 2 3
    "A" 1 4
    "A" 2 1
    "B" 1 2
    "B" 1 1
    end
    
    . table id group, stat(total amount)
    
    ------------------------
            |      group    
            |  1   2   Total
    --------+---------------
    ID      |              
      A     |  6   4      10
      B     |  3           3
      Total |  9   4      13
    ------------------------
    Also,

    Code:
     
    preserve 
    collapse (sum) amount, by(id group)
    fillin id group
    replace amount = 0 if amount == .
    table id group, stat(total amount)
    
    ------------------------
            |      group    
            |  1   2   Total
    --------+---------------
    ID      |               
      A     |  6   4      10
      B     |  3   0       3
      Total |  9   4      13
    ------------------------
    
    restore
    .
    Last edited by Nick Cox; 16 Feb 2023, 18:56.

    Comment


    • #3
      I know I can do the same with collapse, but there are other variables I'd like to keep
      For those who find this discussion at a later date, using collapse you can keep single observations of variables by using stats like (first) as this example demonstrates.
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str1 id byte(group amount) float other str5 name
      "A" 1 2  42 "Alice"
      "A" 2 3 666 "Bob"  
      "A" 1 4  42 "Alice"
      "A" 2 1 666 "Bob"  
      "B" 1 2  23 "Fred" 
      "B" 1 1  23 "Fred" 
      end
      
      collapse (sum) group_amount=amount (first) other  name, by(id group)
      list, clean noobs abbreviate(16)
      Code:
      . list, clean noobs abbreviate(16)
      
          id   group   group_amount   other    name  
           A       1              6      42   Alice  
           A       2              4     666     Bob  
           B       1              3      23    Fred  
           B       2              0       .    NONE

      Comment


      • #4
        Nick Cox fillin does precisely what I was looking for!
        William Lisowski I'm taking note of the first option for future use.

        Thanks a lot!

        Comment

        Working...
        X