Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Sum the values of a column using bysort and two conditions

    Hello,

    I have this dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte j str1 actor byte(column1 column2)
    1 "A" 1  0
    2 "B" 0  0
    1 "B" 3  0
    2 "A" 2  1
    1 "C" 4  2
    2 "D" 5  3
    1 "A" 6  4
    2 "B" 1  2
    1 "B" 2  0
    2 "A" 3  3
    1 "C" 4  4
    2 "D" 5  1
    1 "D" 6  3
    2 "B" 7  0
    1 "A" 2 10
    2 "C" 3 11
    1 "E" 4  0
    2 "D" 5  0
    1 "A" 0  0
    2 "B" 0  1
    1 "A" 0  2
    2 "B" 1  5
    1 "E" 2  6
    2 "A" 3  2
    1 "B" 1  9
    2 "E" 0  1
    1 "D" 0  1
    2 "C" 0  1
    end
    I would like to create a new column, let's called it "wanted", which is the sum of column 1 and 2 by actor. However, I would like to count the sum of column1, just if j=2, and the sum of column 2, just if j = 1. In other words, I would like to sum the values of columns 1 and 2 by actor, depending on column j.

    I am trying something like this:

    Code:
    bysort actor: egen wanted = total(column1) if j = 2 or total(column2) if j = 1
    But it is not working at all. Any suggestion is more than welcome.
    Last edited by Diego Malo; 05 May 2023, 02:58.

  • #2
    "something like this" as you cite is not legal and some way from being legal.

    if j = 1 should be if j == 1

    You can't have two if qualifiers in any command.

    You can't have more than one egen function call in any command

    or has no meaning to Stata in your statement.

    Naturally, you may not have typed literally what you give here, but if so, there is no way to comment on it.

    Does this help? See also https://journals.sagepub.com/doi/pdf...867X1101100210, especially Section 9.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte j str1 actor byte(column1 column2)
    1 "A" 1  0
    2 "B" 0  0
    1 "B" 3  0
    2 "A" 2  1
    1 "C" 4  2
    2 "D" 5  3
    1 "A" 6  4
    2 "B" 1  2
    1 "B" 2  0
    2 "A" 3  3
    1 "C" 4  4
    2 "D" 5  1
    1 "D" 6  3
    2 "B" 7  0
    1 "A" 2 10
    2 "C" 3 11
    1 "E" 4  0
    2 "D" 5  0
    1 "A" 0  0
    2 "B" 0  1
    1 "A" 0  2
    2 "B" 1  5
    1 "E" 2  6
    2 "A" 3  2
    1 "B" 1  9
    2 "E" 0  1
    1 "D" 0  1
    2 "C" 0  1
    end
    
    bysort actor j : egen wanted = total(cond(j == 1, column1, column2))
    
    order actor 
    
    list , sepby(actor j)
    
         +----------------------------------------+
         | actor   j   column1   column2   wanted |
         |----------------------------------------|
      1. |     A   1         0         0        9 |
      2. |     A   1         6         4        9 |
      3. |     A   1         0         2        9 |
      4. |     A   1         1         0        9 |
      5. |     A   1         2        10        9 |
         |----------------------------------------|
      6. |     A   2         3         3        6 |
      7. |     A   2         2         1        6 |
      8. |     A   2         3         2        6 |
         |----------------------------------------|
      9. |     B   1         3         0        6 |
     10. |     B   1         2         0        6 |
     11. |     B   1         1         9        6 |
         |----------------------------------------|
     12. |     B   2         7         0        8 |
     13. |     B   2         0         0        8 |
     14. |     B   2         1         2        8 |
     15. |     B   2         1         5        8 |
     16. |     B   2         0         1        8 |
         |----------------------------------------|
     17. |     C   1         4         4        8 |
     18. |     C   1         4         2        8 |
         |----------------------------------------|
     19. |     C   2         3        11       12 |
     20. |     C   2         0         1       12 |
         |----------------------------------------|
     21. |     D   1         0         1        6 |
     22. |     D   1         6         3        6 |
         |----------------------------------------|
     23. |     D   2         5         3        4 |
     24. |     D   2         5         0        4 |
     25. |     D   2         5         1        4 |
         |----------------------------------------|
     26. |     E   1         4         0        6 |
     27. |     E   1         2         6        6 |
         |----------------------------------------|
     28. |     E   2         0         1        1 |
         +----------------------------------------+

    Comment


    • #3
      Thank you for your answer Nick Cox

      Absolutely, my code was not working. Stata said "=exp not allowed", but I did not know why, now I know it. I was coding between "Stata" and "Python". Sorry for that.

      Thanks also for your solution. Maybe I was not completely explicit but, my goal is to have in column "wanted" the sum of both, column1, and column2, but sum just the values of column1 when j = 2, and just the values of column2 when j = 1, this is why I wrote wrongly the "or" in my code. I do not want a differentiation of the results by column j, I just use column j to know which column I have to sum in each row.

      However, I think that from the results of your code it is easy to do that with another line of code.


      Last edited by Diego Malo; 05 May 2023, 03:48.

      Comment


      • #4
        Indeed;

        Code:
         
         bysort actor j : egen wanted = total(cond(j == 2, column1, column2))
        corresponds more closely to what you asked for. But I think you need that by: prefix.

        Comment

        Working...
        X